During my bachelor's thesis, one of the repetitive tasks was machine learning training with OpenCV and evaluating the results.
A task that screamed for CI.
After a few CI runs, I decided it was a good idea to use a decent GPU instead of the container CPU.
Following many other posts, I ended up at Docker/Proxmox, and neither had the right cgroup settings.
nvidia-smi worked, but CUDA tests did not.
So here is "my way". Tested on Debian 8, Debian 9, and Ubuntu 16.04.
Download the Drivers
First of all, download the CUDA Toolkit from https://developer.nvidia.com/cuda-downloads.
Also download the NVIDIA driver from http://www.nvidia.de/Download/index.aspx?lang=de.
It is vitally important that the container and the host run the same driver version.
Otherwise an NVML "Driver/library version mismatch" will most likely occur, e.g. when the 384.81 driver bundled with the CUDA installer is installed but the 384.90 kernel module is loaded.
Installing the Drivers
On the Host:
Follow the instructions of the NVIDIA driver installer.
Afterwards, optionally install the CUDA toolkit for testing purposes, e.g. to run the deviceQuery sample.
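Concretely, with the versions used here (adjust the file names to whatever you downloaded):

# host: installs the driver including the kernel module
sudo ./NVIDIA-Linux-x86_64-384.90.run
# optional: CUDA toolkit for testing; decline the bundled 384.81 driver when asked
sudo ./cuda_9.0.176_384.81_linux.run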
Preparing the LXC container
If nvidia-smi runs on your host, that is a good sign.
Now check whether the /dev/nvidia* device nodes exist:
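ls -l /dev/nvidia*

The output should look roughly like this (timestamps trimmed; the nvidia-uvm major number may differ on your system):

crw-rw-rw- 1 root root 195,   0 ... /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 ... /dev/nvidiactl
crw-rw-rw- 1 root root 240,   0 ... /dev/nvidia-uvm
crw-rw-rw- 1 root root 240,   1 ... /dev/nvidia-uvm-tools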
Take a closer look at the major device numbers, the first number of each pair: 195 for the nvidia devices and, in this example, 240 for nvidia-uvm. The nvidia-uvm major is assigned dynamically and can differ between systems (e.g. 254), so use whatever your output shows; we will need these numbers for the cgroup configuration.
The next step is to create the LXC container.
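On Proxmox you can do this through the web UI or with pct; here is a minimal sketch, where the container ID 101, the template name, and the resources are placeholders for your own setup:

pct create 101 local:vztmpl/debian-9.0-standard_9.3-1_amd64.tar.gz --hostname cuda-ci --memory 4096 --net0 name=eth0,bridge=vmbr0,ip=dhcp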
Configuring the LXC container
Without the uvm cgroup entry, nvidia-smi will work, but running CUDA code will not.
Therefore, we allow access to the devices by adding the following lines to the container's config (on Proxmox: /etc/pve/lxc/<vmid>.conf). The two allow rules use the major numbers from the ls output above, and the mount entries bind the host's device nodes into the container.
lxc.cgroup.devices.allow = c 195:* rwm
lxc.cgroup.devices.allow = c 240:* rwm
lxc.mount.entry = /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry = /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry = /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry = /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
Configuring inside the LXC container
SSH or attach into the container.
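For example, on the Proxmox host (101 being the placeholder container ID from above):

pct enter 101

or, with plain LXC tools, lxc-attach -n <container-name>.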
Install the driver without the kernel module; the container uses the module that is already loaded on the host:
sudo ./NVIDIA-Linux-x86_64-384.90.run --no-kernel-module
After finishing the installation, test it using nvidia-smi.
The next step is to install the CUDA toolkit. When the installer asks, decline the bundled 384.81 driver, since we already installed 384.90:
sudo ./cuda_9.0.176_384.81_linux.run
Now make sure to modify PATH and LD_LIBRARY_PATH as the installer output states. I added the corresponding lines to /etc/profile.d/cuda.sh.
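Something along these lines, assuming the default installation prefix /usr/local/cuda-9.0:

export PATH=/usr/local/cuda-9.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH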
We will now test CUDA compatibility using the deviceQuery example.
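Assuming you let the toolkit installer copy the samples to their default location, build and run it like this:

cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery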
If everything works, the deviceQuery output ends in Result = PASS.
In case cuDNN is needed for your program
Download the tar archive from https://developer.nvidia.com/cudnn.
Extract it and copy the files into the CUDA folder:
# unpack the archive; it contains a cuda/ directory
tar -xvf cudnn-9.0-linux-x64-v7.tgz
# copy the header and the libraries into the toolkit installation
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
# make them readable for all users
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
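As a quick sanity check (not strictly necessary), you can print the version macros from the copied header:

grep -A 2 CUDNN_MAJOR /usr/local/cuda/include/cudnn.h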
Troubleshooting

- NVML: Driver/library version mismatch
Check /proc/driver/nvidia/version and make sure this version is the same one nvidia-smi reports (a quick check follows below). This can come from also installing the driver bundled with the CUDA package while the kernel module of your separately installed driver is loaded. Install your driver again or load the matching kernel module.
- CUDA error: unknown error
If it was not possible to use CUDA in the container, check the cgroup major numbers (maybe allow 254 as well) and make sure the devices are also mounted inside the container.
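To compare the kernel module and the user-space library versions quickly, something like this works:

cat /proc/driver/nvidia/version
nvidia-smi --query-gpu=driver_version --format=csv,noheader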