How to run Docker in Ubuntu 16.04 with Volta GPU support

tutorial

#1

Intro

The brand new NVIDIA Volta GPU is an incredibly powerful chip built specifically for deep learning that can cut training time dramatically. Because it is so new, Volta requires CUDA 9 and the latest version of cuDNN, which launched just a few weeks ago. Using Docker, we can quickly get an environment running with everything ready to go.

If you already have a Volta instance running, you can skip to step 2.

1. Create a machine

From the Paperspace console machines view, click new machine, select Ubuntu 16.04 under the Linux templates tab, and choose the V100 tile from the list of machine types. Set other options as appropriate, including adding a public IP (you can easily add this later if you like). After the machine finishes provisioning, open the terminal view and log in using the password sent via email.

Set a new password using the passwd command.

2. Download CUDA

You can find the CUDA repo deb packages here:
CUDA download area

Select the following:

  1. Linux as target platform
  2. x86_64 as the arch
  3. Ubuntu as the Distribution
  4. 16.04 as the version
  5. deb (network) as the install type

This link will take you directly to the download with the above configuration:
https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=debnetwork

Use wget to pull down the file directly into your VM:

wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
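The URL above follows a visible pattern: the Ubuntu release appears twice and the package version once. As a rough sketch, a small helper can assemble it so those values are stated in one place. The function name `cuda_repo_url` is our own invention, and we are assuming other releases on NVIDIA's server follow the same layout, which is worth confirming on the download page before relying on it.

```shell
# cuda_repo_url is a hypothetical helper, not an NVIDIA tool.
# Assumption: other releases use the same path layout as the 9.0.176 package above.
cuda_repo_url() {
    # $1: Ubuntu release without the dot, e.g. 1604 for Ubuntu 16.04
    # $2: package version, e.g. 9.0.176-1
    printf 'http://developer.download.nvidia.com/compute/cuda/repos/ubuntu%s/x86_64/cuda-repo-ubuntu%s_%s_amd64.deb\n' \
        "$1" "$1" "$2"
}

# Usage on the VM (commented out so that sourcing this file downloads nothing):
# wget "$(cuda_repo_url 1604 9.0.176-1)"
```

Called with `1604` and `9.0.176-1`, the helper prints the same URL used in the wget command above.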

Or you can download locally and SCP the .deb file to your VM, e.g.:

scp ../Downloads/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb  [email protected]:

Add the package using dpkg, e.g.:

sudo dpkg --install cuda-repo-ubuntu1604_9.0.176-1_amd64.deb

NOTE: We suggest removing a stray file in apt.conf.d to avoid potential errors caused by its presence, e.g.:

sudo rm /etc/apt/apt.conf.d/50unattended-upgrades.ucf-dist
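If the stray file is not present on your machine, the plain rm above will print an error. A minimal sketch of a guarded removal is shown here; `remove_stray_file` and the `SUDO` variable are names we made up for illustration, not part of apt or dpkg.

```shell
# remove_stray_file is a hypothetical helper name, not part of apt or dpkg.
# It deletes the given path only if it exists, so a missing file is not an error.
remove_stray_file() {
    if [ -e "$1" ]; then
        # SUDO is an optional variable we made up: set SUDO=sudo for root-owned files.
        ${SUDO:-} rm -- "$1" && echo "removed $1"
    else
        echo "nothing to remove at $1"
    fi
}

# Usage on the VM (commented out so that sourcing this file deletes nothing):
# SUDO=sudo remove_stray_file /etc/apt/apt.conf.d/50unattended-upgrades.ucf-dist
```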

Add the key per instructions, e.g.:

sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub

3. Install CUDA

Run apt update and install CUDA 9:

sudo apt update
sudo apt install cuda-9-0
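To confirm which CUDA release actually landed, one option is to parse the output of `nvcc --version`. The sketch below assumes the toolkit was installed under the default prefix /usr/local/cuda-9.0 and that the output contains a "release X.Y," line; `cuda_release_of` is a hypothetical helper name.

```shell
# cuda_release_of is a hypothetical helper: it reads `nvcc --version` output on
# stdin and prints the release number found on the "release X.Y," line.
cuda_release_of() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Usage on the VM (assumes the default install prefix /usr/local/cuda-9.0):
# /usr/local/cuda-9.0/bin/nvcc --version | cuda_release_of
```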

Load the NVIDIA kernel module with modprobe and check that nvidia-smi sees the GPU, e.g.:

sudo modprobe nvidia
nvidia-smi
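Reading the nvidia-smi table by eye works, but a scripted check can be handy when provisioning several machines. This is a sketch only: `has_v100` is a hypothetical helper, and it assumes the device name reported by the driver contains the string "V100".

```shell
# has_v100 is a hypothetical helper: it reads GPU names on stdin, one per line,
# and succeeds if any of them contains "V100".
has_v100() {
    grep -q 'V100'
}

# Usage on the VM (prints one GPU name per line and pipes it to the check):
# nvidia-smi --query-gpu=name --format=csv,noheader | has_v100 && echo "V100 detected"
```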

4. Install Docker

Fetch and install docker-ce, e.g.:

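# Add Docker's official GPG key (the trailing "-" tells apt-key to read the key from stdin)
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

# Add the stable Docker repository for this Ubuntu release
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

# Refresh the package index and install Docker CE
sudo apt update
sudo apt install docker-ce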
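The multi-line add-apt-repository call can be hard to read because of the line continuations. As an illustration of what it expands to, the sketch below builds the same repository line from a codename (`lsb_release -cs` prints xenial on Ubuntu 16.04). `docker_repo_line` is a name we invented for this example.

```shell
# docker_repo_line is a hypothetical helper that prints the apt source line
# for a given Ubuntu codename (xenial for 16.04).
docker_repo_line() {
    printf 'deb [arch=amd64] https://download.docker.com/linux/ubuntu %s stable\n' "$1"
}

# Usage on the VM, equivalent to the add-apt-repository call above:
# sudo add-apt-repository "$(docker_repo_line "$(lsb_release -cs)")"
```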

5. Install nvidia-docker

Fetch and install the nvidia-docker package, e.g.:

# Ubuntu distributions only
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker*.deb
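At this point three tools should be on your PATH: nvidia-smi, docker and nvidia-docker. A quick pre-flight check along these lines can catch a failed install before you try the container test. `require_cmds` is a hypothetical helper, written as a sketch rather than a definitive check.

```shell
# require_cmds is a hypothetical helper: it reports each named command that is
# missing from PATH and returns non-zero if any are missing.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage on the VM:
# require_cmds nvidia-smi docker nvidia-docker && echo "all tools found"
```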

6. Test your environment

Test the nvidia-docker image, e.g.:

sudo nvidia-docker run --rm nvidia/cuda nvidia-smi

That’s it! Volta is still very new, so some deep learning frameworks will require bleeding-edge versions built from source. Mainstream support should be coming soon, but before you move anything into production, be sure to test your setup thoroughly.


#2

Looks Good and Works Well: A few notes.

## On Step 2: Download CUDA

It would be better to just give readers the command to download the deb package, since we already assume which platform they’re using. I assume there is a way to check which version is currently supported, since we have examples of that later in the article. An example of this would be:

wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb

This would remove the need to select the package on the NVIDIA site and scp the file to their machine. If there are any problems with versioning, we can also suggest this link. It should take them to the most recent version.


Then this section:

I don’t see any warnings as a result of this file until after the next command in the tutorial. The warning for this file comes up a couple of times later in the tutorial, though. I would phrase the removal of this file as a suggestion rather than a warning, especially if there is no harm in removing it:

We suggest removing a stray file in apt.conf.d to avoid potential errors as a result of its existence.

## On Step 4: Install Docker

We should put this command in its own code block. It’s a bit hard to tell whether the next line should be included in the command because of the trailing “-” hyphen.


#3

Great feedback, updated the guide.