Nvidia GPU Support

Mesos 1.0.0 added first-class support for Nvidia GPUs. The minimum required Nvidia driver version is 340.29.

Overview

Getting up and running with GPU support in Mesos is fairly straightforward once you know the necessary steps. On the agent side, this means setting the flags required to enumerate GPUs and advertise them to the Mesos master. On the framework side, it means setting the proper framework capabilities so that the master will actually include GPUs in the resource offers it sends to a framework. So long as these constraints are met, accepting offers that contain GPUs and launching tasks that consume them is just as straightforward as launching a traditional task that only consumes CPUs, memory, and disk.

As such, Mesos exposes GPUs as a simple SCALAR resource in the same way it always has for CPUs, memory, and disk. That is, a resource offer such as the following is now possible:

cpus:8; mem:1024; disk:65536; gpus:4;

However, unlike CPUs, memory, and disk, only whole numbers of GPUs can be selected. If a fractional amount is selected, launching the task will result in a TASK_ERROR.
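
For example, a task requesting resources such as the following is valid:

cpus:1; mem:128; gpus:2;

whereas requesting a fractional amount such as gpus:0.5 will cause the launch to fail with a TASK_ERROR.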

At the time of this writing, Nvidia GPU support is only available for tasks launched through the Mesos containerizer (i.e. no support exists for launching GPU capable tasks through the Docker containerizer). That said, the Mesos containerizer now supports running docker images natively, so this limitation should not affect the vast majority of users.

Moreover, we mimic the support provided by nvidia-docker to automatically mount the proper Nvidia drivers and tools directly into your docker container. This means you can easily test your GPU enabled docker containers locally and deploy them to Mesos with the assurance that they will work without modification.

In the following sections we walk through all of the flags and framework capabilities necessary to enable Nvidia GPU support in Mesos. We then show an example of setting up and running a test cluster that launches tasks both with and without docker containers. Finally, we conclude with a step-by-step guide for installing the necessary Nvidia GPU drivers on your machine.

Agent Flags

The following isolation flags are required to enable Nvidia GPU support on an agent.

--isolation="cgroups/devices,gpu/nvidia"

The cgroups/devices flag tells the agent to restrict access to a specific set of devices for each task that it launches (i.e. a subset of all devices listed in /dev). When used in conjunction with the gpu/nvidia flag, the cgroups/devices flag allows us to grant / revoke access to specific GPUs on a per-task basis.
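
As a rough way to see this in action (a sketch, assuming the default cgroups_root of mesos and cgroups mounted under /sys/fs/cgroup), you can inspect the device whitelist of a running task by reading its devices cgroup; <container_id> below is a placeholder for the task's container ID:

cat /sys/fs/cgroup/devices/mesos/<container_id>/devices.list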

By default, all GPUs on an agent are automatically discovered and sent to the Mesos master as part of its resource offers. However, it may sometimes be necessary to restrict access to only a subset of the GPUs available on an agent. This is useful, for example, if you want to exclude a specific GPU device because an unwanted Nvidia graphics card is listed alongside a more powerful set of GPUs. In that case, the following additional agent flags can be used:

--nvidia_gpu_devices="<list_of_gpu_ids>"

--resources="gpus:<num_gpus>"

For the --nvidia_gpu_devices flag, you need to provide a comma-separated list of GPU ids, as determined by running nvidia-smi on the host where the agent is to be launched (see below for instructions on what external dependencies must be installed on these hosts to run this command). Example output from running nvidia-smi on a machine with four GPUs is shown below:

+------------------------------------------------------+
| NVIDIA-SMI 352.79     Driver Version: 352.79         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           Off  | 0000:04:00.0     Off |                    0 |
| N/A   34C    P0    39W / 150W |     34MiB /  7679MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           Off  | 0000:05:00.0     Off |                    0 |
| N/A   35C    P0    39W / 150W |     34MiB /  7679MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M60           Off  | 0000:83:00.0     Off |                    0 |
| N/A   38C    P0    40W / 150W |     34MiB /  7679MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M60           Off  | 0000:84:00.0     Off |                    0 |
| N/A   34C    P0    39W / 150W |     34MiB /  7679MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+

The GPU ids appear in the far left column of each row. Any subset of these ids can be listed in the --nvidia_gpu_devices flag (i.e., all of the following values of this flag are valid):

--nvidia_gpu_devices="0"
--nvidia_gpu_devices="0,1"
--nvidia_gpu_devices="0,1,2"
--nvidia_gpu_devices="0,1,2,3"
--nvidia_gpu_devices="0,2,3"
--nvidia_gpu_devices="3,1"
etc...

For the --resources=gpus:<num_gpus> flag, the value passed to <num_gpus> must equal the number of GPUs listed in --nvidia_gpu_devices; if these numbers do not match, the agent will fail to launch. This mismatch is a common source of confusion, so it is worth emphasizing here.
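
For example, to expose only the first and third GPUs from the nvidia-smi output shown above, the two flags must agree as follows:

--nvidia_gpu_devices="0,2"

--resources="gpus:2"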

Framework Capabilities

Once you launch an agent with the flags above, GPU resources will be advertised to the Mesos master alongside the traditional resources such as CPUs, memory, and disk. However, the master will only forward offers that contain GPUs to frameworks that have explicitly enabled the GPU_RESOURCES framework capability.

Requiring frameworks to explicitly opt in via the GPU_RESOURCES capability keeps legacy frameworks from accidentally consuming the non-GPU resources on GPU-capable machines (and thus blocking your GPU jobs from running). This is less of a concern if every node in your cluster has GPUs, but in a mixed-node environment it can be a significant problem.

An example of setting this capability in a C++ based framework can be seen below:

FrameworkInfo framework;
framework.add_capabilities()->set_type(
      FrameworkInfo::Capability::GPU_RESOURCES);

GpuScheduler scheduler;

MesosSchedulerDriver* driver = new MesosSchedulerDriver(
    &scheduler,
    framework,
    "127.0.0.1:5050");

driver->run();

Minimal GPU Capable Cluster

In this section we walk through two examples of launching GPU capable clusters and running tasks on them. The first example demonstrates the minimal setup required to run a command that consumes GPUs on a GPU capable agent. The second example demonstrates the setup necessary to launch a docker container that does the same.

Note: Both of these examples assume you have installed the external dependencies required for Nvidia GPU support on Mesos. Please see below for more information.

Minimal Setup Without Support for Docker Containers

The commands below show a minimal example of bringing up a GPU capable Mesos cluster on localhost and executing a task on it. The required agent flags are set as described above, and the mesos-execute command has been told to enable the GPU_RESOURCES framework capability so it can receive offers containing GPU resources.

$ mesos-master \
      --ip=127.0.0.1 \
      --work_dir=/var/lib/mesos

$ mesos-agent \
      --master=127.0.0.1:5050 \
      --work_dir=/var/lib/mesos \
      --isolation="cgroups/devices,gpu/nvidia"

$ mesos-execute \
      --master=127.0.0.1:5050 \
      --name=gpu-test \
      --command="nvidia-smi" \
      --framework_capabilities="GPU_RESOURCES" \
      --resources="gpus:1"

If all goes well, you should see something like the following in the stdout of your task.

+------------------------------------------------------+
| NVIDIA-SMI 352.79     Driver Version: 352.79         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           Off  | 0000:04:00.0     Off |                    0 |
| N/A   34C    P0    39W / 150W |     34MiB /  7679MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

Minimal Setup With Support for Docker Containers

The commands below show a minimal example of bringing up a GPU capable Mesos cluster on localhost and running a docker container on it. The required agent flags are set as described above, and the mesos-execute command has been told to enable the GPU_RESOURCES framework capability so it can receive offers containing GPU resources. Additionally, the flags required to run docker images with the Mesos containerizer (as described in the Mesos containerizer documentation) are set as well.

$ mesos-master \
      --ip=127.0.0.1 \
      --work_dir=/var/lib/mesos

$ mesos-agent \
      --master=127.0.0.1:5050 \
      --work_dir=/var/lib/mesos \
      --image_providers=docker \
      --executor_environment_variables="{}" \
      --isolation="docker/runtime,filesystem/linux,cgroups/devices,gpu/nvidia"

$ mesos-execute \
      --master=127.0.0.1:5050 \
      --name=gpu-test \
      --docker_image=nvidia/cuda \
      --command="nvidia-smi" \
      --framework_capabilities="GPU_RESOURCES" \
      --resources="gpus:1"

If all goes well, you should see something like the following in the stdout of your task.

+------------------------------------------------------+
| NVIDIA-SMI 352.79     Driver Version: 352.79         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           Off  | 0000:04:00.0     Off |                    0 |
| N/A   34C    P0    39W / 150W |     34MiB /  7679MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

External Dependencies

Any host running a Mesos agent with Nvidia GPU support MUST have a valid Nvidia kernel driver installed. It is also highly recommended to install the corresponding user-level libraries and tools available as part of the Nvidia CUDA toolkit. Many jobs that use Nvidia GPUs rely on CUDA, and not including it will severely limit the types of GPU-aware jobs you can run on Mesos.

Note: The minimum supported version of CUDA is 6.5.
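
If the driver and the CUDA toolkit are already installed, you can quickly check your versions against these minimums with the following commands (assuming nvcc is on your PATH):

nvidia-smi --query-gpu=driver_version --format=csv,noheader
nvcc --version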

Installing the Required Tools

The Nvidia kernel driver can be downloaded at the link below. Make sure to choose the proper GPU model, operating system, and version of the CUDA toolkit you plan to install on your host:

http://www.nvidia.com/Download/index.aspx

Unfortunately, most Linux distributions come preinstalled with an open source video driver called Nouveau, which conflicts with the Nvidia driver we are trying to install. The following guides may prove useful when uninstalling Nouveau before installing the Nvidia driver on CentOS or Ubuntu:

http://www.dedoimedo.com/computers/centos-7-nvidia.html
http://www.allaboutlinux.eu/remove-nouveau-and-install-nvidia-driver-in-ubuntu-15-04/

After installing the Nvidia kernel driver, you can follow the instructions in the link below to install the Nvidia CUDA toolkit:

http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/

In addition to the steps listed in the link above, it is highly recommended to add CUDA's lib directory into your ldcache so that tasks launched by Mesos will know where these libraries exist and link with them properly.

sudo bash -c "cat > /etc/ld.so.conf.d/cuda-lib64.conf << EOF
/usr/local/cuda/lib64
EOF"

sudo ldconfig
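
You can then confirm that the CUDA libraries are visible in the ldcache with something like:

ldconfig -p | grep libcudart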

If you choose not to add CUDA's lib directory to your ldcache, you MUST add it to the LD_LIBRARY_PATH of every task that requires it.

Note: This is not the recommended method. You have been warned.
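
If you do go this route anyway, one way to do it is to prepend the path in the task's command, as sketched below with a hypothetical my-cuda-app binary:

$ mesos-execute \
      --master=127.0.0.1:5050 \
      --name=cuda-test \
      --command="LD_LIBRARY_PATH=/usr/local/cuda/lib64 ./my-cuda-app" \
      --framework_capabilities="GPU_RESOURCES" \
      --resources="gpus:1"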

Verifying the Installation

Once the kernel driver has been installed, you can make sure everything is working by trying to run the bundled nvidia-smi tool.

nvidia-smi

You should see output similar to the following:

Thu Apr 14 11:58:17 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.79     Driver Version: 352.79         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           Off  | 0000:04:00.0     Off |                    0 |
| N/A   34C    P0    39W / 150W |     34MiB /  7679MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           Off  | 0000:05:00.0     Off |                    0 |
| N/A   35C    P0    39W / 150W |     34MiB /  7679MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M60           Off  | 0000:83:00.0     Off |                    0 |
| N/A   38C    P0    38W / 150W |     34MiB /  7679MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M60           Off  | 0000:84:00.0     Off |                    0 |
| N/A   34C    P0    38W / 150W |     34MiB /  7679MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

To verify your CUDA installation, it is recommended to go through the instructions at the link below:

http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/#install-samples
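
As a quick additional check (paths may vary depending on your CUDA version and install location), building and running the bundled deviceQuery sample should report each GPU on the machine:

cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery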

Finally, you should get a developer to run Mesos's Nvidia GPU-related unit tests on your machine to ensure that everything passes (as described below).

Running Mesos Unit Tests

At the time of this writing, the following Nvidia GPU specific unit tests exist on Mesos:

DockerTest.ROOT_DOCKER_NVIDIA_GPU_DeviceAllow
DockerTest.ROOT_DOCKER_NVIDIA_GPU_InspectDevices
NvidiaGpuTest.ROOT_CGROUPS_NVIDIA_GPU_VerifyDeviceAccess
NvidiaGpuTest.ROOT_INTERNET_CURL_CGROUPS_NVIDIA_GPU_NvidiaDockerImage
NvidiaGpuTest.ROOT_CGROUPS_NVIDIA_GPU_FractionalResources
NvidiaGpuTest.NVIDIA_GPU_Discovery
NvidiaGpuTest.ROOT_CGROUPS_NVIDIA_GPU_FlagValidation
NvidiaGpuTest.NVIDIA_GPU_Allocator
NvidiaGpuTest.ROOT_NVIDIA_GPU_VolumeCreation
NvidiaGpuTest.ROOT_NVIDIA_GPU_VolumeShouldInject

The capitalized words following the '.' specify test filters to apply when running the unit tests. In our case the filters that apply are ROOT, CGROUPS, and NVIDIA_GPU. This means that these tests must be run as root on Linux machines with cgroups support that have Nvidia GPUs installed on them. The check to verify that Nvidia GPUs exist is to look for the existence of the Nvidia System Management Interface (nvidia-smi) on the machine where the tests are being run. This binary should already be installed if the instructions above have been followed correctly.

So long as these filters are satisfied, you can run the following to execute these unit tests:

[mesos]$ GTEST_FILTER="" make -j check
[mesos]$ sudo bin/mesos-tests.sh --gtest_filter="*NVIDIA_GPU*"