[lxc-devel] [lxd/master] Extend nvidia runtime options

stgraber on Github lxc-bot at linuxcontainers.org
Wed Sep 12 23:02:55 UTC 2018


From 2325ba266da4ffa95084f4e38d1765047ce9b58c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?St=C3=A9phane=20Graber?= <stgraber at ubuntu.com>
Date: Wed, 12 Sep 2018 19:01:16 -0400
Subject: [PATCH] Extend nvidia runtime options
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This introduces three additional configuration keys to control the
libnvidia-container integration:

 - nvidia.driver.capabilities (maps to NVIDIA_DRIVER_CAPABILITIES)
 - nvidia.require.cuda (maps to NVIDIA_REQUIRE_CUDA)
 - nvidia.require.driver (maps to NVIDIA_REQUIRE_DRIVER)

Details on the valid values for those options can be found in the NVIDIA
documentation here:

  https://github.com/NVIDIA/nvidia-container-runtime

Signed-off-by: Stéphane Graber <stgraber at ubuntu.com>
---
 doc/api-extensions.md   |  8 ++++++++
 doc/containers.md       |  3 +++
 lxd/container_lxc.go    | 30 +++++++++++++++++++++++++++---
 scripts/bash/lxd-client |  1 +
 shared/container.go     |  5 ++++-
 shared/version/api.go   |  1 +
 6 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/doc/api-extensions.md b/doc/api-extensions.md
index 393085f1f2..4bdd5bde20 100644
--- a/doc/api-extensions.md
+++ b/doc/api-extensions.md
@@ -585,3 +585,11 @@ This introduces the config keys `candid.domains` and `candid.expiry`. The
 former allows specifying allowed/valid Candid domains, the latter makes the
 macaroon's expiry configurable. The `lxc remote add` command now has a
 `--domain` flag which allows specifying a Candid domain.
+
+## nvidia\_runtime\_config
+This introduces three extra config keys for use with nvidia.runtime and the libnvidia-container library.
+Those keys map directly to the matching nvidia-container environment variables:
+
+ - nvidia.driver.capabilities => NVIDIA\_DRIVER\_CAPABILITIES
+ - nvidia.require.cuda => NVIDIA\_REQUIRE\_CUDA
+ - nvidia.require.driver => NVIDIA\_REQUIRE\_DRIVER
diff --git a/doc/containers.md b/doc/containers.md
index 24842ba6a4..e9038a93d6 100644
--- a/doc/containers.md
+++ b/doc/containers.md
@@ -57,7 +57,10 @@ linux.kernel\_modules                   | string    | -             | yes
 migration.incremental.memory            | boolean   | false         | yes           | migration\_pre\_copy                 | Incremental memory transfer of the container's memory to reduce downtime.
 migration.incremental.memory.goal       | integer   | 70            | yes           | migration\_pre\_copy                 | Percentage of memory to have in sync before stopping the container.
 migration.incremental.memory.iterations | integer   | 10            | yes           | migration\_pre\_copy                 | Maximum number of transfer operations to go through before stopping the container.
+nvidia.driver.capabilities              | string    | all           | no            | nvidia\_runtime\_config              | What driver capabilities the container needs (sets libnvidia-container NVIDIA\_DRIVER\_CAPABILITIES)
 nvidia.runtime                          | boolean   | false         | no            | nvidia\_runtime                      | Pass the host NVIDIA and CUDA runtime libraries into the container
+nvidia.require.cuda                     | string    | -             | no            | nvidia\_runtime\_config              | Version expression for the required CUDA version (sets libnvidia-container NVIDIA\_REQUIRE\_CUDA)
+nvidia.require.driver                   | string    | -             | no            | nvidia\_runtime\_config              | Version expression for the required driver version (sets libnvidia-container NVIDIA\_REQUIRE\_DRIVER)
 raw.apparmor                            | blob      | -             | yes           | -                                    | Apparmor profile entries to be appended to the generated profile
 raw.idmap                               | blob      | -             | no            | id\_map                              | Raw idmap configuration (e.g. "both 1000 1000")
 raw.lxc                                 | blob      | -             | no            | -                                    | Raw LXC configuration to be appended to the generated one
diff --git a/lxd/container_lxc.go b/lxd/container_lxc.go
index de14f2a814..1fc9203d98 100644
--- a/lxd/container_lxc.go
+++ b/lxd/container_lxc.go
@@ -1229,9 +1229,33 @@ func (c *containerLXC) initLXC(config bool) error {
 			return err
 		}
 
-		err = lxcSetConfigItem(cc, "lxc.environment", "NVIDIA_DRIVER_CAPABILITIES=compute,utility")
-		if err != nil {
-			return err
+		nvidiaDriver := c.expandedConfig["nvidia.driver.capabilities"]
+		if nvidiaDriver == "" {
+			err = lxcSetConfigItem(cc, "lxc.environment", "NVIDIA_DRIVER_CAPABILITIES=all")
+			if err != nil {
+				return err
+			}
+		} else {
+			err = lxcSetConfigItem(cc, "lxc.environment", fmt.Sprintf("NVIDIA_DRIVER_CAPABILITIES=%s", nvidiaDriver))
+			if err != nil {
+				return err
+			}
+		}
+
+		nvidiaRequireCuda := c.expandedConfig["nvidia.require.cuda"]
+		if nvidiaRequireCuda != "" {
+			err = lxcSetConfigItem(cc, "lxc.environment", fmt.Sprintf("NVIDIA_REQUIRE_CUDA=%s", nvidiaRequireCuda))
+			if err != nil {
+				return err
+			}
+		}
+
+		nvidiaRequireDriver := c.expandedConfig["nvidia.require.driver"]
+		if nvidiaRequireDriver != "" {
+			err = lxcSetConfigItem(cc, "lxc.environment", fmt.Sprintf("NVIDIA_REQUIRE_DRIVER=%s", nvidiaRequireDriver))
+			if err != nil {
+				return err
+			}
 		}
 
 		err = lxcSetConfigItem(cc, "lxc.hook.mount", hookPath)
diff --git a/scripts/bash/lxd-client b/scripts/bash/lxd-client
index bb12d7d5ea..95caea3a2c 100644
--- a/scripts/bash/lxd-client
+++ b/scripts/bash/lxd-client
@@ -82,6 +82,7 @@ _have lxc && {
       limits.memory.swap limits.memory.swap.priority limits.network.priority \
       limits.processes linux.kernel_modules migration.incremental.memory \
       migration.incremental.memory.goal nvidia.runtime \
+      nvidia.driver.capabilities nvidia.require.cuda nvidia.require.driver \
       migration.incremental.memory.iterations raw.apparmor raw.idmap raw.lxc \
       raw.seccomp security.idmap.base security.idmap.isolated \
       security.idmap.size security.devlxd security.devlxd.images \
diff --git a/shared/container.go b/shared/container.go
index 5fb1d1ab9b..e7cb82dad1 100644
--- a/shared/container.go
+++ b/shared/container.go
@@ -206,7 +206,10 @@ var KnownContainerConfigKeys = map[string]func(value string) error{
 	"migration.incremental.memory.iterations": IsUint32,
 	"migration.incremental.memory.goal":       IsUint32,
 
-	"nvidia.runtime": IsBool,
+	"nvidia.runtime":             IsBool,
+	"nvidia.driver.capabilities": IsAny,
+	"nvidia.require.cuda":        IsAny,
+	"nvidia.require.driver":      IsAny,
 
 	"security.nesting":       IsBool,
 	"security.privileged":    IsBool,
diff --git a/shared/version/api.go b/shared/version/api.go
index 5e5f380823..e15f3f04c3 100644
--- a/shared/version/api.go
+++ b/shared/version/api.go
@@ -123,6 +123,7 @@ var APIExtensions = []string{
 	"candid_authentication",
 	"backup_compression",
 	"candid_config",
+	"nvidia_runtime_config",
 }
 
 // APIExtensionsCount returns the number of available API extensions.
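
For readers who want to see the key-to-variable mapping from the patch in isolation, here is a minimal standalone sketch. It is not LXD code: `nvidiaEnvironment` is a hypothetical helper, the plain `map[string]string` stands in for the container's expanded config, and the returned strings stand in for the `lxc.environment` entries that the patch sets through `lxcSetConfigItem`. It assumes the intended behaviour is that `nvidia.driver.capabilities` falls back to `all` when unset, and that the two `nvidia.require.*` keys are only exported when they have a value. The sample values in `main` are placeholders only; see the NVIDIA documentation for the accepted syntax.

```go
package main

import (
	"fmt"
)

// nvidiaEnvironment is a hypothetical helper (not part of LXD) that turns
// the nvidia.* config keys into "NAME=value" environment entries.
func nvidiaEnvironment(config map[string]string) []string {
	// Assumption: an unset capabilities key defaults to "all",
	// matching the default documented in doc/containers.md.
	capabilities := config["nvidia.driver.capabilities"]
	if capabilities == "" {
		capabilities = "all"
	}
	env := []string{fmt.Sprintf("NVIDIA_DRIVER_CAPABILITIES=%s", capabilities)}

	// The requirement keys are optional: only emit them when set.
	optional := []struct{ key, variable string }{
		{"nvidia.require.cuda", "NVIDIA_REQUIRE_CUDA"},
		{"nvidia.require.driver", "NVIDIA_REQUIRE_DRIVER"},
	}
	for _, o := range optional {
		if value := config[o.key]; value != "" {
			env = append(env, fmt.Sprintf("%s=%s", o.variable, value))
		}
	}

	return env
}

func main() {
	// Placeholder values for illustration only.
	config := map[string]string{
		"nvidia.runtime":             "true",
		"nvidia.driver.capabilities": "compute,utility",
		"nvidia.require.cuda":        "cuda>=9.0",
	}
	for _, entry := range nvidiaEnvironment(config) {
		fmt.Println(entry)
	}
}
```

Building the list in one place keeps the default handling and the "skip when empty" rule visible side by side, which is the part of the patch that is easiest to get backwards.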
