[lxc-devel] RFC: Device Namespaces
Michael J Coss
michael.coss at alcatel-lucent.com
Wed Sep 25 18:07:31 UTC 2013
I've been looking at this problem for some time to help solve my very
specific use case. In our case we are using containers to provide
individual "desktops" to a number of users. We want each desktop to run
X, and to bind and unbind a display, keyboard, and mouse to the X server
running in a particular container, without that container being able to
grab anyone else's keyboard, mouse, or display unless the owner has
granted it specific access. To that end, I started working on a udev
solution. I understand that most containers don't/won't run udev. And
systemd won't even start udev if the container doesn't have the mknod
capability, which is a kinda odd cookie, but I digress.
Currently the kernel effectively broadcasts uevents to all network
namespaces, and this is an issue. I don't want container A to see
container B's events. It should see only what the admin's policy for
that container allows. That policy should be enforced in userspace on
the host, by a daemon that receives the events, forwards the pertinent
ones to the appropriate container(s), and discards the rest. To
accomplish this, I had to change the broadcast mechanism, and then
provide a mechanism for forwarding events to specific network namespaces.
Back in the day, that would have been sufficient. Udev running in the
container would have gotten the add event, created the appropriate
device nodes and symlinks, and then cleaned up on remove/change events.
With the introduction of devtmpfs, udev no longer actually creates the
device nodes; it just handles links and name changes. So I'm still left
needing to create and manage devtmpfs, or find some other solution. This
leads me down the path of virtualizing devtmpfs. I've been fooling
around with FUSE to basically mirror the host /dev (filtered
appropriately), but there are many ugly security and implementation
details that look bad to me. I have been kicking around the notion that
the device cgroup might provide the list of "acceptable" devices, and
that a filter/view for devtmpfs could be constructed from it.
I do have these changes working on a mostly stock 3.10 kernel. The
kernel changes are pretty small, and the daemon does only minimal
filtering, mostly to demonstrate functionality. It does assume that the
containers are running in separate network namespaces, but that's about it.
Of course, that still leaves you with sysfs needing similar treatment.
---Michael J Coss