[lxc-devel] RFC: Device Namespaces

Michael J Coss michael.coss at alcatel-lucent.com
Wed Sep 25 18:07:31 UTC 2013


I've been looking at this problem for some time to help solve my very 
specific use case.  In our case we are using containers to provide 
individual "desktops" to a number of users.  We want each desktop to run 
X, and to bind and unbind a display, keyboard, and mouse to the X server 
running in a particular container, without that container being able to 
grab anyone else's keyboard, mouse, or display unless the owner has 
granted specific access.  To that end, I started working on a udev 
solution.  I understand that most containers don't/won't run udev, and 
systemd won't even start udev if the container doesn't have the mknod 
capability, which is kind of odd, but I digress.

Currently the kernel effectively broadcasts uevents to all network 
namespaces, and this is an issue.  I don't want container A to see 
container B's events.  Each container should see only what the admin 
has set in the policy for that container.  This policy should be handled 
in userspace on the host: a daemon there can receive the events, forward 
those that are pertinent to the appropriate container(s), and disregard 
the rest.  To accomplish this, I had to change the kernel's broadcast 
mechanism and provide a way to forward events to specific network 
namespaces.
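
As a rough illustration of the forwarding decision described above (not 
the actual daemon), the sketch below parses a raw kernel uevent message, 
which is a header of the form "action@devpath" followed by NUL-separated 
KEY=VALUE fields, and picks the containers whose policy admits the 
device.  The POLICY table, the container names, and the choice to match 
on (SUBSYSTEM, MAJOR, MINOR) are assumptions made for the example; 
receiving the message from the netlink socket and re-sending it into a 
container's network namespace are left out.

```python
def parse_uevent(raw: bytes) -> dict:
    """Split a kernel uevent message into its header and KEY=VALUE fields."""
    parts = [p.decode("utf-8", "replace") for p in raw.split(b"\0") if p]
    if not parts:
        return {}
    event = {"_header": parts[0]}  # e.g. "add@/devices/..."
    for field in parts[1:]:
        key, sep, value = field.partition("=")
        if sep:
            event[key] = value
    return event


def route(event: dict, policy: dict) -> list:
    """Return the containers whose policy admits this event's device."""
    try:
        dev = (event["SUBSYSTEM"], int(event["MAJOR"]), int(event["MINOR"]))
    except (KeyError, ValueError):
        # Events that carry no device number are dropped in this sketch.
        return []
    return sorted(name for name, allowed in policy.items() if dev in allowed)


# Hypothetical per-container policy: sets of (subsystem, major, minor).
POLICY = {
    "desktop-a": {("input", 13, 64)},
    "desktop-b": {("input", 13, 65)},
}

# Hand-written sample message in the kernel's uevent layout.
RAW = (
    b"add@/devices/virtual/input/input5/event0\0ACTION=add\0"
    b"DEVPATH=/devices/virtual/input/input5/event0\0SUBSYSTEM=input\0"
    b"MAJOR=13\0MINOR=64\0SEQNUM=1234\0"
)

targets = route(parse_uevent(RAW), POLICY)
```

With this sample, only the container whose policy lists input device 
13:64 is selected; a real daemon would then deliver the event to that 
container's network namespace and to no other.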

Back in the day, that would have been sufficient.  Udev running in the 
container would have gotten the add event, created the appropriate 
devices and symlinks, and then cleaned up on remove/change events.  With 
the introduction of devtmpfs, udev no longer actually creates the device 
nodes.  It just handles links and name changes.  So I'm still left 
needing to create/manage devtmpfs, or find some other solution.  This 
leads me down the path of virtualizing devtmpfs.  I've been fooling 
around with FUSE to basically mirror the host /dev (filtered 
appropriately), but there are many security and implementation details 
that look ugly to me.  I have been kicking around the notion that the 
device cgroup might provide the list of "acceptable" devices, and that 
a filter/view for devtmpfs could be constructed from that list.
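
To make the device cgroup idea a little more concrete, here is a minimal 
sketch that reads rules in the cgroup v1 devices.list format 
("type major:minor access", with "*" as a wildcard and "a" meaning all 
device types) and decides whether a given device node would appear in a 
container's view.  The function names and the sample rules are made up 
for illustration, and the sketch ignores the access field entirely; how 
such a filter would actually be wired into devtmpfs is an open question.

```python
def parse_devices_list(text):
    """Parse devices.list-style lines into (type, major, minor, access) rules."""
    rules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        dev_type, numbers, access = fields
        major, _, minor = numbers.partition(":")
        rules.append((dev_type, major, minor, access))
    return rules


def is_visible(rules, dev_type, major, minor):
    """True if any rule matches the device; '*' and type 'a' are wildcards."""
    for r_type, r_major, r_minor, _access in rules:
        if r_type not in ("a", dev_type):
            continue
        if r_major not in ("*", str(major)):
            continue
        if r_minor not in ("*", str(minor)):
            continue
        return True
    return False


# Sample rules, as a container's devices.list might read (hypothetical).
SAMPLE = "c 1:3 rwm\nc 13:64 rw\nc 136:* rwm\n"
rules = parse_devices_list(SAMPLE)

# Candidate host devices as (name, type, major, minor); names are examples.
host_devices = [
    ("null", "c", 1, 3),
    ("input/event0", "c", 13, 64),
    ("input/event1", "c", 13, 65),
    ("sda", "b", 8, 0),
]
view = [name for name, t, ma, mi in host_devices if is_visible(rules, t, ma, mi)]
```

Here the filtered view keeps null and input/event0 and hides the other 
container's input device and the block device.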

I do have these changes working on a mostly stock 3.10 kernel.  The 
kernel changes are pretty small, and the daemon does only minimal 
filtering, mostly to demonstrate functionality.  It does assume that the 
containers are running in a separate network namespace, but that's about it.

Of course, that still leaves you with sysfs needing similar treatment.

---Michael J Coss



