[lxc-devel] RFC: Device Namespaces

Serge Hallyn serge.hallyn at ubuntu.com
Wed Sep 25 20:13:56 UTC 2013


Quoting Michael J Coss (michael.coss at alcatel-lucent.com):
> I've been looking at this problem for some time to help solve my very 
> specific use case.   In our case we are using containers to provide 
> individual "desktops" to a number of users.  We want the desktop to run 
> X, and bind and unbind a display, keyboard, mouse to that X server 
> running in a particular container, and not be able to grab anyone else's
> keyboard, mouse or display unless granted specific access to that by
> the owner.  To that end, I started working on a udev solution.  I
> understand that most containers don't/won't run udev.  And systemd won't 
> even start udev if the container doesn't have the mknod capability which 
> is a kinda odd cookie but I digress.
> 
> Currently the kernel effectively broadcasts uevents to all network 
> namespaces, and this is an issue.  I don't want container A to see 
> container B's events.  It should see only what the admin has set for the 
> policy for that container.  This policy should be handled on the host 
> for the containers in userspace.  This daemon can get the events, and
> then forward to the appropriate container(s) those events that are 
> pertinent, and disregard the rest.  To accomplish this, I had to change 
> the broadcast mechanism, and then provide a forwarding mechanism to 
> specific network namespaces.
> 
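
To make the filtering step above concrete, here is a minimal sketch of the
decision such a host daemon would make for each event.  It assumes the
usual kernel uevent layout ("ACTION@DEVPATH" followed by NUL-separated
KEY=VALUE pairs).  The policy table, the container names and the choice to
filter on SUBSYSTEM alone are hypothetical, and the actual forwarding into
another network namespace (which needs the kernel change described above)
is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Uevent:
    action: str
    devpath: str
    env: dict = field(default_factory=dict)


def parse_uevent(raw: bytes) -> Uevent:
    """Parse 'ACTION@DEVPATH' followed by NUL-separated KEY=VALUE pairs."""
    parts = [p.decode("utf-8", "replace") for p in raw.split(b"\0") if p]
    if not parts or "@" not in parts[0]:
        raise ValueError("not a kernel uevent")
    action, devpath = parts[0].split("@", 1)
    env = dict(p.split("=", 1) for p in parts[1:] if "=" in p)
    return Uevent(action, devpath, env)


def route(event: Uevent, policy: dict) -> list:
    """Return the containers whose policy admits this event's subsystem."""
    subsystem = event.env.get("SUBSYSTEM")
    return sorted(name for name, allowed in policy.items()
                  if subsystem in allowed)


# Hypothetical admin policy: container name -> subsystems it may see.
POLICY = {"desktop-a": {"input", "drm"}, "desktop-b": {"input"}}

raw = (b"add@/devices/virtual/input/input7\0ACTION=add\0"
       b"DEVPATH=/devices/virtual/input/input7\0SUBSYSTEM=input\0SEQNUM=1234\0")
event = parse_uevent(raw)
targets = route(event, POLICY)  # containers that should receive this event
```

Filtering on the subsystem is only the coarsest possible policy; a real one
would presumably match individual devices, so that container A never sees
container B's keyboard.
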
> Back in the day, that would have been sufficient.  Udev running in the 
> container would have gotten the add event, and created the appropriate 
> devices and symlinks, and then cleaned up on remove/change events.  With 
> the introduction of devtmpfs, udev no longer actually creates the device 
> nodes.  It just handles links and name changes.   So, I'm still left 
> with needing to create/manage devtmpfs or some other solution.  This 
> leads me down the path of virtualizing devtmpfs.  I've been fooling 
> around with FUSE, to basically mirror the host /dev (filtered 

Rather than using FUSE, I'd recommend looking into doing it the same
way as the devpts fs.  It might not work out (or might be rejected) in
the end, but at first glance it seems the right way to handle it.  So each new
instance mount starts empty, changes in one are not reflected in
another, but new devices which the kernel later creates may (subject
to device cgroup of the process which mounted it?) be created in the
new instances.
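
As a toy illustration of those semantics (plain userspace Python, not
kernel code, and not a description of how devpts is implemented), the
sketch below models per-mount instances.  The class names and the
"allowed" set, which stands in for the device cgroup whitelist of the
mounting process, are hypothetical.

```python
class Instance:
    """One mount of the hypothetical per-container device filesystem."""

    def __init__(self, allowed):
        # Stand-in for the device cgroup whitelist of the mounting process:
        # a set of (type, major, minor) tuples.
        self.allowed = frozenset(allowed)
        self.nodes = {}  # every new instance starts empty

    def mknod(self, name, dev):
        """A change made inside one instance; other instances never see it."""
        if dev not in self.allowed:
            raise PermissionError(f"{dev} not allowed in this instance")
        self.nodes[name] = dev


class DevFs:
    """Toy stand-in for the kernel side; tracks all mounted instances."""

    def __init__(self):
        self.instances = []

    def mount(self, allowed):
        inst = Instance(allowed)
        self.instances.append(inst)
        return inst

    def kernel_add(self, name, dev):
        """A device the kernel creates later shows up only where allowed."""
        for inst in self.instances:
            if dev in inst.allowed:
                inst.nodes[name] = dev


MOUSE = ("c", 13, 64)
CARD0 = ("c", 226, 0)

fs = DevFs()
a = fs.mount({MOUSE})
b = fs.mount({MOUSE, CARD0})

a.mknod("mymouse", MOUSE)            # local change, visible in a only
fs.kernel_add("dri/card0", CARD0)    # new kernel device, allowed in b only
```

After this runs, a holds only "mymouse" and b holds only "dri/card0".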

> appropriately), but there are many ugly security, and implementation 
> details that look bad to me.  I have been kicking around the notion that 
> the device cgroup might provide the list of "acceptable" devices, and 
> construct a filter/view for devtmpfs based on that.
> 
> I do have these changes working on a mostly stock 3.10 kernel.  The
> kernel changes are pretty small, and the daemon does pretty minimal
> filtering, mostly to demonstrate functionality.  It does assume that the
> containers are running in a separate network namespace, but that's about it.
> 
> Of course, that still leaves you with sysfs needing similar treatment.
> 
> ---Michael J Coss
