[lxc-devel] Dynamic devices

Mon Mar 11 20:49:22 UTC 2013

I know this has probably been hashed over dozens of times, but as far
as I can tell udev still does not work properly in containers, either
OpenVZ or LXC.  Unfortunately, I really do need dynamic devices,
specifically some USB devices tunnelled over IP.  I'm looking at using
LXC as the basis for a cloud computing infrastructure, where user and
device migration must allow a user to start up a container and "bind"
an arbitrary set of devices to it (as hotplug events).  I like LXC for
its low overhead and apparent upstream support, and had hoped that with
the introduction of the various namespaces (user, network, etc.) a udev
solution would have arisen.

Of course, most Google searches on this topic say "udev doesn't work",
"don't install it", or "remove it from init.d", etc.

On the one hand, there is a case to be made that containers should
simply be pre-provisioned with all the devices they need.  However, I
can imagine an operating model in which the host system applies a
policy to the kobject events and forwards them to the appropriate
container.  Alternatively, the host could apply the policy but, instead
of forwarding the events to the container, process them on the "host"
side itself, modifying the /dev directory of the appropriate container.
Or events could be sent to all containers, and the policy specified in
the LXC configuration would limit which devices can or cannot be
created.  Or, finally, the policy could be applied in the kernel, where
the event broadcast occurs.
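To make the first option a little more concrete, here is a rough sketch
in C of a host-side policy lookup, assuming the policy is just an
ordered table keyed on SUBSYSTEM and a DEVPATH prefix.  The struct
uevent_policy type, route_uevent() and the container names are all
hypothetical, and the vhci_hcd path is only an illustration that
assumes the tunnelled USB devices appear under a USB/IP virtual host
controller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical policy entry: an event whose SUBSYSTEM matches and whose
 * DEVPATH starts with devpath_prefix is routed to one container. */
struct uevent_policy {
    const char *subsystem;      /* e.g. "usb"; NULL matches any subsystem */
    const char *devpath_prefix; /* must not be NULL; "" matches everything */
    const char *container;      /* name of the target container */
};

/* Return the container that should receive the event, or NULL if the
 * event should stay on the host.  Rules are checked in order and the
 * first match wins. */
const char *route_uevent(const struct uevent_policy *rules, size_t nrules,
                         const char *subsystem, const char *devpath)
{
    for (size_t i = 0; i < nrules; i++) {
        const struct uevent_policy *r = &rules[i];

        if (r->subsystem != NULL && strcmp(r->subsystem, subsystem) != 0)
            continue;
        if (strncmp(devpath, r->devpath_prefix,
                    strlen(r->devpath_prefix)) != 0)
            continue;
        return r->container;
    }
    return NULL;
}
```

A host daemon would call route_uevent() for every event it receives and
hand the event (or the resulting device node) to whichever container
comes back; NULL means the event stays on the host.  Because the first
match wins, more specific rules have to be listed first.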

I discount the last option, as in my opinion this kind of security
policy doesn't really belong in the kernel.  But at the end of the day,
I would really like to stop constantly fighting dependencies on having
udev/systemd installed in the containers, while still supporting
dynamic devices.

When I first looked at the kernel, it appeared that there is some
attempt at isolating uevents.  There is a bcast filter function which
looks like it would only forward events to listening processes (udevd)
if they were in the same network namespace.  Experimentally, however,
this does not seem to be the case.  On my setup, a Gentoo host with a
Gentoo container, if I run udevadm monitor on both host and container,
I see the kobject uevents on both systems.  What happens after that is
a bit of a mystery, as I haven't yet dug into why the container's udev
doesn't seem to process the event properly.
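For anyone wanting to reproduce this without udevadm, a minimal
listener is sketched below.  It opens a NETLINK_KOBJECT_UEVENT socket
bound to multicast group 1 (kernel uevents) and prints ACTION, DEVPATH
and SUBSYSTEM from each message, roughly what "udevadm monitor
--kernel" shows.  uevent_get() is a hypothetical helper, not a libudev
call, and the demo main() is only compiled when UEVENT_DEMO is defined.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Look up KEY in a kernel uevent message.  The message is a sequence of
 * NUL-terminated strings: an "ACTION@DEVPATH" header followed by
 * KEY=VALUE pairs.  Returns a pointer to VALUE inside buf, or NULL. */
const char *uevent_get(const char *buf, size_t len, const char *key)
{
    size_t klen = strlen(key);
    size_t off = 0;

    while (off < len) {
        const char *s = buf + off;
        const char *nul = memchr(s, '\0', len - off);
        size_t slen = nul ? (size_t)(nul - s) : len - off;

        if (slen > klen && strncmp(s, key, klen) == 0 && s[klen] == '=')
            return s + klen + 1;
        off += slen + 1; /* skip the string and its NUL separator */
    }
    return NULL;
}

#ifdef UEVENT_DEMO
int main(void)
{
    /* nl_pid 0 lets the kernel assign a port id; group 1 = kernel events */
    struct sockaddr_nl addr = { .nl_family = AF_NETLINK, .nl_groups = 1 };
    int fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);

    if (fd < 0) {
        perror("socket");
        return 1;
    }
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        close(fd);
        return 1;
    }
    char buf[8192];
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf) - 1, 0);
        if (n <= 0)
            break;
        buf[n] = '\0'; /* guarantee the last field is terminated */
        const char *action = uevent_get(buf, (size_t)n, "ACTION");
        const char *devpath = uevent_get(buf, (size_t)n, "DEVPATH");
        const char *subsys = uevent_get(buf, (size_t)n, "SUBSYSTEM");
        printf("%s %s (%s)\n", action ? action : "?",
               devpath ? devpath : "?", subsys ? subsys : "?");
    }
    close(fd);
    return 0;
}
#endif
```

Built with "cc -DUEVENT_DEMO uevent.c -o uevent", one copy can run on
the host and one inside the container while a device is plugged in; if
both print the same event, it reached both namespaces.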

Each approach seems to have some merit.  If you forwarded the events to
the container's udevd via a netlink socket, the container's udevd could,
in theory, be stock udev with no modifications necessary.  Only the
host's udev would need to be modified.
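The forwarding half of that would, at minimum, have to rebuild each
event in the layout the kernel uses on the wire: an "ACTION@DEVPATH"
header followed by NUL-separated KEY=VALUE pairs.  The helper below is
a hypothetical sketch of only that serialization step.  Delivering the
buffer into the container's network namespace (for instance from a
process that has joined it with setns(2)) is left out, and I have not
verified whether a stock udevd accepts uevents that did not originate
in the kernel.  Real kernel events also carry SEQNUM and other keys
that this sketch omits.

```c
#include <stddef.h>
#include <stdio.h>

/* Append one NUL-terminated field "<a><sep><b>" at offset len.
 * Returns the new length, or 0 if the field does not fit in cap. */
static size_t put_field(char *out, size_t cap, size_t len,
                        const char *a, const char *sep, const char *b)
{
    if (len >= cap)
        return 0;
    int n = snprintf(out + len, cap - len, "%s%s%s", a, sep, b);
    if (n < 0 || (size_t)n >= cap - len)
        return 0;
    return len + (size_t)n + 1; /* keep the NUL as the field separator */
}

/* Hypothetical helper: serialize a uevent in the kernel's layout.
 * Returns the total number of bytes written, or 0 on overflow. */
size_t uevent_build(char *out, size_t cap, const char *action,
                    const char *devpath, const char *subsystem)
{
    size_t len = put_field(out, cap, 0, action, "@", devpath);

    if (len)
        len = put_field(out, cap, len, "ACTION", "=", action);
    if (len)
        len = put_field(out, cap, len, "DEVPATH", "=", devpath);
    if (len)
        len = put_field(out, cap, len, "SUBSYSTEM", "=", subsystem);
    return len;
}
```

The returned length includes every NUL separator, so it is the byte
count a sender would pass to send(2).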

Going down the other path, you'd need to ensure that no kernel uevents
are ever forwarded to a container namespace.  This can be done with a
bit of a hack, although I'm still curious why events leak into the
containers when they are in separate namespaces.
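To pin down the difference between the two behaviours, here is a small
userspace model (not kernel code) of two filtering policies.
Namespaces are plain integer ids, and NS_NONE marks an event whose
kobject carries no namespace tag; both are modelling assumptions, as
are the function names.  host_only_filter() is the stricter policy this
path would need.

```c
#include <stdbool.h>

/* Opaque namespace identifiers for the model (assumptions, not kernel
 * values).  NS_NONE marks an event with no namespace tag. */
typedef int ns_id;
#define NS_NONE (-1)
#define NS_HOST 0

/* "Same namespace" broadcast filter: a tagged event goes only to
 * listeners in the matching namespace; an untagged event goes to all. */
bool same_ns_filter(ns_id event_ns, ns_id sock_ns)
{
    if (event_ns == NS_NONE)
        return true;
    return event_ns == sock_ns;
}

/* Stricter policy: every kernel uevent reaches host listeners only, and
 * a host-side daemon decides what, if anything, a container sees. */
bool host_only_filter(ns_id event_ns, ns_id sock_ns)
{
    (void)event_ns; /* the event's own tag does not matter here */
    return sock_ns == NS_HOST;
}
```

The untagged pass-through in same_ns_filter() is an assumption worth
checking against lib/kobject_uevent.c: if the real filter only compares
namespaces for tagged kobjects (network devices, for example) and lets
everything else through, that alone would be enough for USB events to
show up in the container.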

Before I go much further down this road, I was wondering what the
current consensus is, and figured this was the place to ask.  So,
thoughts?  Comments?

---Michael J Coss