[lxc-devel] devices and containers

Serge Hallyn serge.hallyn at ubuntu.com
Tue Nov 4 09:50:09 UTC 2014


Hi,

Over the last few weeks several people have talked to me about
various issues relating to devices in containers.  So I thought
I'd send this out as a general survey of the work that I know of
being done relating in one way or another to devices in containers.
Different people have different goals, and several people are doing
their own thing to achieve their goals.  I wanted to get started by
making everyone aware of what others are doing, in the hope that,
over the next few years, we can work toward a comprehensive
solution.

So here goes.

Some people (mwarfield, Michael Coss) want to send uevents into
specific containers, e.g. for consoles or X displays.  Michael Warfield
(AIUI) does this by moving devices into /dev/lxc/$container/.
At the containers track at plumbers a few weeks ago, Michael Coss
presented a solution developed at Lucent where uevents are sent only
to the initial netns, and a userspace daemon checks a database and
forwards uevents directly into containers so that containers can hotplug
as needed: http://www.linuxplumbersconf.org/2014/ocw/sessions/2157
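
To make the forwarding idea concrete, here is a rough sketch of what
the core of such a daemon might look like.  This is not the Lucent
implementation: the function names, the dict standing in for the
database, and the DEVPATH-prefix matching rule are all assumptions made
for illustration.  The only thing taken as given is the kernel's uevent
layout (an "action@devpath" header followed by NUL-separated KEY=VALUE
fields).  Actual delivery into a container is left to a callback.

```python
# Hypothetical sketch of a uevent-forwarding daemon's routing core.
# The "database" is a plain dict: container name -> list of DEVPATH prefixes.


def parse_uevent(raw: bytes) -> dict:
    """Parse a kernel uevent: 'action@devpath', then NUL-separated KEY=VALUE."""
    parts = [p for p in raw.split(b"\0") if p]
    if not parts:
        return {}
    action, _, devpath = parts[0].decode().partition("@")
    event = {"ACTION": action, "DEVPATH": devpath}
    for field in parts[1:]:
        key, sep, value = field.decode().partition("=")
        if sep:  # ignore anything that is not KEY=VALUE
            event[key] = value
    return event


def route_uevent(event: dict, database: dict) -> list:
    """Return the containers registered for this event's DEVPATH (assumed rule)."""
    devpath = event.get("DEVPATH", "")
    return sorted(
        name
        for name, prefixes in database.items()
        if any(devpath.startswith(prefix) for prefix in prefixes)
    )


def forward(raw: bytes, database: dict, deliver) -> int:
    """Hand the raw uevent to deliver(container, raw) for each matching container."""
    targets = route_uevent(parse_uevent(raw), database)
    for name in targets:
        deliver(name, raw)
    return len(targets)
```

In a real daemon, deliver() would have to get the message into the
container somehow (for instance over a netlink socket opened in the
container's netns); that part is deliberately left out here.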

Several people have wanted to use iscsi in containers.  AIUI containers
(at least non-userns ones) can use iscsi devices if they are moved into
the container's namespace.  However, Clint wanted to go further
and actually be able to create iscsi devices inside containers.
My memory may fail me, but I believe that to solve that we'd need
to extend the current netlink backend, which (IIRC) only accepts
connections from the initial netns.  More realistically, I'd envision
an answer to this being a userspace daemon on the host to which
containers can send requests.  OTOH it also feels similar to
the loop device namespacing issues, which would be far more elegantly
solved in the kernel (imo).  Does anyone know of existing work to this
end (either way) for iscsi?

A few people have worked at the device driver level to actually
namespace the devices themselves.  For instance, Cellrox supports
switching the active display between multiple containers.  When a
given container (say c2) is the active one, its writes go to the real
display.  When it is not the active container, its writes are buffered.
This allows the devices to be namespaced without any general "device
namespace" support in the kernel.  ISTR there was another company doing
something similar (I don't recall for which devices), but I can't
remember who/what at the moment.
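
As a toy model of that per-driver approach (not Cellrox's code), the
active/buffered behaviour could be sketched like this.  The class and
method names are invented, the "display" is just a list, and replaying
buffered writes when a container becomes active is my assumption; the
real driver may well do something different on a switch.

```python
class DisplayMux:
    """Toy model: only the active container's writes reach the display."""

    def __init__(self, active):
        self.active = active
        self.screen = []    # stands in for the real display
        self.pending = {}   # container name -> list of buffered writes

    def write(self, container, data):
        if container == self.active:
            self.screen.append(data)  # active: goes to the real display
        else:
            # inactive: buffer the write instead of touching the display
            self.pending.setdefault(container, []).append(data)

    def switch_to(self, container):
        self.active = container
        # Assumption: replay whatever the container wrote while inactive.
        self.screen.extend(self.pending.pop(container, []))
```

The point is that all of the multiplexing state lives in the one
driver, so nothing here depends on a general device namespace.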

As I alluded to earlier, Seth had previously done a bit of work on
namespacing loop so that containers could create and use their own
loop devices.  For the moment that's been shelved and he's focused on
fuse inside containers instead.  However, at the kernel summit this
year Ted Ts'o said that at least mounting ext4 inside a container
"should" work, meaning that any issues (e.g. a malicious container
admin being able to corrupt the superblock reader with trash data)
would be considered a bug (rather than "don't do that" misuse).  I
hope to follow up on that with some simple tests, and of course
loopdev in containers will become far more compelling if we can
actually mount ext4 in a container (which we can't right now).
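
For what it's worth, the kind of simple test I have in mind is roughly
the following, run as root inside the container.  The file names, the
image size and the helper names are arbitrary assumptions; the commands
themselves are the usual truncate/mkfs.ext4/mount ones.  Today I'd
expect the mount step to be the one that fails.

```python
import subprocess


def ext4_mount_test_commands(image="test.img", mountpoint="mnt", size="64M"):
    """Commands for a minimal 'can we mount ext4 here?' check (names assumed)."""
    return [
        ["truncate", "-s", size, image],          # create a sparse backing file
        ["mkfs.ext4", "-q", "-F", image],         # -F: allow a non-block-device target
        ["mkdir", "-p", mountpoint],
        ["mount", "-o", "loop", "-t", "ext4", image, mountpoint],
    ]


def run_test(commands, runner=subprocess.run):
    """Run each command in order; return the first one that fails, or None."""
    for cmd in commands:
        if runner(cmd).returncode != 0:
            return cmd
    return None
```

Calling run_test(ext4_mount_test_commands()) returns the failing
command, which makes it easy to tell a loop device problem apart from
an ext4 mount permission problem.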

There's probably more, and I apologize to anyone whose work I neglected
to mention here.  I think we're at a point where collaboration would
be useful.

thanks,
-serge

PS - I certainly got some details wrong.  I'm gonna lie and claim that
I did so on purpose to encourage responses/discussion :)

