[lxc-users] Running docker inside unprivileged LXC containers

Wed Jun 10 23:00:07 UTC 2015

Great! Thanks for the input. I'm going to ask the Docker folks whether they
plan to support containers that can request devices from the host, or from
some service, instead of simply failing to start.
Could cgmanager help in managing cgroups for Docker?

Serge Hallyn <serge.hallyn at ubuntu.com> wrote:

> Quoting Akshay Karle (akshay.a.karle at gmail.com):
> > Hello,
> >
> > I'm currently working on a project that requires running docker
> > containers inside unprivileged LXC containers. I've managed to run
> > unprivileged containers on an Ubuntu 14.04 host. I've also managed to
> > get the docker daemon running using the LXC driver instead of native
> > docker exec driver. Right now I'm stuck when trying to start a docker
> > container as it attempts to create special devices which fails as it
> > doesn't have the permissions to do so in the unprivileged container.

> You'll need to coordinate between the container and the host to create the
> devices.  This is something I do want to think about, but have not yet had
> time to do so.  It may involve updating Docker to use a service, when
> available, to request devices be created.  This could be a dbus service
> which gets (vetted and) passed through to the host.

I've thought about this a bit too, as it's the same problem I'm facing
(although in my case, there's very little software in the host or the
container, just a pretty minimal busybox plus a couple of applications, so
anything that requires dbus or systemd is useless for my purposes).

I'm attempting to start an unprivileged container and populate the devices
using an autodev hook, but that doesn't work, because the user namespace has
already been changed.  So I'm stuck with having to bind mount all the
devices individually, which would be fine, except that the device nodes
don't all exist in the host, so I'm having to create them in the host in
advance of starting the containers.
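
For the bind-mount approach, the per-device configuration lines can be
generated rather than written by hand.  A minimal Python sketch, assuming the
`lxc.mount.entry` syntax with the `bind,create=file` options; the function
name and the device list are illustrative only, and the nodes still have to
exist in the host beforehand:

```python
def bind_mount_entries(devices):
    """Return one lxc.mount.entry line per host device path.

    Assumption: each device is bind mounted at the same path inside the
    container, and create=file makes LXC create the mount target file.
    """
    lines = []
    for dev in devices:
        # The mount target is relative to the container rootfs,
        # so the leading "/" is dropped.
        target = dev.lstrip("/")
        lines.append(
            f"lxc.mount.entry = {dev} {target} none bind,create=file 0 0"
        )
    return lines


if __name__ == "__main__":
    # Hypothetical device list, for illustration only.
    print("\n".join(bind_mount_entries(["/dev/null", "/dev/net/tun"])))
```

The output would be pasted into (or included from) the container's config
file.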

Could lxc-start create the device nodes before the user namespace is
changed?  It'd have to apply the uid_map and gid_map manually, but that
might be doable.
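
Applying the maps manually mostly comes down to translating each container
uid/gid into the host id that the node has to be owned by.  A rough Python
sketch of that translation, assuming the usual three-column map format
(inside id, outside id, length); the function names are made up for
illustration and this is not how lxc-start does anything today:

```python
def parse_id_map(text):
    """Parse uid_map/gid_map style text into (inside, outside, length) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            entries.append((inside, outside, length))
    return entries


def map_id(container_id, id_map):
    """Translate a container uid or gid into the corresponding host id."""
    for inside, outside, length in id_map:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    raise ValueError(f"id {container_id} is not mapped")


if __name__ == "__main__":
    # Example map: container ids 0..65535 map to host ids 100000..165535.
    id_map = parse_id_map("0 100000 65536\n")
    print(map_id(0, id_map), map_id(1000, id_map))
```

A node created before the namespace switch would then be chowned to the
mapped host ids so that it appears with the intended owner inside the
container.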

Of course, once the container is running, you can use lxc-device to create
the devices inside the container, but that's no use if you need the devices
early on.  You can't do this in the start hook, because you need lxc-start
to release all its locks before you can run lxc-device.

I considered changing lxc-start so it cached a thread that would remain
authorised to do the mknod() calls and could be called upon as necessary,
but haven't actually tried it yet.  Perhaps that's worth looking into?

An alternative idea I had was to not run /sbin/init in the container but
instead run a shell script that communicates its readiness state to the host
and then waits for an indication that it is safe to continue, at which point
it would exec /sbin/init.  Meanwhile the process in the host would be
waiting for the ready indication and run lxc-device as required and then
send back the indication to the container to continue.  I'm confident that
this will work, as I've done this sort of thing before, and I already have a
signalling mechanism working so that I know when the container's init has
finished running all the sysinit tasks from /etc/inittab and thus the
applications are ready.  It would be quite easy to adapt, but it's one of
those "neat, but really ugly" kind of scripts - a one-liner, based around
inotifywait.  The big advantage is that this will work without any changes
to lxc at all.  The main disadvantage is that it's ugly.
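
To make the handshake concrete, here is a rough sketch in Python rather than
the inotifywait one-liner.  It polls for marker files instead of using
inotify, and it assumes that both sides can see a shared directory; the
function names, the paths and the `populate` callback (which is where
lxc-device would be run) are all hypothetical:

```python
import os
import time


def wait_for(path, timeout=30.0, poll=0.05):
    """Poll until path exists; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True


def container_side(ready_path, go_path, timeout=30.0):
    """Announce readiness, then block until the host gives the go-ahead."""
    open(ready_path, "w").close()
    return wait_for(go_path, timeout)


def host_side(ready_path, go_path, populate, timeout=30.0):
    """Wait for the container, populate its devices, then release it."""
    if not wait_for(ready_path, timeout):
        return False
    populate()  # e.g. run lxc-device for each required node
    open(go_path, "w").close()
    return True
```

Once `container_side()` returns True, the wrapper in the container would
replace itself with the real init, for example with
`os.execv("/sbin/init", ["/sbin/init"])`.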

Another alternative could be to have the process that becomes the new PID 1
in the container SIGSTOP itself just before the execve, then lxc-start and a
new "post-start" hook could collude to run the hook without lxc-start
holding any of the locks.  This sounds incredibly messy though, not to
mention failure-prone.

However, another far neater way of doing this could be to use the freezer
instead.  Just give lxc-start a new command-line option to start the
container *but* crucially, leave it frozen when lxc-start exits.  The caller
can then just do lxc-start, lxc-device, lxc-unfreeze.  This would seem to be
the least invasive way of doing it, and stands a good chance of working
reliably, I would have thought, as long as you can execute the freeze at the
right point (just before the execve of the new PID 1) and as long as
lxc-device works on a frozen container (does it?  I can't get to my dev box
right now to test it).
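
From the caller's side, the sequence would look roughly like the following
Python sketch.  The start-frozen option does not exist in lxc-start, so it
is passed in as a parameter rather than given a name here; the function name
is also made up, and whether lxc-device works on a frozen container is still
the open question:

```python
def frozen_start_plan(name, devices, frozen_flag):
    """Build the command sequence: start frozen, add devices, unfreeze.

    frozen_flag stands in for the proposed (not yet existing) lxc-start
    option that would leave the container frozen when lxc-start exits.
    """
    plan = [["lxc-start", "-n", name, "-d", frozen_flag]]
    for dev in devices:
        plan.append(["lxc-device", "-n", name, "add", dev])
    plan.append(["lxc-unfreeze", "-n", name])
    return plan


if __name__ == "__main__":
    # Placeholder flag and device, for illustration only.
    for cmd in frozen_start_plan("c1", ["/dev/fuse"], "--hypothetical-frozen"):
        print(" ".join(cmd))
```

Each command in the plan could then be run in order, for example with
`subprocess.run(cmd, check=True)`, stopping at the first failure.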

--
Stewart Brodie
Senior Software Engineer
Espial UK