[lxc-users] Running docker inside unprivileged LXC containers

Mon Jun 15 15:22:32 UTC 2015

Quoting Stewart Brodie (sbrodie at espial.com):
> Serge Hallyn <serge.hallyn at ubuntu.com> wrote:
> 
> > Quoting Akshay Karle (akshay.a.karle at gmail.com):
> > > Hello,
> > > 
> > > I'm currently working on a project that requires running docker
> > > containers inside unprivileged LXC containers. I've managed to run
> > > unprivileged containers on an Ubuntu 14.04 host. I've also managed to
> > > get the docker daemon running using the LXC driver instead of native
> > > docker exec driver. Right now I'm stuck when trying to start a docker
> > > container: it attempts to create special device nodes, which fails because
> > > it doesn't have permission to do so in the unprivileged container.
> 
> > You'll need to coordinate between the container and the host to create the
> > devices.  This is something I do want to think about, but have not yet had
> > time to do so.  It may involve updating Docker to use a service, when
> > available, to request devices be created.  This could be a dbus service
> > which gets (vetted and) passed through to the host.
> 
> I've thought about this a bit too, as it's the same problem I'm facing
> (although in my case, there's very little software in the host or the
> container, just a pretty minimal busybox plus a couple of applications, so
> anything based around requiring dbus or systemd is useless for my purposes)
> 
> I'm attempting to start an unprivileged container and populate the devices
> using an autodev hook, but that doesn't work, because the user namespace has
> already been changed.  So I'm stuck with having to bind mount all the
> devices individually, which would be great - except that the device nodes
> don't all exist in the host, so I'm having to create them in the host in
> advance of starting the containers.
>
> Could lxc-start create the device nodes before the user namespace is
> changed?  It'd have to apply the uid_map and gid_map manually, but that
> might be doable.

If you are starting this container as root then you can use an
lxc.hook.pre-start hook to create and chown the devices.
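
A rough sketch of what such a hook could do, in Python purely for
illustration. It assumes the hook runs as root on the host, that the
container's id mapping is available as "inside outside count" triples (the
format of /proc/<pid>/uid_map), and that the nodes go into a directory which
is later made visible in the container. The function names, the target
directory and the example mapping are all hypothetical, not part of lxc.

```python
import os
import stat


def parse_id_map(lines):
    """Parse 'inside outside count' triples, as found in /proc/<pid>/uid_map."""
    ranges = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        inside, outside, count = (int(f) for f in fields)
        ranges.append((inside, outside, count))
    return ranges


def to_host_id(container_id, ranges):
    """Translate an id as seen inside the container to the id on the host."""
    for inside, outside, count in ranges:
        if inside <= container_id < inside + count:
            return outside + (container_id - inside)
    raise ValueError("id %d is not covered by the map" % container_id)


def make_device(path, kind, major, minor, mode, host_uid, host_gid):
    """Create one device node and hand it to the mapped owner (needs root)."""
    type_bits = stat.S_IFCHR if kind == "c" else stat.S_IFBLK
    os.mknod(path, type_bits | mode, os.makedev(major, minor))
    os.chown(path, host_uid, host_gid)
    os.chmod(path, mode)  # mknod applies the umask, so set the mode again


def create_devices(target_dir, uid_map_lines, gid_map_lines, devices):
    """Create every (name, kind, major, minor, mode) entry under target_dir,
    owned by whatever host ids the container's root (id 0) maps to."""
    host_uid = to_host_id(0, parse_id_map(uid_map_lines))
    host_gid = to_host_id(0, parse_id_map(gid_map_lines))
    for name, kind, major, minor, mode in devices:
        make_device(os.path.join(target_dir, name), kind, major, minor,
                    mode, host_uid, host_gid)
```

With a mapping such as "0 100000 65536" (an assumed example), container root
becomes host uid 100000, so a call like
create_devices(some_dir, ["0 100000 65536"], ["0 100000 65536"],
[("null", "c", 1, 3, 0o666)]) would leave a node that root in the container
owns.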

> Of course, once the container is running, you can use lxc-device to create
> the devices inside the container, but that's no use if you need the devices
> early on.  You can't do this in the start hook, because you need lxc-start
> to release all its locks before you can run lxc-device.
> 
> I considered changing lxc-start so it cached a thread that would remain
> authorised to do the mknod() calls and could be called upon as necessary,
> but didn't actually try it yet.  Perhaps that's worth looking into?
> 
> An alternative idea I had was to not run /sbin/init in the container but
> instead run a shell script that communicates its readiness state to the host
> and then waits for an indication that it is safe to continue, at which point
> it would exec /sbin/init.  Meanwhile the process in the host would be
> waiting for the ready indication and run lxc-device as required and then
> send back the indication to the container to continue.  I'm confident that
> this will work, as I've done this sort of thing before, and I already have a
> signalling mechanism working so that I know when the container's init has
> finished running all the sysinit tasks from /etc/inittab and thus the
> applications are ready.  It would be quite easy to adapt, but it's one of
> those "neat, but really ugly" kind of scripts - a one-liner, based around
> inotifywait.  The big advantage is that this will work without any changes
> to lxc at all.  The main disadvantage is that it's ugly.
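
For reference, the handshake described above could look roughly like the
following. This is only a sketch under assumptions: it polls for marker files
in a directory shared between host and container instead of using
inotifywait, and it is written in Python for illustration, whereas a minimal
busybox container would need a shell equivalent. The function names and the
marker file paths are hypothetical.

```python
import os
import time


def signal_ready(ready_path):
    """Tell the host side that the container has reached the wait point."""
    with open(ready_path, "w") as marker:
        marker.write("ready\n")


def wait_for_go(go_path, timeout=30.0, interval=0.1):
    """Poll until the host creates go_path; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(go_path):
            return True
        time.sleep(interval)
    return os.path.exists(go_path)


def run_as_init_wrapper(ready_path, go_path, init_argv=("/sbin/init",)):
    """Run in place of /sbin/init: announce readiness, wait, then exec init."""
    signal_ready(ready_path)
    if not wait_for_go(go_path):
        raise TimeoutError("host never signalled that devices are ready")
    # exec replaces this process, so the real init keeps PID 1
    os.execv(init_argv[0], list(init_argv))
```

The host side would wait for the ready marker, run lxc-device as required,
and then create the go marker.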
> 
> Another alternative could be to have the process that is the new PID 1 in
> the container SIGSTOP itself just before the execve, then lxc-start and a
> new "post-start" hook could collude to run the hook without lxc-start
> holding any of the locks.  This sounds incredibly messy though, not to
> mention failure-prone.
> 
> However, another far neater way of doing this could be to use the freezer
> instead.  Just give lxc-start a new command-line option to start the
> container *but* crucially, leave it frozen when lxc-start exits.  The caller
> can then just do lxc-start, lxc-device, lxc-unfreeze.  This would seem to be
> the least invasive way of doing it, and stands a good chance of working
> reliably, I would have thought, as long as you can execute the freeze at the
> right point (just before the execve of the new PID 1) and as long as
> lxc-device works on a frozen container (does it?  I can't get to my dev box
> right now to test it)
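
The caller's side of that sequence, after the container has been left frozen,
might be sketched as below. The start-frozen option itself does not exist
yet, so the start step is deliberately left out; whether lxc-device works on
a frozen container is the open question raised above, and this sketch simply
assumes it does. The function names are hypothetical; lxc-device and
lxc-unfreeze are the existing tools.

```python
import subprocess


def post_start_commands(name, devices):
    """Build the commands to run once the container exists but is frozen:
    one lxc-device call per device, then lxc-unfreeze."""
    commands = [["lxc-device", "-n", name, "add", dev] for dev in devices]
    commands.append(["lxc-unfreeze", "-n", name])
    return commands


def populate_and_unfreeze(name, devices):
    """Run the commands in order, stopping at the first failure so the
    container is not thawed with devices missing."""
    for command in post_start_commands(name, devices):
        subprocess.run(command, check=True)
```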
> 
> 
> -- 
> Stewart Brodie
> Senior Software Engineer
> Espial UK