<p dir="ltr">Great! Thanks for the input. I'm going to write to the Docker folks to ask whether they plan to support running containers that can request devices from the host or from some service, instead of just failing to run the container. Could cgmanager help in managing cgroups for Docker?</p>

<div class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Serge Hallyn &lt;<a href="mailto:serge.hallyn@ubuntu.com">serge.hallyn@ubuntu.com</a>&gt; wrote:<br>

<br>

> Quoting Akshay Karle (<a href="mailto:akshay.a.karle@gmail.com">akshay.a.karle@gmail.com</a>):<br>

> > Hello,<br>

> ><br>

> > I'm currently working on a project that requires running Docker<br>

> > containers inside unprivileged LXC containers. I've managed to run<br>

> > unprivileged containers on an Ubuntu 14.04 host. I've also managed to<br>

> > get the docker daemon running using the LXC driver instead of native<br>

> > docker exec driver. Right now I'm stuck when trying to start a docker<br>

> > container as it attempts to create special devices which fails as it<br>

> > doesn't have the permissions to do so in the unprivileged container.<br>

<br>

> You'll need to coordinate between the container and the host to create the<br>

> devices.  This is something I do want to think about, but have not yet had<br>

> time to do so.  It may involve updating Docker to use a service, when<br>

> available, to request devices be created.  This could be a dbus service<br>

> which gets (vetted and) passed through to the host.<br>

<br>

I've thought about this a bit too, as it's the same problem I'm facing<br>

(although in my case, there's very little software in the host or the<br>

container, just a pretty minimal busybox plus a couple of applications, so<br>

anything based around requiring dbus or systemd is useless for my purposes).<br>

<br>

I'm attempting to start an unprivileged container and populate the devices<br>

using an autodev hook, but that doesn't work, because the user namespace has<br>

already been changed.  So I'm stuck with having to bind mount all the<br>

devices individually, which would be great - except that the device nodes<br>

don't all exist in the host, so I'm having to create them in the host in<br>

advance of starting the containers.<br>
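<br>
A minimal sketch of the bind-mount workaround described above: it only generates the configuration lines, one per device. It assumes LXC's fstab-style lxc.mount.entry key and its create=file mount option; the helper name bind_mount_entries is hypothetical. It does not create missing device nodes in the host, which is the part that still has to be done in advance.<br>

```python
from typing import List


def bind_mount_entries(devices: List[str]) -> List[str]:
    """Return one lxc.mount.entry line per host device path.

    Assumption: the container config accepts fstab-style entries and the
    create=file option, so the mount target is created if it is missing.
    """
    entries = []
    for dev in devices:
        # The mount target is given relative to the container's rootfs.
        target = dev.lstrip("/")
        entries.append(
            f"lxc.mount.entry = {dev} {target} none bind,create=file 0 0"
        )
    return entries
```

For example, bind_mount_entries(["/dev/fuse"]) yields a single line that bind mounts the host's /dev/fuse onto dev/fuse in the container.<br>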

<br>

Could lxc-start create the device nodes before the user namespace is<br>

changed?  It'd have to apply the uid_map and gid_map manually, but that<br>

might be doable.<br>
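<br>
To illustrate what "applying the uid_map and gid_map manually" could amount to: a node created before the namespace switch would need its owner set to the host ID that the container's ID maps to. The sketch below translates a container ID to a host ID using entries in the /proc/PID/uid_map format (first ID inside the namespace, first ID outside, range length). The function names are hypothetical, and this is not how lxc-start is implemented.<br>

```python
from typing import List, Optional, Tuple

# One entry per line of /proc/PID/uid_map or gid_map:
# (first id inside the namespace, first id outside, length of the range).
IdMap = List[Tuple[int, int, int]]


def parse_id_map(text: str) -> IdMap:
    """Parse uid_map/gid_map text into (inside, outside, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            entries.append((int(fields[0]), int(fields[1]), int(fields[2])))
    return entries


def to_host_id(container_id: int, id_map: IdMap) -> Optional[int]:
    """Translate an ID as seen in the container to the host ID.

    Returns None if the ID is not covered by any mapped range.
    """
    for inside, outside, length in id_map:
        if container_id in range(inside, inside + length):
            return outside + (container_id - inside)
    return None
```

With a map of "0 100000 65536", the container's root (ID 0) translates to host ID 100000, so a node meant to be owned by root in the container would be chowned to 100000 in the host.<br>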

<br>

Of course, once the container is running, you can use lxc-device to create<br>

the devices inside the container, but that's no use if you need the devices<br>

early on.  You can't do this in the start hook, because you need lxc-start<br>

to release all its locks before you can run lxc-device.<br>

<br>

I considered changing lxc-start so it cached a thread that would remain<br>

authorised to do the mknod() calls and could be called upon as necessary,<br>

but haven't actually tried it yet.  Perhaps that's worth looking into?<br>

<br>

An alternative idea I had was to not run /sbin/init in the container but<br>

instead run a shell script that communicates its readiness state to the host<br>

and then waits for an indication that it is safe to continue, at which point<br>

it would exec /sbin/init.  Meanwhile the process in the host would be<br>

waiting for the ready indication and run lxc-device as required and then<br>

send back the indication to the container to continue.  I'm confident that<br>

this will work, as I've done this sort of thing before, and I already have a<br>

signalling mechanism working so that I know when the container's init has<br>

finished running all the sysinit tasks from /etc/inittab and thus the<br>

applications are ready.  It would be quite easy to adapt, but it's one of<br>

those "neat, but really ugly" kind of scripts - a one-liner, based around<br>

inotifywait.  The big advantage is that this will work without any changes<br>

to lxc at all.  The main disadvantage is that it's ugly.<br>
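<br>
The handshake described above can be sketched as follows. This is not the inotifywait one-liner mentioned; it swaps in plain polling on two marker files, and it assumes a directory visible to both the host and the container (for example a bind mount). The names wait_for, container_side, host_side, "ready" and "go" are all hypothetical.<br>

```python
import os
import time
from typing import Callable


def wait_for(path: str, timeout: float = 30.0, interval: float = 0.05) -> bool:
    """Poll until path exists; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


def container_side(shared_dir: str, start_init: Callable[[], None],
                   timeout: float = 30.0) -> bool:
    """Runs in place of /sbin/init: signal readiness, wait, then continue."""
    open(os.path.join(shared_dir, "ready"), "w").close()
    if not wait_for(os.path.join(shared_dir, "go"), timeout):
        return False
    start_init()  # in real use this would exec /sbin/init and not return
    return True


def host_side(shared_dir: str, add_devices: Callable[[], None],
              timeout: float = 30.0) -> bool:
    """Runs in the host: wait for readiness, add devices, release container."""
    if not wait_for(os.path.join(shared_dir, "ready"), timeout):
        return False
    add_devices()  # in real use this would invoke lxc-device as required
    open(os.path.join(shared_dir, "go"), "w").close()
    return True
```

In the container, start_init could be something like lambda: os.execv("/sbin/init", ["/sbin/init"]), so that init still ends up as PID 1. The callbacks are passed in here only so that the ordering can be exercised without a container.<br>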

<br>

Another alternative could be to have the process that is the new PID 1 in<br>

the container SIGSTOP itself just before the execve, then lxc-start and a<br>

new "post-start" hook could collude to run the hook without lxc-start<br>

holding any of the locks.  This sounds incredibly messy though, not to<br>

mention failure-prone.<br>

<br>

However, another far neater way of doing this could be to use the freezer<br>

instead.  Just give lxc-start a new command-line option to start the<br>

container *but* crucially, leave it frozen when lxc-start exits.  The caller<br>

can then just do lxc-start, lxc-device, lxc-unfreeze.  This would seem to be<br>

the least invasive way of doing it, and stands a good chance of working<br>

reliably, I would have thought, as long as you can execute the freeze at the<br>

right point (just before the execve of the new PID 1) and as long as<br>

lxc-device works on a frozen container (does it?  I can't get to my dev box<br>

right now to test it).<br>
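<br>
A sketch of the caller's side of the freezer idea, assuming lxc-start grew such an option. The flag below is a placeholder and does not exist in lxc-start; the lxc-device and lxc-unfreeze invocations use their usual "-n NAME" form. The function only builds the command list and runs nothing, and whether lxc-device works on a frozen container remains the open question raised above.<br>

```python
from typing import List

# Placeholder for the option proposed above. It is NOT an existing
# lxc-start flag; it stands in for whatever such an option might be called.
HYPOTHETICAL_FROZEN_FLAG = "--hypothetical-start-frozen"


def frozen_start_plan(name: str, devices: List[str]) -> List[List[str]]:
    """Return the commands for: start frozen, add devices, unfreeze."""
    plan = [["lxc-start", "-n", name, "-d", HYPOTHETICAL_FROZEN_FLAG]]
    for dev in devices:
        plan.append(["lxc-device", "-n", name, "add", dev])
    plan.append(["lxc-unfreeze", "-n", name])
    return plan
```

A caller could pass each entry to subprocess.run(cmd, check=True) in order, so that a failed lxc-device stops the sequence before the container is unfrozen.<br>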

<br>

<br>

--<br>

Stewart Brodie<br>

Senior Software Engineer<br>

Espial UK<br>

</div>