[lxc-users] Debian and unprivileged LXC not working...

Wed Dec 13 14:12:41 UTC 2017

On Wed, Dec 13, 2017 at 01:35:01PM +0100, Christian Brauner wrote:
> On Wed, Dec 13, 2017 at 12:54:04PM +0100, Christian Brauner wrote:
> > On Tue, Dec 12, 2017 at 11:00:01PM -0600, Serge Hallyn wrote:
> > > On Tue, Dec 05, 2017 at 05:20:32PM +0100, Dirk Geschke wrote:
> > > > Hi Serge,
> > > > 
> > > > > > I am a little bit clueless, I have several systems running with
> > > > > > Debian and unprivileged LXC. But newer systems won't start new
> > > > > > containers.
> > > > > > 
> > > > > > Actually I have a Debian stretch, installed the normal way but
> > > > > > with lxc-2.0.9 and cgmanager-0.41 installed from sources.
> > > > > > 
> > > > > > I can setup cgmanager, can do a cgm movepid and it is no problem
> > > > > > to download a template. But starting the container does not work,
> > > > > > it simply hangs at:
> > > > > > 
> > > > > >    $ lxc-start -n lxc-test -l trace -o wheezy -F
> > > > > 
> > > > > I see no bad errors in the log.  When this hangs, can you
> > > > > from another terminal see whether 'lxc-ls -f' shows it
> > > > > running, and what 'lxc-attach -n lxc-test' does?
> > > > 
> > > > that's the funny part: Nothing. There is not one process from 
> > > > the subuid range running. It simply hangs before it tries to 
> > > > start the container at all. And I have no idea, why.
> > > > But with lxc-2.0.8 it works. 
> > > > 
> > > > I just installed and started debian wheezy, upgraded it to jessie
> > > > and finally to stretch. It works fine.
> > > > 
> > > > I now installed lxc-2.0.9 again, tried to start the container again
> > > > and nothing happens:
> > > > 
> > > >    $ lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> > > > 
> > > > That's all. lxc-ls -f and lxc-attach -n lxc-test hang, too.
> > > > 
> > > > I see also three processes of lxc-start:
> > > > 
> > > >    $ ps waux |grep lxc-start
> > > >    lxc-test 24478  0.0  0.1  51740  4232 pts/0    S+   17:16   0:00 lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> > > >    lxc-test 24487  0.0  0.0  51740   504 pts/0    S+   17:16   0:00 lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> > > >    lxc-test 24492  0.0  0.0  51740   508 pts/0    S+   17:16   0:00 lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> > > 
> > > When you gdb-attach to these (which you have to do as root for two
> > > of them) you find that you're hung on:
> > > 
> > > (gdb) where
> > > #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> > > #1  0x00007f2a94f68b95 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f2a95868a60 <cgm_mutex>)
> > >     at ../nptl/pthread_mutex_lock.c:80
> > > #2  0x00007f2a95638c4d in lock_mutex (l=0x7f2a95868a60 <cgm_mutex>) at cgroups/cgmanager.c:80
> > > #3  cgm_lock () at cgroups/cgmanager.c:98
> > > #4  0x00007f2a94a722f5 in __libc_fork () at ../sysdeps/nptl/fork.c:96
> > > #5  0x00007f2a95604ee7 in run_command (buf=buf@entry=0x7fff3b5a20e0 "",
> > >     buf_size=buf_size@entry=4096,
> > >     child_fn=child_fn@entry=0x7f2a95606a30 <lxc_map_ids_exec_wrapper>,
> > >     args=args@entry=0x7fff3b5a40e0) at utils.c:2262
> > > #6  0x00007f2a9560b01e in lxc_map_ids (idmap=idmap@entry=0x55b48a62c1c0, pid=pid@entry=15389)
> > >     at conf.c:2652
> > > #7  0x00007f2a9560f1e5 in userns_exec_1 (conf=conf@entry=0x55b48a625b90,
> > >     fn=fn@entry=0x7f2a95639a20 <chown_cgroup_wrapper>, data=data@entry=0x7fff3b5a5210,
> > >     fn_name=fn_name@entry=0x7f2a9563fadc "chown_cgroup_wrapper") at conf.c:3822
> > > #8  0x00007f2a9563a1a9 in chown_cgroup (conf=0x55b48a625b90, cgroup_path=<optimized out>)
> > >     at cgroups/cgmanager.c:500
> > > #9  cgm_chown (hdata=0x55b48a62b1d0, conf=0x55b48a625b90) at cgroups/cgmanager.c:1555
> > > #10 0x00007f2a955fa397 in lxc_spawn (handler=0x55b48a624e50) at start.c:1363
> > > 
> > > and
> > > 
> > > #0  0x00007f2a94f6f1f0 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:84
> > > #1  0x00007f2a95607872 in run_userns_fn (data=0x7fff3b5a51b0) at conf.c:3570
> > > #2  0x00007f2a94aa2aff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
> > > 
> > > So it seems to be hanging on cgm_lock().
> > > 
> > > I'm a bit too tired to think it through clearly enough, but I'm thinking
> > > this might have to do with the introduction of run_command().  It introduces
> > > an extra fork() between the clone(CLONE_NEWUSER)'d thread and the task which
> > > actually does the work.  Perhaps that is messing with our lock state?
> > 
> > Right, so as I said, this could be related to pthread_atfork() handlers.
> > (I suspect that cgmanager is multi-threaded or calls into libnih or dbus,
> > which is. Serge?)
> > Older liblxc versions used system() instead of run_command(). For
> > system(), POSIX leaves it unspecified whether pthread_atfork() handlers
> > are called, but glibc's implementation of system() guarantees that they
> > are not. There's no requirement, though, so this might be why we have
> > been fine - by chance - all this time. The obvious solution is to switch
> > back to system() instead of run_command(), but let me think about this
> > for a second.
> 
> Right, so I think that this is indeed pthread_atfork() and cgmanager:
> 1. cgm_chown() calls cgm_dbus_connect()
> 2. cgm_dbus_connect() calls cgm_lock():
>    Now the thread holds an explicit lock on the mutex
> 3. cgm_chown() calls chown_cgroup()
> 4. chown_cgroup() calls userns_exec_1()
> 5. userns_exec_1() forks with an explicit lock on the mutex being held
>    in another thread
> 6. pthread_atfork() handlers get run including the prepare() handler:
> 
> 	#ifdef HAVE_PTHREAD_ATFORK
> 	__attribute__((constructor))
> 	static void process_lock_setup_atfork(void)
> 	{
> 		pthread_atfork(cgm_lock, cgm_unlock, cgm_unlock);
> 	}
> 	#endif
> 
>    thus trying to acquire the mutex that is being explicitly held in the
>    parent. If we were using recursive locks then the parent would now
>    hold two locks but since I don't see us using them I guess we're
>    simply getting undefined behavior.
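
To make step 6 concrete, here is a minimal standalone sketch of the same
pattern. It is not liblxc code: demo_mutex, demo_prepare() and
demo_atfork_relock() are hypothetical names, and I am assuming an
error-checking mutex (PTHREAD_MUTEX_ERRORCHECK) so that the relock in the
prepare handler reports EDEADLK instead of hanging the way the default
mutex does in the backtrace above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for cgm_mutex; not the liblxc object. */
static pthread_mutex_t demo_mutex;

/* Return value of the lock attempt made inside the prepare handler. */
static int prepare_result;

/* Plays the role of cgm_lock() registered as the atfork prepare handler. */
static void demo_prepare(void)
{
	/*
	 * The forking thread already owns demo_mutex. An error-checking
	 * mutex returns EDEADLK here; a default mutex would block or
	 * behave in an undefined way.
	 */
	prepare_result = pthread_mutex_lock(&demo_mutex);
}

/*
 * Call once. Returns what the prepare handler saw when fork() was
 * called with the mutex already held by the calling thread.
 */
int demo_atfork_relock(void)
{
	pthread_mutexattr_t attr;
	pid_t pid;
	int rc;

	rc = pthread_mutexattr_init(&attr);
	assert(rc == 0);
	rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(rc == 0);
	rc = pthread_mutex_init(&demo_mutex, &attr);
	assert(rc == 0);
	pthread_mutexattr_destroy(&attr);

	/* Only a prepare handler is needed for the demonstration. */
	rc = pthread_atfork(demo_prepare, NULL, NULL);
	assert(rc == 0);

	prepare_result = 0;
	pthread_mutex_lock(&demo_mutex);  /* like cgm_dbus_connect() -> cgm_lock() */
	pid = fork();                     /* glibc runs demo_prepare() first */
	if (pid == 0)
		_exit(0);
	if (pid > 0)
		waitpid(pid, NULL, 0);
	pthread_mutex_unlock(&demo_mutex);

	return prepare_result;
}
```

Calling demo_atfork_relock() from a small main() should hand back EDEADLK.
With a default (non-error-checking) mutex I would expect the same fork() to
block in the prepare handler instead, which matches the hang seen here.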
> 
> So one other - and probably cleaner/safer - solution is to see if we can
> temporarily disable the prepare handler for pthread_atfork() and reenable
> it right after the call. Though I'm not sure about the implications atm.

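That's actually not possible: POSIX provides no way to unregister or
disable pthread_atfork() handlers once they are installed. The closest we
can get is to make the syscall directly instead of using glibc's fork(),
which runs the pthread_atfork() handlers.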
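
For illustration, a rough sketch of what "doing the syscall directly" could
look like on Linux. raw_fork(), count_prepare() and demo_raw_fork() are
hypothetical helpers, not existing liblxc functions, and the argument order
passed to SYS_clone is the x86_64 one (it differs on some other
architectures), so treat this as an assumption-laden example rather than a
patch.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Counts how often the atfork prepare handler was invoked. */
static int handler_calls;

static void count_prepare(void)
{
	handler_calls++;
}

/*
 * Hypothetical helper: fork()-like clone without glibc's fork() wrapper.
 * flags = SIGCHLD and a NULL stack give plain fork semantics; the raw
 * syscall does not run any pthread_atfork() handlers.
 * Assumes the x86_64 argument order (flags, stack, ptid, ctid, tls).
 */
static pid_t raw_fork(void)
{
	return (pid_t)syscall(SYS_clone, (unsigned long)SIGCHLD,
			      0UL, 0UL, 0UL, 0UL);
}

/*
 * Registers a prepare handler, forks via the raw syscall and returns the
 * number of times the handler ran, or -1 on error.
 */
int demo_raw_fork(void)
{
	pid_t pid;
	int status;

	handler_calls = 0;
	if (pthread_atfork(count_prepare, NULL, NULL) != 0)
		return -1;

	pid = raw_fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0);	/* child: do nothing that relies on libc state */

	if (waitpid(pid, &status, 0) != pid)
		return -1;

	return handler_calls;
}
```

demo_raw_fork() should return 0, i.e. the prepare handler never ran. The
flip side is that glibc gets no chance to fix up its own state in the
child, so the child of such a raw clone should stick to very simple work
(here it only calls _exit()).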