[lxc-users] Debian and unprivileged LXC not working...

Wed Dec 13 05:00:01 UTC 2017

On Tue, Dec 05, 2017 at 05:20:32PM +0100, Dirk Geschke wrote:
> Hi Serge,
> 
> > > I am a little bit clueless, I have several systems running with
> > > Debian and unprivileged LXC. But newer systems won't start new
> > > containers.
> > > 
> > > Actually I have a Debian stretch, installed the normal way but
> > > with lxc-2.0.9 and cgmanager-0.41 installed from sources.
> > > 
> > > I can set up cgmanager, can do a cgm movepid and it is no problem
> > > to download a template. But starting the container does not work,
> > > it simply hangs at:
> > > 
> > >    $ lxc-start -n lxc-test -l trace -o wheezy -F
> > 
> > I see no bad errors in the log.  When this hangs, can you
> > from another terminal see whether 'lxc-ls -f' shows it
> > running, and what 'lxc-attach -n lxc-test' does?
> 
> that's the funny part: Nothing. There is not one process from 
> the subuid range running. It simply hangs before it tries to 
> start the container at all. And I have no idea, why.
> But with lxc-2.0.8 it works. 
> 
> I just installed and started debian wheezy, upgraded it to jessie
> and finally to stretch. It works fine.
> 
> I now installed lxc-2.0.9 again, tried to start the container again
> and nothing happens:
> 
>    $ lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> 
> That's all. lxc-ls -f and lxc-attach -n lxc-test hang, too.
> 
> I see also three processes of lxc-start:
> 
>    $ ps waux |grep lxc-start
>    lxc-test 24478  0.0  0.1  51740  4232 pts/0    S+   17:16   0:00 lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
>    lxc-test 24487  0.0  0.0  51740   504 pts/0    S+   17:16   0:00 lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
>    lxc-test 24492  0.0  0.0  51740   508 pts/0    S+   17:16   0:00 lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F

When you gdb-attach to these (which you have to do as root for two
of them) you find that you're hung on:

(gdb) where
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f2a94f68b95 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f2a95868a60 <cgm_mutex>)
    at ../nptl/pthread_mutex_lock.c:80
#2  0x00007f2a95638c4d in lock_mutex (l=0x7f2a95868a60 <cgm_mutex>) at cgroups/cgmanager.c:80
#3  cgm_lock () at cgroups/cgmanager.c:98
#4  0x00007f2a94a722f5 in __libc_fork () at ../sysdeps/nptl/fork.c:96
#5  0x00007f2a95604ee7 in run_command (buf=buf@entry=0x7fff3b5a20e0 "",
    buf_size=buf_size@entry=4096,
    child_fn=child_fn@entry=0x7f2a95606a30 <lxc_map_ids_exec_wrapper>,
    args=args@entry=0x7fff3b5a40e0) at utils.c:2262
#6  0x00007f2a9560b01e in lxc_map_ids (idmap=idmap@entry=0x55b48a62c1c0, pid=pid@entry=15389)
    at conf.c:2652
#7  0x00007f2a9560f1e5 in userns_exec_1 (conf=conf@entry=0x55b48a625b90,
    fn=fn@entry=0x7f2a95639a20 <chown_cgroup_wrapper>, data=data@entry=0x7fff3b5a5210,
    fn_name=fn_name@entry=0x7f2a9563fadc "chown_cgroup_wrapper") at conf.c:3822
#8  0x00007f2a9563a1a9 in chown_cgroup (conf=0x55b48a625b90, cgroup_path=<optimized out>)
    at cgroups/cgmanager.c:500
#9  cgm_chown (hdata=0x55b48a62b1d0, conf=0x55b48a625b90) at cgroups/cgmanager.c:1555
#10 0x00007f2a955fa397 in lxc_spawn (handler=0x55b48a624e50) at start.c:1363

and

#0  0x00007f2a94f6f1f0 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f2a95607872 in run_userns_fn (data=0x7fff3b5a51b0) at conf.c:3570
#2  0x00007f2a94aa2aff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

So it seems to be hanging on cgm_lock().

I'm a bit too tired to think it through clearly enough, but I'm thinking
this might have to do with the introduction of run_command().  It introduces
an extra fork() between the clone(CLONE_NEWUSER)'d thread and the task which
actually does the work.  Perhaps that is messing with our lock state?

-serge