[lxc-users] Debian and unprivileged LXC not working...
Christian Brauner
christian.brauner at mailbox.org
Wed Dec 13 11:54:04 UTC 2017
On Tue, Dec 12, 2017 at 11:00:01PM -0600, Serge Hallyn wrote:
> On Tue, Dec 05, 2017 at 05:20:32PM +0100, Dirk Geschke wrote:
> > Hi Serge,
> >
> > > > I am a little bit clueless, I have several systems running with
> > > > Debian and unprivileged LXC. But newer systems won't start new
> > > > containers.
> > > >
> > > > Actually I have a Debian stretch, installed the normal way but
> > > > with lxc-2.0.9 and cgmanager-0.41 installed from sources.
> > > >
> > > > I can setup cgmanager, can do a cgm movepid and it is no problem
> > > > to download a template. But starting the container does not work,
> > > > it simply hangs at:
> > > >
> > > > $ lxc-start -n lxc-test -l trace -o wheezy -F
> > >
> > > I see no bad errors in the log. When this hangs, can you
> > > from another terminal see whether 'lxc-ls -f' shows it
> > > running, and what 'lxc-attach -n lxc-test' does?
> >
> > that's the funny part: Nothing. There is not one process from
> > the subuid range running. It simply hangs before it tries to
> > start the container at all. And I have no idea, why.
> > But with lxc-2.0.8 it works.
> >
> > I just installed and started debian wheezy, upgraded it to jessie
> > and finally to stretch. It works fine.
> >
> > I now installed lxc-2.0.9 again, tried to start the container again
> > and nothing happens:
> >
> > $ lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> >
> > That's all. lxc-ls -f and lxc-attach -n lxc-test hang, too.
> >
> > I see also three processes of lxc-start:
> >
> > $ ps waux |grep lxc-start
> > lxc-test 24478 0.0 0.1 51740 4232 pts/0 S+ 17:16 0:00
> > lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> > lxc-test 24487 0.0 0.0 51740 504 pts/0 S+ 17:16 0:00
> > lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
> > lxc-test 24492 0.0 0.0 51740 508 pts/0 S+ 17:16 0:00
> > lxc-start -n lxc-test -l trace -o stretch-lxc-2.0.9 -F
>
> When you gdb-attach to these (which you have to do as root for two
> of them) you find that you're hung on:
>
> (gdb) where
> #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1 0x00007f2a94f68b95 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f2a95868a60 <cgm_mutex>)
> at ../nptl/pthread_mutex_lock.c:80
> #2 0x00007f2a95638c4d in lock_mutex (l=0x7f2a95868a60 <cgm_mutex>) at cgroups/cgmanager.c:80
> #3 cgm_lock () at cgroups/cgmanager.c:98
> #4 0x00007f2a94a722f5 in __libc_fork () at ../sysdeps/nptl/fork.c:96
> #5 0x00007f2a95604ee7 in run_command (buf=buf@entry=0x7fff3b5a20e0 "",
> buf_size=buf_size@entry=4096,
> child_fn=child_fn@entry=0x7f2a95606a30 <lxc_map_ids_exec_wrapper>,
> args=args@entry=0x7fff3b5a40e0) at utils.c:2262
> #6 0x00007f2a9560b01e in lxc_map_ids (idmap=idmap@entry=0x55b48a62c1c0, pid=pid@entry=15389)
> at conf.c:2652
> #7 0x00007f2a9560f1e5 in userns_exec_1 (conf=conf@entry=0x55b48a625b90,
> fn=fn@entry=0x7f2a95639a20 <chown_cgroup_wrapper>, data=data@entry=0x7fff3b5a5210,
> fn_name=fn_name@entry=0x7f2a9563fadc "chown_cgroup_wrapper") at conf.c:3822
> #8 0x00007f2a9563a1a9 in chown_cgroup (conf=0x55b48a625b90, cgroup_path=<optimized out>)
> at cgroups/cgmanager.c:500
> #9 cgm_chown (hdata=0x55b48a62b1d0, conf=0x55b48a625b90) at cgroups/cgmanager.c:1555
> #10 0x00007f2a955fa397 in lxc_spawn (handler=0x55b48a624e50) at start.c:1363
>
> and
>
> #0 0x00007f2a94f6f1f0 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:84
> #1 0x00007f2a95607872 in run_userns_fn (data=0x7fff3b5a51b0) at conf.c:3570
> #2 0x00007f2a94aa2aff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
>
> So it seems to be hanging on cgm_lock().
>
> I'm a bit too tired to think it through clearly enough, but I'm thinking
> this might have to do with the introduction of run_command(). It introduces
> an extra fork() between the clone(CLONE_NEWUSER)'d thread and the task which
> actually does the work. Perhaps that is messing with our lock state?
Right, so as I said, this could be related to pthread_atfork() handlers.
(I suspect that cgmanager is multi-threaded, or calls into libnih or dbus,
which are. Serge?)
Older liblxc versions used system() instead of run_command(). For
system(), POSIX leaves it unspecified whether pthread_atfork() handlers
are called, but glibc's implementation of system() guarantees that they
are not. There is no requirement for that, though, so this might be why
we have been fine, by chance, all this time. run_command() calls fork()
directly, so the handlers do run, which matches cgm_lock() showing up
under __libc_fork() in the backtrace above. The obvious solution is to
switch back to system() instead of run_command(), but let me think about
this for a second.
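
To make the suspected mechanism concrete, here is a minimal sketch. It is
not liblxc or cgmanager code: every demo_* name is hypothetical, and
demo_mutex only stands in for a library-wide lock like cgm_mutex. It
assumes a prepare handler registered with pthread_atfork() that takes that
lock, which is consistent with cgm_lock() being called from __libc_fork()
in the backtrace. The handler uses pthread_mutex_trylock() so that the demo
reports the would-be deadlock instead of hanging the way lxc-start does.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a library-wide lock such as cgm_mutex. */
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static int demo_prepare_calls; /* how often the prepare handler ran */
static int demo_would_block;   /* how often it found the lock already held */
static int demo_took_lock;     /* did the last prepare call take the lock? */

/* Prepare handler: runs in the forking thread before fork() proceeds.
 * A real handler would call pthread_mutex_lock() here and hang forever
 * if the same thread already holds the lock. */
static void demo_prepare(void)
{
	int ret;

	demo_prepare_calls++;
	ret = pthread_mutex_trylock(&demo_mutex);
	demo_took_lock = (ret == 0);
	if (ret == EBUSY)
		demo_would_block++;
}

/* Parent and child handler: drop the lock if prepare took it. */
static void demo_release(void)
{
	if (demo_took_lock) {
		demo_took_lock = 0;
		pthread_mutex_unlock(&demo_mutex);
	}
}

/* Register the handlers. Call this exactly once per process. */
int demo_install_handlers(void)
{
	return pthread_atfork(demo_prepare, demo_release, demo_release);
}

/* fork() a child that exits immediately, then wait for it. */
static int demo_fork_and_reap(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0);
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

/* Returns how many times the prepare handler would have blocked during
 * one fork(), optionally with the caller already holding the lock. */
int demo_would_block_on_fork(int hold_lock)
{
	int ret;

	demo_would_block = 0;
	if (hold_lock)
		pthread_mutex_lock(&demo_mutex);
	ret = demo_fork_and_reap();
	if (hold_lock)
		pthread_mutex_unlock(&demo_mutex);
	return ret < 0 ? -1 : demo_would_block;
}

/* Returns how many times the prepare handler ran during one system()
 * call. POSIX leaves this unspecified, so treat the result as a property
 * of the libc in use, not as portable behaviour. */
int demo_prepare_calls_during_system(void)
{
	int before = demo_prepare_calls;

	if (system("exit 0") != 0)
		return -1;
	return demo_prepare_calls - before;
}
```

After a single demo_install_handlers() call, demo_would_block_on_fork(1)
reports one would-be deadlock and demo_would_block_on_fork(0) reports none.
demo_prepare_calls_during_system() can be used to check what a given libc
does for system().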