[lxc-users] Debian and unprivileged LXC not working...

Thu Dec 14 01:53:03 UTC 2017

> You can see whether lxc-2.1.1 fixes it for you, or

It won't.

> you can run with cgfsng instead of cgmanager, as your
> problem is just with the cgm_lock.
> 
> > Can I help in any way?

I've appended a patch to this mail which I think solves your problem. If
you could apply it and test, that would be amazing since I don't have
cgmanager here.

>
> If you were feeling bored and/or industrious, you could
> grab the lxc git tree and git bisect to the commit that
> breaks it :)  I'm 99% sure it'll point to the commit that
> introduces run_command(), but actually it's possible that
> I am actually wrong about that, so confirmation would be
> useful.

Yes, run_command() is the cause. The underlying problem is the
pthread_atfork() handlers.

> 
> Or instead of a bisect, you could just revert ea3a694fe
> in the 2.0.9 tree and see if that fixes it.  Though it
> may not revert cleanly.
> 
> But, you've been enormously helpful in finding this.  While
> it currently only affects a configuration which isn't much
> used any more, if we're right about the cause then there is
> a more general underlying problem which can strike elsewhere
> too.  So thanks!

Essentially, run_command() is called in contexts where threads
explicitly hold a lock while fork()ing. Currently, this just affects the
legacy cgmanager cgroup driver.  Here's what's happening when we use
fork():

1. cgm_chown() calls cgm_dbus_connect()
2. cgm_dbus_connect() calls cgm_lock():
   Now the thread holds an explicit lock on the mutex
3. cgm_chown() calls chown_cgroup()
4. chown_cgroup() calls userns_exec_1()
5. userns_exec_1() forks with an explicit lock on the mutex being held
6. pthread_atfork() handlers get run including the prepare() handler:

        #ifdef HAVE_PTHREAD_ATFORK
        __attribute__((constructor))
        static void process_lock_setup_atfork(void)
        {
                pthread_atfork(cgm_lock, cgm_unlock, cgm_unlock);
        }
        #endif

   thus trying to acquire the mutex that the parent already holds
   explicitly. If we were using recursive locks then the parent would now
   hold the lock twice, but since I don't see us using them I guess we're
   simply getting undefined behavior.

There are multiple ways to solve this problem. None of them is very nice. One
solution is to use an interposition wrapper for pthread_atfork(), but that is
rather tricky since we would need wrappers for the pthread_atfork() callbacks
and would need to identify our caller in order to decide whether to execute
the callback or not. If this were a generic problem I'd say we go for this
solution, but as it only affects the legacy cgmanager driver we don't really
care, and I'd much rather enforce that any future code does not take an
explicit lock during a fork(). Holding a lock across fork() sounds like a bad
idea in the first place. So the simplest fix is to switch from fork() to
clone(), which does not run pthread_atfork() handlers. If push comes to shove
we might just do the clone() syscall directly via syscall(SYS_clone, ...).
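
For illustration, the difference between fork() and clone() with respect to
pthread_atfork() handlers can be checked with a small standalone sketch. This
is not the attached patch and not LXC code. All names in it
(register_counting_handler(), handlers_run_by_fork(), handlers_run_by_clone())
are made up, and it assumes Linux with the glibc clone() wrapper and a
heap-allocated child stack on an architecture where the stack grows down:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for clone() in <sched.h> */
#endif
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Number of times the prepare handler has run in this process. */
static int prepare_calls;

static void count_prepare(void)
{
        prepare_calls++;
}

/* Register the counting handler exactly once. Returns 0 on success. */
int register_counting_handler(void)
{
        static int registered;

        if (registered)
                return 0;
        if (pthread_atfork(count_prepare, NULL, NULL) != 0)
                return -1;
        registered = 1;
        return 0;
}

/* Entry point of the clone()d child: exit immediately with status 0. */
static int child_fn(void *arg)
{
        (void)arg;
        return 0;
}

/* Returns how many times the prepare handler ran for one fork(). */
int handlers_run_by_fork(void)
{
        int before = prepare_calls;
        pid_t pid = fork();

        if (pid == 0)
                _exit(0);
        if (pid < 0)
                return -1;
        waitpid(pid, NULL, 0);
        assert(prepare_calls >= before);
        return prepare_calls - before;
}

/* Returns how many times the prepare handler ran for one clone(). */
int handlers_run_by_clone(void)
{
        const size_t stack_size = 64 * 1024;
        int before = prepare_calls;
        char *stack = malloc(stack_size);
        pid_t pid;

        if (!stack)
                return -1;
        /* No CLONE_VM: the child gets its own copy of the address space. */
        pid = clone(child_fn, stack + stack_size, SIGCHLD, NULL);
        if (pid < 0) {
                free(stack);
                return -1;
        }
        waitpid(pid, NULL, 0);
        free(stack);
        return prepare_calls - before;
}
```

After register_counting_handler(), I would expect handlers_run_by_fork() to
return 1 and handlers_run_by_clone() to return 0, which is the property the
fix relies on.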

Serge, please take a look at https://github.com/lxc/lxc/pull/2034 and
see whether that is acceptable to you. :)

Christian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-utils-use-clone-in-run_command.patch
Type: text/x-diff
Size: 4811 bytes
Desc: not available
URL: <http://lists.linuxcontainers.org/pipermail/lxc-users/attachments/20171214/1ef4fb79/attachment-0001.patch>