[lxc-devel] regression: lxc-start -d hangs in lxc_monitor_sock_name (at process_lock)

Thu Sep 12 14:41:00 UTC 2013

Quoting Dwight Engen (dwight.engen at oracle.com):
> On Thu, 12 Sep 2013 00:27:04 -0400
> Stéphane Graber <stgraber at ubuntu.com> wrote:
> 
> > Hello,
> > 
> > It looks like Dwight's last change introduced a bit of a regression
> > when running lxc-start -d.
> 
> Yikes, sorry I didn't catch that in my testing. My follow-on patch
> for doing the monitor socket in the abstract space gets rid of this
> entirely, so this is an additional reason to consider it.
> 
> > Tracing it down (added a ton of printf all over), it looks like it's
> > hanging on:
> >  - lxcapi_start
> >    - wait_on_daemonized_start
> >      - lxcapi_wait
> >        - lxc_wait
> >          - lxc_monitor_open
> >            - lxc_monitor_sock_name
> > 
> > Specifically, it's hanging at the process_lock() call because
> > process_lock() was already called as part of lxcapi_start and only
> > gets unlocked right after wait_on_daemonized_start returns.
> > 
> > 
> > Looking at the code, I'm not even sure why we need process_lock there.
> > What it protects is another thread triggering the mkdir_p in parallel,
> > but that shouldn't really be a problem since running two mkdir_p at
> > the same time should still result in the hierarchy being created, or
> > did I miss something?
>  
> That sounds logical to me, but hmm, does that mean we don't need it in
> lxclock_name() either (which is what I was modeling this on)? I wonder
> if there is a code flow where it's possible for us to hang there.

Well, mkdir uses the umask, right?  (And *may* use the cwd.)  Both of
those are shared among threads.  mkdir won't set them, but another
thread might change them underneath it.

So I could be wrong and we might not need it, but it seemed like we
might.

-serge