[lxc-devel] regression: lxc-start -d hangs in lxc_monitor_sock_name (at process_lock)

Serge Hallyn serge.hallyn at ubuntu.com
Thu Sep 12 21:55:31 UTC 2013


Thanks.  A few days ago I wrote a short-n-simple little program that
cloned two threads which each did some things with containers.  It was
definitely racy.

Based on your input I'll take a closer look at the new monitoring
code.

I'm hoping to take a much closer look next week, i.e. load two
containers and spawn threads that each do just one thing: just
c->is_defined; just c->wait; just c->daemonize; just c->start(); etc.,
and see which ones are racy after a few runs.
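
A minimal sketch of the kind of harness described above, under stated assumptions: `op_fn`, `run_concurrently` and `count_op` are hypothetical names, not part of liblxc, and `count_op` merely stands in for a single container call such as c->is_defined or c->start(). All threads are held at a gate and released together so their calls overlap as much as possible.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_WORKERS 64

/* Hypothetical signature for "one operation on a container". */
typedef int (*op_fn)(void *ctx);

/* Start gate: workers block here until the main thread opens it. */
struct gate {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    int open;
};

struct worker {
    pthread_t tid;
    struct gate *gate;
    op_fn op;
    void *ctx;
    int result;
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;

    pthread_mutex_lock(&w->gate->mu);
    while (!w->gate->open)
        pthread_cond_wait(&w->gate->cv, &w->gate->mu);
    pthread_mutex_unlock(&w->gate->mu);

    w->result = w->op(w->ctx);
    return NULL;
}

/* Runs op in nthreads threads at once. Returns the number of threads
 * whose op returned non-zero, or -1 on a setup error. */
static int run_concurrently(op_fn op, void *ctx, int nthreads)
{
    struct gate gate;
    struct worker workers[MAX_WORKERS];
    int started = 0, failures = 0, i;

    assert(op != NULL);
    if (nthreads < 1 || nthreads > MAX_WORKERS)
        return -1;

    pthread_mutex_init(&gate.mu, NULL);
    pthread_cond_init(&gate.cv, NULL);
    gate.open = 0;

    for (i = 0; i < nthreads; i++) {
        workers[i].gate = &gate;
        workers[i].op = op;
        workers[i].ctx = ctx;
        workers[i].result = 0;
        if (pthread_create(&workers[i].tid, NULL, worker_main, &workers[i]) != 0)
            break;
        started++;
    }

    /* Open the gate even after a partial start so no worker waits forever. */
    pthread_mutex_lock(&gate.mu);
    gate.open = 1;
    pthread_cond_broadcast(&gate.cv);
    pthread_mutex_unlock(&gate.mu);

    for (i = 0; i < started; i++) {
        pthread_join(workers[i].tid, NULL);
        if (workers[i].result != 0)
            failures++;
    }

    pthread_cond_destroy(&gate.cv);
    pthread_mutex_destroy(&gate.mu);
    return started == nthreads ? failures : -1;
}

/* Stand-in operation: a real test would call into the container API here. */
static int count_op(void *ctx)
{
    atomic_fetch_add((atomic_int *)ctx, 1);
    return 0;
}
```

Running each operation in its own loop of such runs would narrow down which call is racy. Note that a hang like the one reported would show up as pthread_join never returning, so a real harness would probably also want a timeout.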

Quoting S.Çağlar Onur (caglar at 10ur.org):
> Hi,
> 
> I think staging (my head is @ 813a48...) started to get stuck while creating
> containers concurrently after the monitoring-related changes.
> 
> I observed that issue with the Go bindings first. Then I wrote a test case
> to remove Go from the picture and I also thought that having a test case
> would be helpful (see "[PATCH] tests: Introduce lxc-test-concurrent for
> testing basic actions concurrently").
> 
> Normally one should see the following:
> 
> [caglar at qgq:~/Projects/lxc(staging)] sudo lxc-test-concurrent
> 
> Executing (create) for 5 containers...
> 
> Executing (start) for 5 containers...
> 
> Executing (stop) for 5 containers...
> 
> Executing (destroy) for 5 containers...
> 
> 
> but occasionally create gets stuck on my test system (just try running it a
> couple of times).
> 
> Cheers,
> 
> 
> 
> On Thu, Sep 12, 2013 at 10:41 AM, Serge Hallyn <serge.hallyn at ubuntu.com> wrote:
> 
> > Quoting Dwight Engen (dwight.engen at oracle.com):
> > > On Thu, 12 Sep 2013 00:27:04 -0400
> > > Stéphane Graber <stgraber at ubuntu.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > It looks like Dwight's last change introduced a bit of a regression
> > > > when running lxc-start -d.
> > >
> > > Yikes, sorry I didn't catch that in my testing. My follow-on patch
> > > for doing the monitor socket in the abstract space gets rid of this
> > > entirely, so this is an additional reason to consider it.
> > >
> > > > Tracing it down (added a ton of printf all over), it looks like it's
> > > > hanging on:
> > > >  - lxcapi_start
> > > >    - wait_on_daemonized_start
> > > >      - lxcapi_wait
> > > >        - lxc_wait
> > > >          - lxc_monitor_open
> > > >            - lxc_monitor_sock_name
> > > >
> > > > Specifically, it's hanging at the process_lock() call because
> > > > process_lock() was already called as part of lxcapi_start and only
> > > > gets unlocked right after wait_on_daemonized_start returns.
> > > >
> > > >
> > > > Looking at the code, I'm not even sure why we need process_lock there.
> > > > What it protects is another thread triggering the mkdir_p in parallel,
> > > > but that shouldn't really be a problem since running two mkdir_p at
> > > > the same time should still result in the hierarchy being created, or
> > > > did I miss something?
> > >
> > > That sounds logical to me, but hmm, does that mean we don't need it in
> > > lxclock_name() either (which I was modeling this on)? I wonder if
> > > there is a code flow where it's possible for us to hang there.
> >
> > Well, mkdir uses the umask, right?  (and *may* use the cwd).  Both of
> > which are shared among threads.  mkdir won't set them, but something
> > else might change them underneath it.
> >
> > So I could be wrong and we might not need it, but it seemed like we
> > might.
> >
> > -serge
> >
> >
> 
> 
> 
> -- 
> S.Çağlar Onur <caglar at 10ur.org>
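
For reference, the hang in the quoted trace (process_lock() taken in lxcapi_start, then taken again further down in lxc_monitor_sock_name) can be reproduced in miniature. This is only a sketch, assuming process_lock() wraps a non-recursive pthread mutex; `demo_lock`, `inner_helper` and `outer_call` are hypothetical names, not the lxc code. It uses an error-checking mutex so that the second lock attempt reports EDEADLK rather than blocking forever.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* for PTHREAD_MUTEX_ERRORCHECK under strict modes */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the process-wide lock. */
static pthread_mutex_t demo_lock;

/* Initialise demo_lock as an error-checking mutex. Returns 0 on success. */
static int demo_lock_init(void)
{
    pthread_mutexattr_t attr;
    int ret;

    ret = pthread_mutexattr_init(&attr);
    if (ret != 0)
        return ret;
    ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (ret == 0)
        ret = pthread_mutex_init(&demo_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return ret;
}

/* Plays the role of the inner helper that takes the lock itself. */
static int inner_helper(void)
{
    int ret = pthread_mutex_lock(&demo_lock);

    if (ret != 0)
        return ret; /* EDEADLK if this thread already holds the lock */
    /* ... the directory creation would happen here ... */
    pthread_mutex_unlock(&demo_lock);
    return 0;
}

/* Plays the role of the outer API call: it holds the lock across the
 * call into the helper, which is the pattern described in the trace. */
static int outer_call(void)
{
    int ret = pthread_mutex_lock(&demo_lock);

    assert(ret == 0);
    ret = inner_helper();
    pthread_mutex_unlock(&demo_lock);
    return ret;
}
```

Called on its own, inner_helper() succeeds; called through outer_call() it fails with EDEADLK. With a default mutex the same sequence would typically just hang, which matches the reported symptom.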

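
On the umask point in the quoted discussion: the file mode creation mask is a property of the process, so a change made by one thread is visible to every other thread. A small sketch that shows this, with hypothetical helper names (`set_umask_thread`, `umask_seen_after_other_thread_sets`):

```c
#include <assert.h>
#include <pthread.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Thread body: change the umask to the value pointed to by arg. */
static void *set_umask_thread(void *arg)
{
    umask(*(mode_t *)arg);
    return NULL;
}

/* Lets another thread set the umask to new_mask, then reports the mask
 * the calling thread observes. The original mask is restored before
 * returning. Returns (mode_t)-1 if the thread could not be created. */
static mode_t umask_seen_after_other_thread_sets(mode_t new_mask)
{
    pthread_t tid;
    mode_t original, seen;
    int rc;

    /* umask() can only be read by setting it, so set and put back. */
    original = umask(0);
    umask(original);

    if (pthread_create(&tid, NULL, set_umask_thread, &new_mask) != 0)
        return (mode_t)-1;
    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void)rc;

    /* umask() returns the previous mask, i.e. what the other thread set. */
    seen = umask(original);
    return seen;
}
```

So a mkdir in one thread can pick up a mask set by another thread in between, which is the concern raised above. Whether that justifies holding process_lock around mkdir_p is a separate question.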


