[lxc-devel] regression: lxc-start -d hangs in lxc_monitor_sock_name (at process_lock)

S.Çağlar Onur caglar at 10ur.org
Thu Sep 12 21:42:39 UTC 2013


Hi,

I think staging (my head is @ 813a48...) started to get stuck while creating
containers concurrently, after the monitoring-related changes.

I observed the issue with the Go bindings first. To take Go out of the picture,
and because a test case seemed useful to have anyway, I wrote one (see "[PATCH]
tests: Introduce lxc-test-concurrent for testing basic actions concurrently").

Normally one should see the following:

[caglar@qgq:~/Projects/lxc(staging)] sudo lxc-test-concurrent

Executing (create) for 5 containers...

Executing (start) for 5 containers...

Executing (stop) for 5 containers...

Executing (destroy) for 5 containers...


but occasionally create gets stuck on my test system (just try running it a
couple of times).
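
For reference, the heart of that test is roughly the sketch below. It assumes
the liblxc C API from the current staging tree (lxc_container_new(),
c->create(), lxc_container_put()); the real lxc-test-concurrent differs in
details and also exercises start/stop/destroy in separate passes, so treat
this as an illustration rather than the patch itself:

#include <pthread.h>
#include <stdio.h>
#include <lxc/lxccontainer.h>

#define NTHREADS 5

static void *worker(void *arg)
{
        char name[32];
        struct lxc_container *c;

        /* one container per thread: lxc-test-concurrent-0 .. -4 */
        snprintf(name, sizeof(name), "lxc-test-concurrent-%ld", (long)arg);

        c = lxc_container_new(name, NULL);
        if (!c)
                return (void *)1;

        /* the hang shows up here when several threads create at once;
           error handling is omitted for brevity */
        c->create(c, "busybox", NULL, NULL, 0, NULL);

        lxc_container_put(c);
        return NULL;
}

int main(void)
{
        pthread_t tids[NTHREADS];
        long i;

        for (i = 0; i < NTHREADS; i++)
                pthread_create(&tids[i], NULL, worker, (void *)i);
        for (i = 0; i < NTHREADS; i++)
                pthread_join(tids[i], NULL);

        return 0;
}

Build with something like "cc -o concurrent concurrent.c -llxc -lpthread" and
run it a few times as root.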

Cheers,



On Thu, Sep 12, 2013 at 10:41 AM, Serge Hallyn <serge.hallyn at ubuntu.com> wrote:

> Quoting Dwight Engen (dwight.engen at oracle.com):
> > On Thu, 12 Sep 2013 00:27:04 -0400
> > Stéphane Graber <stgraber at ubuntu.com> wrote:
> >
> > > Hello,
> > >
> > > It looks like Dwight's last change introduced a bit of a regression
> > > when running lxc-start -d.
> >
> > Yikes, sorry I didn't catch that in my testing. My follow-on patch
> > for doing the monitor socket in the abstract socket namespace gets rid
> > of this entirely, so this is an additional reason to consider it.
> >
> > > Tracing it down (after adding a ton of printfs all over), it looks
> > > like it's hanging on:
> > >  - lxcapi_start
> > >    - wait_on_daemonized_start
> > >      - lxcapi_wait
> > >        - lxc_wait
> > >          - lxc_monitor_open
> > >            - lxc_monitor_sock_name
> > >
> > > Specifically, it's hanging at the process_lock() call because
> > > process_lock() was already called as part of lxcapi_start and only
> > > gets unlocked right after wait_on_daemonized_start returns.
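
For anyone reading along, that is the classic self-deadlock on a non-recursive
mutex: the same thread takes the lock twice and blocks on itself. Here is a
stripped-down sketch of the shape of the hang, assuming process_lock() is just
a thin wrapper around a plain pthread mutex (which is my reading of
lxclock.c); monitor_sock_name() below is a stand-in, not the real function:

#include <pthread.h>

static pthread_mutex_t thread_mutex = PTHREAD_MUTEX_INITIALIZER;

static void process_lock(void)   { pthread_mutex_lock(&thread_mutex); }
static void process_unlock(void) { pthread_mutex_unlock(&thread_mutex); }

/* stands in for lxc_monitor_sock_name(): takes the lock a second time */
static void monitor_sock_name(void)
{
        process_lock();          /* never returns: relocking a default
                                    (non-recursive) mutex from the owning
                                    thread blocks forever */
        /* ... mkdir_p(), build the socket name ... */
        process_unlock();
}

int main(void)
{
        process_lock();          /* lxcapi_start() takes the lock ... */
        monitor_sock_name();     /* ... and the hang reproduces here */
        process_unlock();
        return 0;
}

Dropping the inner lock or making the mutex recursive would both avoid the
hang, and Dwight's abstract-socket patch avoids the directory (and thus the
lock there) altogether.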
> > >
> > >
> > > Looking at the code, I'm not even sure why we need process_lock there.
> > > What it protects against is another thread triggering the mkdir_p in
> > > parallel, but that shouldn't really be a problem, since running two
> > > mkdir_p calls at the same time should still result in the hierarchy
> > > being created, or did I miss something?
> >
> > That sounds logical to me, but hmm, does that mean we don't need it in
> > lxclock_name() either (which I was modeling this on)? I wonder if there
> > is a code flow where it's possible for us to hang there.
>
> Well, mkdir uses the umask, right?  (and *may* use the cwd).  Both of
> which are shared among threads.  mkdir itself won't set them, but something
> else might change them underneath it.
>
> So I could be wrong and we might not need it, but it seemed like we
> might.
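
For completeness, the umask concern can be shown in isolation: umask(2) is
process-wide state, so a temporary change in one thread can alter the mode a
racing mkdir() in another thread ends up applying. Hypothetical example, not
LXC code, and the path is made up:

#include <pthread.h>
#include <sys/stat.h>
#include <sys/types.h>

/* one thread temporarily clears the process-wide umask */
static void *relax_umask(void *arg)
{
        mode_t old = umask(0);
        /* ... work that relies on exact modes ... */
        umask(old);
        return arg;
}

/* the other thread's mkdir() may observe either umask value, so the
   resulting directory mode is not deterministic without serialization */
static void *make_dir(void *arg)
{
        mkdir("/tmp/umask-race-example", 0755);   /* illustrative path only */
        return arg;
}

int main(void)
{
        pthread_t a, b;
        pthread_create(&a, NULL, relax_umask, NULL);
        pthread_create(&b, NULL, make_dir, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
}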
>
> -serge
>
>



-- 
S.Çağlar Onur <caglar at 10ur.org>