[lxc-devel] lxc_monitord - monitor exiting

Dwight Engen dwight.engen at oracle.com
Fri May 10 19:05:25 UTC 2013


On Tue, 7 May 2013 09:43:30 -0500
Serge Hallyn <serge.hallyn at ubuntu.com> wrote:

> Quoting Dwight Engen (dwight.engen at oracle.com):
> > On Mon, 6 May 2013 19:08:08 -0500
> > Serge Hallyn <serge.hallyn at ubuntu.com> wrote:
> > 
> > > Quoting Dwight Engen (dwight.engen at oracle.com):
> > > > On Mon, 6 May 2013 15:31:14 -0500
> > > > Serge Hallyn <serge.hallyn at ubuntu.com> wrote:
> > > > 
> > > > > Quoting Dwight Engen (dwight.engen at oracle.com):
> > > > > > On Mon, 6 May 2013 13:06:43 -0400
> > > > > > Dwight Engen <dwight.engen at oracle.com> wrote:
> > > > > > 
> > > > > > > Hi Çağlar,
> > > > > > > 
> > > > > > > Thanks for the test program, I can sort of recreate it
> > > > > > > here with that, although once I lxc-stop them all,
> > > > > > > lxc-monitord does go away. I put a debug statement in
> > > > > > > lxc_wait() to verify that the clients always close the fd
> > > > > > > to the monitord, and they all were closed, so I'm not sure
> > > > > > > why lxc-monitord isn't seeing the accepted fd coming back
> > > > > > > from epoll to close. Still investigating...
> > > > > > 
> > > > > > Okay, so I debugged this and the problem basically comes
> > > > > > down to lxc not being thread aware. With your test program
> > > > > > we get multiple threads in lxcapi_start() simultaneously in
> > > > > > the daemonize case. One of them forks while another still
> > > > > > has the client fd to the monitor open, so the fd gets duped
> > > > > > by the fork, and that duplicated client fd is what holds
> > > > > > lxc-monitord open until the container shuts down.
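> > > > > > 
> > > > > > For illustration, the interleaving is roughly the following
> > > > > > (a minimal standalone sketch, not the lxc code; I'm assuming
> > > > > > lxc_monitor_open() as the call that opens the client fd):
> > > > > > 
> > > > > >     #include <pthread.h>
> > > > > >     #include <unistd.h>
> > > > > > 
> > > > > >     extern int lxc_monitor_open(const char *lxcpath);
> > > > > > 
> > > > > >     static void *waiter(void *lxcpath)
> > > > > >     {
> > > > > >         /* thread A: holds a client fd to lxc-monitord
> > > > > >          * for the duration of the wait */
> > > > > >         int mfd = lxc_monitor_open(lxcpath);
> > > > > >         /* ... read container states ... */
> > > > > >         close(mfd);
> > > > > >         return NULL;
> > > > > >     }
> > > > > > 
> > > > > >     int main(void)
> > > > > >     {
> > > > > >         pthread_t t;
> > > > > >         pthread_create(&t, NULL, waiter,
> > > > > >                        (void *)"/var/lib/lxc");
> > > > > > 
> > > > > >         /* thread B: the daemonized start forks while
> > > > > >          * thread A's mfd is open; the child inherits a
> > > > > >          * dup of mfd, and that copy keeps the monitord
> > > > > >          * connection open even after thread A closes
> > > > > >          * its own */
> > > > > >         if (fork() == 0)
> > > > > >             pause();
> > > > > > 
> > > > > >         pthread_join(t, NULL);
> > > > > >         return 0;
> > > > > >     }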
> > > > > > 
> > > > > > Çağlar, you could try out the following patch; it
> > > > > > essentially serializes container startup from a thread
> > > > > > perspective. I haven't tested it thoroughly, but it did fix
> > > > > > the problem here. Right now lxc doesn't support threaded
> > > > > > use, so you may run into other things as well. Depending on
> > > > > > our stance on thread support in lxc, you may need to do the
> > > > > > serialization in the threaded app. Another alternative is
> > > > > > that, initially, we could just serialize at the API level
> > > > > > (a big lxc lock).
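> > > > > > 
> > > > > > The shape of the fix is roughly this (a sketch only, not
> > > > > > the actual patch; the mutex and function names are made
> > > > > > up):
> > > > > > 
> > > > > >     #include <pthread.h>
> > > > > >     #include <stdbool.h>
> > > > > >     #include <sys/types.h>
> > > > > >     #include <unistd.h>
> > > > > > 
> > > > > >     /* one per-process lock, shared by all containers */
> > > > > >     static pthread_mutex_t thread_mutex =
> > > > > >         PTHREAD_MUTEX_INITIALIZER;
> > > > > > 
> > > > > >     static bool start_daemonized(void)
> > > > > >     {
> > > > > >         /* the monitor-fd open/close paths take the same
> > > > > >          * lock, so no client fd can be open in another
> > > > > >          * thread while we fork */
> > > > > >         pthread_mutex_lock(&thread_mutex);
> > > > > >         pid_t pid = fork();
> > > > > >         pthread_mutex_unlock(&thread_mutex);
> > > > > >         if (pid < 0)
> > > > > >             return false;
> > > > > >         if (pid != 0)
> > > > > >             return true;  /* parent returns to caller */
> > > > > >         /* child: carry on with the actual start */
> > > > > >         _exit(0);
> > > > > >     }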
> > > > > 
> > > > > It sounds like lxcapi_start should be locking c->slock?  The
> > > > > priv_lock is to protect the struct lxc_container in memory
> > > > > (for when you do c = lxc_container_new(); then clone a new
> > > > > thread), while the slock is to protect the container data.
> > > > > It's being taken at create, info, destroy, etc., but not at
> > > > > start.
> > > > 
> > > > Hi Serge, thanks for pointing those out, lxc is more thread
> > > > aware than I realized :) Unfortunately I don't think either
> > > > lock will help here, as both of these locks are per-container
> > > > and the data we need to synchronize is per calling process
> > > > (i.e. which fds are open at the time of fork). To put it
> > > > another way, this problem happens even when each c is
> > > > different.
> > > 
> > > Right you are.  I was so busy worrying about how to protect the
> > > on-disk container and in-memory container_struct I wasn't thinking
> > > about the shared fdtable.
> > > 
> > > Should we have a single mutex around all fd modifications?  What
> > > else?
> > 
> > I think that will certainly work here; I haven't considered other
> > places. Do we need to think about sighand and umask? Would you
> > prefer
> 
> Yeah we should.
> 
> (And I need to decide how to fix the fact that c->slock is just named
> after the container name, regardless of the lxcpath.)
> 
> > to use the semaphores lxc already has (i.e. lxc_newlock() with no
> > name)
> 
> Actually we're doing different things there, so I'm happy to use
> pthread_mutex, unless it'll cause trouble for people using different
> toolchains (i.e. android/bionic).  The semaphores we're using are to
> protect the container_struct itself, and to protect the on-disk
> container data.  For protecting the task data itself, pthread_mutex
> seems perfect.  (And no name is needed.)
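> 
> So the split would look something like this (a sketch; the
> lxc_newlock() signature is approximated, and fd_mutex is a made-up
> name):
> 
>     #include <pthread.h>
> 
>     /* cross-process, named: protects the on-disk container
>      * data, usable from any process (signature approximated) */
>     extern struct lxc_lock *lxc_newlock(const char *name);
> 
>     /* in-process, anonymous: protects this process's own state
>      * (fd table, sighand, umask) around fork() */
>     static pthread_mutex_t fd_mutex = PTHREAD_MUTEX_INITIALIZER;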
> 
> > over pthread_mutex? Since Çağlar verified that the test fix also
> > worked for him, I'd like to make a real fix.
> 
> Should I push the patch from yesterday as a first step, or would you
> prefer to do all at once with one patch?  I'm fine either way.

Hi Serge, after testing this a bit more, and given that you're fine with
using pthread_mutex, I think it's okay as is since it fixes the immediate
problem. If/when we find more per-process things that need thread
locking, we can rename the lock and make it more generic. If you want me
to submit a regular git patch with sign-off and all, just let me know.
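
If we do generalize it, the helpers could stay as small as this
(hypothetical names, not committed code):

    #include <pthread.h>

    static pthread_mutex_t thread_mutex = PTHREAD_MUTEX_INITIALIZER;

    /* serialize anything that mutates per-process state: the fd
     * table around fork(), signal handlers, umask, ... */
    void process_lock(void)
    {
        pthread_mutex_lock(&thread_mutex);
    }

    void process_unlock(void)
    {
        pthread_mutex_unlock(&thread_mutex);
    }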

> thanks,
> -serge
> 
> PS - I'm in meetings all week (a very rare thing for me), so apologies
> if I'm slow in responding.




