[lxc-devel] lxc_monitord - monitor exiting
Serge Hallyn
serge.hallyn at ubuntu.com
Tue May 7 14:43:30 UTC 2013
Quoting Dwight Engen (dwight.engen at oracle.com):
> On Mon, 6 May 2013 19:08:08 -0500
> Serge Hallyn <serge.hallyn at ubuntu.com> wrote:
>
> > Quoting Dwight Engen (dwight.engen at oracle.com):
> > > On Mon, 6 May 2013 15:31:14 -0500
> > > Serge Hallyn <serge.hallyn at ubuntu.com> wrote:
> > >
> > > > Quoting Dwight Engen (dwight.engen at oracle.com):
> > > > > On Mon, 6 May 2013 13:06:43 -0400
> > > > > Dwight Engen <dwight.engen at oracle.com> wrote:
> > > > >
> > > > > > Hi Çağlar,
> > > > > >
> > > > > > Thanks for the test program, I can sort of recreate it here
> > > > > > with that, although once I lxc-stop them all, lxc-monitord
> > > > > > does go away. I put a debug statement in lxc_wait() to see whether
> > > > > > the clients always close their fd to the monitord, and they all were
> > > > > > closed, so I'm not sure why lxc-monitord isn't seeing the
> > > > > > accepted fd come back from epoll so it can close it. Still
> > > > > > investigating...
> > > > >
> > > > > Okay, so I debugged this and the problem is basically down to
> > > > > lxc not being thread-aware. With your test program we get
> > > > > multiple threads in lxcapi_start() simultaneously in the
> > > > > daemonize case. One of them forks while another one has its
> > > > > client fd to the monitor open, so that fd gets duplicated by the
> > > > > fork, and that duplicate is the client fd that holds lxc-monitord
> > > > > open until the container shuts down.
> > > > >
> > > > > Çağlar, you could try out the following patch; it essentially
> > > > > serializes container startup from a thread perspective. I
> > > > > haven't tested it thoroughly, but it did fix the problem here.
> > > > > Right now lxc doesn't support threaded use, so you may run into
> > > > > other things as well. Depending on our stance on thread support
> > > > > in lxc, you may need to do the serialization in the threaded
> > > > > app. I guess another alternative is that initially we could
> > > > > just serialize threads at the API level (a big lxc lock).
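
As a rough illustration of the duped-fd effect (this is a standalone demo
with made-up names, not Dwight's patch and not lxc code: a socketpair()
stands in for the connection to lxc-monitord, and sleep()s force the
ordering):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>

static int client_fd = -1;

static void *monitor_client(void *arg)
{
	int sv[2];

	(void)arg;
	/* stands in for the client fd that lxc_wait() opens to lxc-monitord */
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
		perror("socketpair");
		exit(EXIT_FAILURE);
	}
	client_fd = sv[0];
	sleep(2);		/* hold it open across the other thread's fork */
	close(sv[0]);
	close(sv[1]);
	return NULL;
}

int main(void)
{
	pthread_t t;
	pid_t pid;

	pthread_create(&t, NULL, monitor_client, NULL);
	sleep(1);		/* give the client thread time to open its fd */

	pid = fork();		/* stands in for the daemonize fork */
	if (pid == 0) {
		/* the whole fd table was copied, including client_fd */
		printf("child inherited fd %d opened by another thread\n",
		       client_fd);
		sleep(3);	/* a long-lived child keeps the peer open */
		_exit(0);
	}

	pthread_join(t, NULL);
	waitpid(pid, NULL, 0);
	return 0;
}

Note that O_CLOEXEC wouldn't change this picture by itself: fork() always
copies the whole fd table, so the fix has to either serialize the
open-fd/fork window or have the child close the fds it inherited.
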
> > > >
> > > > It sounds like lxcapi_start should be locking c->slock? The
> > > > privlock is to protect the struct lxc_container in memory (for
> > > > when you do c = lxc_container_new() and then clone a new thread),
> > > > while the slock is to protect the on-disk container data. It's being
> > > > taken at create, info, destroy, etc., but not at start.
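
Just to illustrate the division of labor described above (a sketch with
stand-in names, not the actual lxc code):

#include <semaphore.h>

/* stand-in for struct lxc_container and its two per-container locks */
struct container {
	sem_t *privlock;	/* guards this in-memory struct */
	sem_t *slock;		/* guards the on-disk container data;
				 * taken at create, info, destroy, ... */
	int numthreads;		/* example of privlock-protected state */
};

/* hypothetical helper: bump the thread count under the private lock */
static void container_get(struct container *c)
{
	sem_wait(c->privlock);
	c->numthreads++;
	sem_post(c->privlock);
}
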
> > >
> > > Hi Serge, thanks for pointing those out; lxc is more thread-aware
> > > than I realized :) Unfortunately I don't think either lock will
> > > help here, as both of these locks are per-container and the data we
> > > need to synchronize is per calling process (i.e. which fds are open
> > > at the time of fork). To put it another way, this problem happens even
> > > when each c is different.
> >
> > Right you are. I was so busy worrying about how to protect the
> > on-disk container and the in-memory container struct that I wasn't
> > thinking about the shared fd table.
> >
> > Should we have a single mutex around all fd modifications? What else?
>
> I think that will certainly work here; I haven't considered other
> places. Do we need to think about sighand and umask? Would you prefer
Yeah, we should.
(And I need to decide how to fix the fact that c->slock is just named
after the container name, regardless of the lxcpath.)
> to use the semaphores lxc already has (ie lxc_newlock() with no name)
Actually we're doing different things there, so I'm happy to use
pthread_mutex, unless it'll cause trouble for people using different
toolchains (i.e. android/bionic). The semaphores we're using are to
protect the container_struct itself, and to protect the on-disk
container data. For protecting the task data itself, pthread_mutex
seems perfect. (And no name is needed.)
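
As a sketch of that kind of per-process serialization (placeholder names
throughout, not the actual patch):

#include <pthread.h>

struct lxc_container;				/* opaque here; sketch only */

/* A single process-wide mutex: c->slock and c->privlock are per-container,
 * but the fd table copied by fork() belongs to the whole process, so this
 * lock is shared by every thread and every container in the calling
 * process.  No name is needed -- it is never visible outside the process. */
static pthread_mutex_t fork_fd_mutex = PTHREAD_MUTEX_INITIALIZER;

/* placeholder for the code that opens the monitor client fd and does the
 * daemonize fork(); in the real tree that work happens in lxcapi_start() */
static int really_start(struct lxc_container *c, int argc, char *const argv[])
{
	(void)c; (void)argc; (void)argv;
	return 0;
}

static int serialized_daemonized_start(struct lxc_container *c,
				       int argc, char *const argv[])
{
	int ret;

	/* serialize the open-monitor-fd + fork() window across all threads */
	pthread_mutex_lock(&fork_fd_mutex);
	ret = really_start(c, argc, argv);
	pthread_mutex_unlock(&fork_fd_mutex);

	return ret;
}

The same mutex could wrap any other spot where a short-lived fd has to stay
private across a fork().
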
> over pthread_mutex? Since Çağlar also verified that the test fix worked
> for him, I'd like to make a real fix.
Should I push the patch from yesterday as a first step, or would you
prefer to do it all at once in one patch? I'm fine either way.
thanks,
-serge
PS - I'm in meetings all week (a very rare thing for me), so apologies
if I'm slow in responding.