[lxc-devel] lxc_monitord - monitor exiting
Dwight Engen
dwight.engen at oracle.com
Tue May 7 13:49:42 UTC 2013
On Mon, 6 May 2013 19:08:08 -0500
Serge Hallyn <serge.hallyn at ubuntu.com> wrote:
> Quoting Dwight Engen (dwight.engen at oracle.com):
> > On Mon, 6 May 2013 15:31:14 -0500
> > Serge Hallyn <serge.hallyn at ubuntu.com> wrote:
> >
> > > Quoting Dwight Engen (dwight.engen at oracle.com):
> > > > On Mon, 6 May 2013 13:06:43 -0400
> > > > Dwight Engen <dwight.engen at oracle.com> wrote:
> > > >
> > > > > Hi Çağlar,
> > > > >
> > > > > Thanks for the test program, I can sort of recreate it here
> > > > > with that, although once I lxc-stop them all, lxc-monitord
> > > > > does go away. I put a debug in lxc_wait() to see that the
> > > > > client always close the fd to the monitord and they all were
> > > > > closed so I'm not sure why lxc-monitord isn't seeing the
> > > > > accepted fd coming back from epoll to close. Still
> > > > > investigating...
> > > >
> > > > Okay, so I debugged this and the problem is basically down to
> > > > lxc not being thread aware. With your test program we get
> > > > multiple threads in lxcapi_start() simultaneously in the
> > > > daemonize case. One of them forks while another one has the
> > > > client fd to the monitor open and thus the fd gets duped by the
> > > > fork and that is the client fd that holds lxc-monitord open
> > > > until the container shuts down.
> > > >
> > > > Çağlar you could try out the following patch, it essentially
> > > > serializes container startup from a thread perspective. I
> > > > haven't tested it thoroughly, but it did fix the problem here.
> > > > Right now lxc doesn't support threaded use, so you may run into
> > > > other things as well. Depending on our stance on thread support
> > > > in lxc, you may need to do the serialization in the threaded
> > > > app. I guess another alternative is that initially we could
> > > > just thread serialize at the API (big lxc lock).
> > >
> > > It sounds like lxcapi_start should be locking c->slock? The
> > > priv_lock is to protect the struct lxccontainer in memory (for
> > > when you do c = lxc_container_new(); then clone a new thread),
> > > while the slock is to protect the container data. It's being
> > > taken at create, info, destroy, etc, but not at start.
> >
> > Hi Serge, thanks for pointing those out, lxc is more thread aware
> > than I realized :) Unfortunately I don't think either lock will
> > help here as both these locks are per-container and the data we
> > need to synchronize is per-calling process (ie. which fd's are open
> > at time of fork). To put it another way, this problem happens even
> > when each c is different.
>
> Right you are. I was so busy worrying about how to protect the
> on-disk container and in-memory container_struct I wasn't thinking
> about the shared fdtable.
>
> Should we have a single mutex around all fd modifications? What else?
I think that will certainly work here, I haven't considered other
places. Do we need to think about sighand and umask? Would you prefer
to use the semaphores lxc already has (ie lxc_newlock() with no name)
over pthread_mutex? Since Çağlar verified the test fix worked for him
also I'd like to make a real fix.
More information about the lxc-devel
mailing list