<div dir="ltr">Hi Dwight,<div><br></div><div>Thank you so much for quick triage and a patch! I briefly tested it and looks like it fixed the issue here as well. IMHO the thread safety of LXC is not that bad with the staging tree (thanks to your lxc-monitord and couple of other concurrency fixes). I believe we can create/start/stop/shutdown/freeze/unfreeze (I haven't tested clone yet) different containers in parallel. There may still some issues with doing those operations to the same container from multiple threads (src/lxc/commands.c come to my mind). My short term goal/wish is to provide a good concurrent experience to the go bindings (maybe we can add one or two tests two to test suite as well).</div>

P.S. I'll be away for a couple of days (till next Tuesday), so I won't be able to test this further or try other patches for this issue.

Cheers,
<div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, May 6, 2013 at 3:59 PM, Dwight Engen <span dir="ltr"><<a href="mailto:dwight.engen@oracle.com" target="_blank">dwight.engen@oracle.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On Mon, 6 May 2013 13:06:43 -0400<br>
Dwight Engen <<a href="mailto:dwight.engen@oracle.com">dwight.engen@oracle.com</a>> wrote:<br>
<br>
> Hi Çağlar,<br>
><br>
> Thanks for the test program, I can sort of recreate it here with that,<br>
> although once I lxc-stop them all, lxc-monitord does go away. I put a<br>
> debug in lxc_wait() to see that the client always close the fd to the<br>
> monitord and they all were closed so I'm not sure why lxc-monitord<br>
> isn't seeing the accepted fd coming back from epoll to close. Still<br>
> investigating...<br>
<br>
> Okay, so I debugged this, and the problem basically comes down to lxc
> not being thread aware. With your test program we get multiple threads
> in lxcapi_start() simultaneously in the daemonize case. One of them
> forks while another one has the client fd to the monitor open, so that
> fd gets duped by the fork, and that is the client fd that holds
> lxc-monitord open until the container shuts down.
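
That fd inheritance is just standard fork() semantics. Reduced to a toy example with no LXC involved (a sketch; the socketpair below merely stands in for the monitor client fd), the race looks like this:

/* A fork() in one thread inherits an fd another thread opened moments
 * earlier, so the fd's peer never sees the connection close until the
 * child exits. */
#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>

static void *opener(void *arg)
{
	int sv[2];

	/* stands in for the monitor client fd that lxc_wait() opens */
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == 0)
		sleep(2);	/* hold the fds open across the fork */
	return NULL;
}

int main(void)
{
	pthread_t tid;
	pid_t pid;

	pthread_create(&tid, NULL, opener, NULL);
	sleep(1);		/* let the other thread open its fds */

	pid = fork();
	if (pid == 0) {
		/* the child holds copies of the socketpair fds even
		 * though this thread never opened them */
		execlp("ls", "ls", "-l", "/proc/self/fd", (char *)NULL);
		_exit(1);
	}
	waitpid(pid, NULL, 0);
	pthread_join(tid, NULL);
	return 0;
}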

> Çağlar, you could try out the following patch; it essentially
> serializes container startup from a thread perspective. I haven't
> tested it thoroughly, but it did fix the problem here. Right now lxc
> doesn't support threaded use, so you may run into other things as
> well. Depending on our stance on thread support in lxc, you may need
> to do the serialization in the threaded app. I guess another
> alternative is that, initially, we could just serialize at the API
> level (a big lxc lock).
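
If we end up doing the serialization in the threaded app, I guess the caller side would just be a process-wide lock around the start call, something like this sketch (locked_start is a hypothetical app-owned wrapper, not part of the LXC API):

#include <pthread.h>
#include <stdbool.h>
#include <lxc/lxccontainer.h>

/* app-owned lock; one daemonized start at a time across all threads */
static pthread_mutex_t start_lock = PTHREAD_MUTEX_INITIALIZER;

static bool locked_start(struct lxc_container *c, int useinit,
			 char *const argv[])
{
	bool ok;

	pthread_mutex_lock(&start_lock);
	ok = c->start(c, useinit, argv);
	pthread_mutex_unlock(&start_lock);
	return ok;
}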

> --
>
> diff --git a/src/lxc/lxccontainer.c b/src/lxc/lxccontainer.c
> index 04a9208..f464fdb 100644
> --- a/src/lxc/lxccontainer.c
> +++ b/src/lxc/lxccontainer.c
> @@ -18,6 +18,7 @@
>   */
>
>  #define _GNU_SOURCE
> +#include <pthread.h>
>  #include <unistd.h>
>  #include <sys/types.h>
>  #include <sys/wait.h>
> @@ -346,6 +347,7 @@ static bool wait_on_daemonized_start(struct lxc_container *c)
>   * I can't decide if it'd be more convenient for callers if we accept '...',
>   * or a null-terminated array (i.e. execl vs execv)
>   */
> +static pthread_mutex_t start_mutex = PTHREAD_MUTEX_INITIALIZER;
>  static bool lxcapi_start(struct lxc_container *c, int useinit, char * const argv[])
>  {
>  	int ret;
> @@ -391,13 +393,24 @@ static bool lxcapi_start(struct lxc_container *c, int useinit, char * const argv
>  		if (!lxc_container_get(c))
>  			return false;
>  		lxc_monitord_spawn(c->config_path);
> +
> +		ret = pthread_mutex_lock(&start_mutex);
> +		if (ret != 0) {
> +			ERROR("pthread_mutex_lock returned:%d %s", ret, strerror(ret));
> +			return false;
> +		}
>  		pid_t pid = fork();
>  		if (pid < 0) {
>  			lxc_container_put(c);
> +			pthread_mutex_unlock(&start_mutex);
>  			return false;
>  		}
> -		if (pid != 0)
> -			return wait_on_daemonized_start(c);
> +		if (pid != 0) {
> +			ret = wait_on_daemonized_start(c);
> +			pthread_mutex_unlock(&start_mutex);
> +			return ret;
> +		}
> +		pthread_mutex_unlock(&start_mutex);
>  		/* second fork to be reparented by init */
>  		pid = fork();
>  		if (pid < 0) {
</blockquote></div><br><br clear="all"><div><br></div>-- <br>S.Çağlar Onur <<a href="mailto:caglar@10ur.org">caglar@10ur.org</a>>