[lxc-devel] [PATCH] RFC: how to fix race with fast init?

Serge Hallyn serge.hallyn at ubuntu.com
Fri Mar 8 23:01:21 UTC 2013


Detection of SIGCHLD from the container init by the monitor process
which spawned it is done during lxc_poll.  If the monitor is slow
and the init (especially if using lxc-init to run /bin/true) exits
quickly, it can send its SIGCHLD before lxc_poll starts.  In that
case lxc_poll ends up hanging forever waiting for the SIGCHLD,
while the init process is a zombie waiting to be reaped.

If you want to verify that race, you can simply add a sleep(3) right
after the lxc_set_state(... RUNNING) in lxc_spawn, then do

	sudo lxc-execute -n somecontainer -- true

I suspect a clean way to handle this is using sigaction just between
lxc_spawn and lxc_poll to set a flag if the SIGCHLD from container
init is detected.  However, I haven't yet coded that fix yet.  If
someone wants to do that before I get to it that would be great.
In the meantime, another fix which works is below.  On the one hand
it feels ugly because it's not using signals.  On the other hand,
the sigaction method will be more complicated and less obvious...

One thing I'm not sure of (and which will complicate either way)
is whether it is the case that any signals delivered between
the lxc_mainloop_add_handler() and the actual running of the
lxc_mainloop() will be queued up and handled in the lxc_mainloop().

This should fix at least 2 bugs,
https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1134923
https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1144873

and maybe a third
https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1124526

Signed-off-by: Serge Hallyn <serge.hallyn at ubuntu.com>
---
 src/lxc/start.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/src/lxc/start.c b/src/lxc/start.c
index f0e82a3..75d1fd6 100644
--- a/src/lxc/start.c
+++ b/src/lxc/start.c
@@ -220,6 +220,28 @@ static int setup_signal_fd(sigset_t *oldmask)
 	return fd;
 }
 
+static int is_zombie(int pid)
+{
+	char path[PATH_MAX];
+	FILE *f;
+	int ret, junkint;
+	char state;
+
+	ret = snprintf(path, PATH_MAX, "/proc/%d/stat", pid);
+	if (ret < 0 || ret >= PATH_MAX)
+		return 0;
+	f = fopen(path, "r");
+	if (!f) // if it doesn't exist, process doesn't exist...
+		return 0;
+	ret = fscanf(f, "%d %s %c", &junkint, path, &state);
+	fclose(f);
+	if (ret != 3)
+		return 0;
+	if (state == 'Z')
+		return 1;
+	return 0;
+}
+
 static int signal_handler(int fd, void *data,
 			   struct lxc_epoll_descr *descr)
 {
@@ -358,6 +380,7 @@ int lxc_poll(const char *name, struct lxc_handler *handler)
 	int sigfd = handler->sigfd;
 	int pid = handler->pid;
 	struct lxc_epoll_descr descr;
+	int ret = -1;
 
 	if (lxc_mainloop_open(&descr)) {
 		ERROR("failed to create mainloop");
@@ -390,13 +413,22 @@ int lxc_poll(const char *name, struct lxc_handler *handler)
 		#endif
 	}
 
+	ret = 0;
+	/* more robustness, protect ourself from a SIGCHLD sent
+	 * by a process different from the container init
+	 */
+	if (is_zombie(handler->pid)) {
+		INFO("init has already exited");
+		goto out_mainloop_open;
+	}
+
 	return lxc_mainloop(&descr);
 
 out_mainloop_open:
 	lxc_mainloop_close(&descr);
 out_sigfd:
 	close(sigfd);
-	return -1;
+	return ret;
 }
 
 extern int lxc_caps_check(void);
-- 
1.8.1.2





More information about the lxc-devel mailing list