[lxc-devel] [PATCH] RFC: how to fix race with fast init?

Daniel Lezcano daniel.lezcano at free.fr
Sat Mar 9 12:28:51 UTC 2013


On 03/09/2013 12:01 AM, Serge Hallyn wrote:
> Detection of SIGCHLD from the container init by the monitor process
> which spawned it is done during lxc_poll.  If the monitor is slow
> and the init (especially if using lxc-init to run /bin/true) exits
> quickly, it can send its SIGCHLD before lxc_poll starts.  In that
> case lxc_poll ends up hanging forever waiting for the SIGCHLD,
> while the init process is a zombie waiting to be reaped.

This problem has already been solved a couple of years ago. I suspect
there is another bug.

The signalfd is set *before*, so if a signal arrives while starting the
container, we should enter and exit immediately the mainloop.

That could have an edge/level triggered problem but it is not the case.

The problem is coming from the checking of the pid when it is received
by the monitor.

    lxc-execute 1362829720.335 NOTICE   lxc_execute - '/bin/true'
started with pid '18591'
    lxc-execute 1362829721.335 INFO     lxc_console - no rootfs, no console.
    lxc-execute 1362829721.335 WARN     lxc_start - invalid pid for
SIGCHLD: 18591 <> 18590

So it is ignoring the signal. This problem shouldn't appears with lxc-start.


>
> If you want to verify that race, you can simply add a sleep(3) right
> after the lxc_set_state(... RUNNING) in lxc_spawn, then do
>
> 	sudo lxc-execute -n somecontainer -- true
>
> I suspect a clean way to handle this is using sigaction just between
> lxc_spawn and lxc_poll to set a flag if the SIGCHLD from container
> init is detected.  However, I haven't yet coded that fix yet.  If
> someone wants to do that before I get to it that would be great.
> In the meantime, another fix which works is below.  On the one hand
> it feels ugly because it's not using signals.  On the other hand,
> the sigaction method will be more complicated and less obvious...
>
> One thing I'm not sure of (and which will complicate either way)
> is whether it is the case that any signals delivered between
> the lxc_mainloop_add_handler() and the actual running of the
> lxc_mainloop() will be queued up and handled in the lxc_mainloop().
>
> This should fix at least 2 bugs,
> https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1134923
> https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1144873
>
> and maybe a third
> https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1124526
>
> Signed-off-by: Serge Hallyn <serge.hallyn at ubuntu.com>
> ---
>  src/lxc/start.c | 34 +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/src/lxc/start.c b/src/lxc/start.c
> index f0e82a3..75d1fd6 100644
> --- a/src/lxc/start.c
> +++ b/src/lxc/start.c
> @@ -220,6 +220,28 @@ static int setup_signal_fd(sigset_t *oldmask)
>  	return fd;
>  }
>  
> +static int is_zombie(int pid)
> +{
> +	char path[PATH_MAX];
> +	FILE *f;
> +	int ret, junkint;
> +	char state;
> +
> +	ret = snprintf(path, PATH_MAX, "/proc/%d/stat", pid);
> +	if (ret < 0 || ret >= PATH_MAX)
> +		return 0;
> +	f = fopen(path, "r");
> +	if (!f) // if it doesn't exist, process doesn't exist...
> +		return 0;
> +	ret = fscanf(f, "%d %s %c", &junkint, path, &state);
> +	fclose(f);
> +	if (ret != 3)
> +		return 0;
> +	if (state == 'Z')
> +		return 1;
> +	return 0;
> +}
> +
>  static int signal_handler(int fd, void *data,
>  			   struct lxc_epoll_descr *descr)
>  {
> @@ -358,6 +380,7 @@ int lxc_poll(const char *name, struct lxc_handler *handler)
>  	int sigfd = handler->sigfd;
>  	int pid = handler->pid;
>  	struct lxc_epoll_descr descr;
> +	int ret = -1;
>  
>  	if (lxc_mainloop_open(&descr)) {
>  		ERROR("failed to create mainloop");
> @@ -390,13 +413,22 @@ int lxc_poll(const char *name, struct lxc_handler *handler)
>  		#endif
>  	}
>  
> +	ret = 0;
> +	/* more robustness, protect ourself from a SIGCHLD sent
> +	 * by a process different from the container init
> +	 */
> +	if (is_zombie(handler->pid)) {
> +		INFO("init has already exited");
> +		goto out_mainloop_open;
> +	}
> +
>  	return lxc_mainloop(&descr);
>  
>  out_mainloop_open:
>  	lxc_mainloop_close(&descr);
>  out_sigfd:
>  	close(sigfd);
> -	return -1;
> +	return ret;
>  }
>  
>  extern int lxc_caps_check(void);





More information about the lxc-devel mailing list