[lxc-devel] [PATCH] RFC: how to fix race with fast init?
Daniel Lezcano
daniel.lezcano at free.fr
Sat Mar 9 12:28:51 UTC 2013
On 03/09/2013 12:01 AM, Serge Hallyn wrote:
> Detection of SIGCHLD from the container init by the monitor process
> which spawned it is done during lxc_poll. If the monitor is slow
> and the init (especially if using lxc-init to run /bin/true) exits
> quickly, it can send its SIGCHLD before lxc_poll starts. In that
> case lxc_poll ends up hanging forever waiting for the SIGCHLD,
> while the init process is a zombie waiting to be reaped.
This problem has already been solved a couple of years ago. I suspect
there is another bug.
The signalfd is set *before*, so if a signal arrives while starting the
container, we should enter and exit immediately the mainloop.
That could have an edge/level triggered problem but it is not the case.
The problem is coming from the checking of the pid when it is received
by the monitor.
lxc-execute 1362829720.335 NOTICE lxc_execute - '/bin/true'
started with pid '18591'
lxc-execute 1362829721.335 INFO lxc_console - no rootfs, no console.
lxc-execute 1362829721.335 WARN lxc_start - invalid pid for
SIGCHLD: 18591 <> 18590
So it is ignoring the signal. This problem shouldn't appears with lxc-start.
>
> If you want to verify that race, you can simply add a sleep(3) right
> after the lxc_set_state(... RUNNING) in lxc_spawn, then do
>
> sudo lxc-execute -n somecontainer -- true
>
> I suspect a clean way to handle this is using sigaction just between
> lxc_spawn and lxc_poll to set a flag if the SIGCHLD from container
> init is detected. However, I haven't yet coded that fix yet. If
> someone wants to do that before I get to it that would be great.
> In the meantime, another fix which works is below. On the one hand
> it feels ugly because it's not using signals. On the other hand,
> the sigaction method will be more complicated and less obvious...
>
> One thing I'm not sure of (and which will complicate either way)
> is whether it is the case that any signals delivered between
> the lxc_mainloop_add_handler() and the actual running of the
> lxc_mainloop() will be queued up and handled in the lxc_mainloop().
>
> This should fix at least 2 bugs,
> https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1134923
> https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1144873
>
> and maybe a third
> https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1124526
>
> Signed-off-by: Serge Hallyn <serge.hallyn at ubuntu.com>
> ---
> src/lxc/start.c | 34 +++++++++++++++++++++++++++++++++-
> 1 file changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/src/lxc/start.c b/src/lxc/start.c
> index f0e82a3..75d1fd6 100644
> --- a/src/lxc/start.c
> +++ b/src/lxc/start.c
> @@ -220,6 +220,28 @@ static int setup_signal_fd(sigset_t *oldmask)
> return fd;
> }
>
> +static int is_zombie(int pid)
> +{
> + char path[PATH_MAX];
> + FILE *f;
> + int ret, junkint;
> + char state;
> +
> + ret = snprintf(path, PATH_MAX, "/proc/%d/stat", pid);
> + if (ret < 0 || ret >= PATH_MAX)
> + return 0;
> + f = fopen(path, "r");
> + if (!f) // if it doesn't exist, process doesn't exist...
> + return 0;
> + ret = fscanf(f, "%d %s %c", &junkint, path, &state);
> + fclose(f);
> + if (ret != 3)
> + return 0;
> + if (state == 'Z')
> + return 1;
> + return 0;
> +}
> +
> static int signal_handler(int fd, void *data,
> struct lxc_epoll_descr *descr)
> {
> @@ -358,6 +380,7 @@ int lxc_poll(const char *name, struct lxc_handler *handler)
> int sigfd = handler->sigfd;
> int pid = handler->pid;
> struct lxc_epoll_descr descr;
> + int ret = -1;
>
> if (lxc_mainloop_open(&descr)) {
> ERROR("failed to create mainloop");
> @@ -390,13 +413,22 @@ int lxc_poll(const char *name, struct lxc_handler *handler)
> #endif
> }
>
> + ret = 0;
> + /* more robustness, protect ourself from a SIGCHLD sent
> + * by a process different from the container init
> + */
> + if (is_zombie(handler->pid)) {
> + INFO("init has already exited");
> + goto out_mainloop_open;
> + }
> +
> return lxc_mainloop(&descr);
>
> out_mainloop_open:
> lxc_mainloop_close(&descr);
> out_sigfd:
> close(sigfd);
> - return -1;
> + return ret;
> }
>
> extern int lxc_caps_check(void);
More information about the lxc-devel
mailing list