[lxc-devel] [PATCH] RFC: how to fix race with fast init?

Serge Hallyn serge.hallyn at ubuntu.com
Sun Mar 10 02:48:41 UTC 2013


Quoting Daniel Lezcano (daniel.lezcano at free.fr):
> On 03/09/2013 12:01 AM, Serge Hallyn wrote:
> > Detection of SIGCHLD from the container init by the monitor process
> > which spawned it is done during lxc_poll.  If the monitor is slow
> > and the init (especially if using lxc-init to run /bin/true) exits
> > quickly, it can send its SIGCHLD before lxc_poll starts.  In that
> > case lxc_poll ends up hanging forever waiting for the SIGCHLD,
> > while the init process is a zombie waiting to be reaped.
> 
> This problem has already been solved a couple of years ago. I suspect
> there is another bug.

That could certainly be (in which case my patch would be the only
workaround for now).  I did see the early sigfd init, but figured
that the sigchlds must not be getting queued?  You do epoll_wait
much later, but you're not doing any sort of inotify thing.

Even when I move the initialization of the signalfd epoll earlier (but
not the epoll_wait itself, of course) I still miss the sigchld.
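
For reference, a minimal standalone sketch of the queuing one would
expect. This is not the lxc code: the function name reap_fast_child and
its delay_ms parameter are made up for illustration. It assumes SIGCHLD
is blocked and the signalfd is registered with epoll before the fork.
The child exits at once, the parent deliberately dawdles, and the
pending SIGCHLD should still be readable from the signalfd afterwards.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/epoll.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of lxc.
 * Forks a child that exits immediately, sleeps delay_ms in the parent
 * (simulating a slow monitor), then waits for SIGCHLD via signalfd+epoll.
 * Stores the forked pid in *child_out and returns the pid reported in
 * the siginfo (ssi_pid), or -1 on error or timeout.
 */
pid_t reap_fast_child(unsigned int delay_ms, pid_t *child_out)
{
    sigset_t mask, oldmask;
    struct epoll_event ev = { .events = EPOLLIN }, fired;
    struct signalfd_siginfo si;
    struct timespec ts = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
    pid_t child, result = -1;
    int sfd = -1, efd = -1;

    /* Block SIGCHLD first so it stays pending instead of being delivered. */
    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &mask, &oldmask) < 0)
        return -1;

    /* Create the signalfd and register it with epoll before forking. */
    sfd = signalfd(-1, &mask, SFD_CLOEXEC);
    efd = epoll_create1(EPOLL_CLOEXEC);
    if (sfd < 0 || efd < 0)
        goto cleanup;
    ev.data.fd = sfd;
    if (epoll_ctl(efd, EPOLL_CTL_ADD, sfd, &ev) < 0)
        goto cleanup;

    child = fork();
    if (child < 0)
        goto cleanup;
    if (child == 0)
        _exit(0);               /* "fast init": exit right away */
    *child_out = child;

    /* Slow monitor: the child is almost certainly a zombie by now. */
    nanosleep(&ts, NULL);

    /* The pending SIGCHLD makes the signalfd readable; 5 s safety timeout. */
    if (epoll_wait(efd, &fired, 1, 5000) == 1 &&
        read(sfd, &si, sizeof(si)) == (ssize_t)sizeof(si))
        result = (pid_t)si.ssi_pid;

    waitpid(child, NULL, 0);    /* reap the zombie */

cleanup:
    if (sfd >= 0)
        close(sfd);
    if (efd >= 0)
        close(efd);
    sigprocmask(SIG_SETMASK, &oldmask, NULL);
    return result;
}
```

Calling reap_fast_child(200, &child) from a small main() and comparing
the return value against child shows whether the signal was queued and
which pid it names. With only one child there is no pid to confuse it
with, so this says nothing about the lxc-init case where the monitor
sees a pid other than the one it expects.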

> The signalfd is set *before*, so if a signal arrives while starting the
> container, we should enter the mainloop and exit it immediately.
> 
> That could have an edge/level triggered problem but it is not the case.
> 
> The problem is coming from the checking of the pid when it is received
> by the monitor.
> 
>     lxc-execute 1362829720.335 NOTICE   lxc_execute - '/bin/true'
> started with pid '18591'
>     lxc-execute 1362829721.335 INFO     lxc_console - no rootfs, no console.
>     lxc-execute 1362829721.335 WARN     lxc_start - invalid pid for
> SIGCHLD: 18591 <> 18590
> 
> So it is ignoring the signal.

But 18590 is presumably lxc-init, which becomes defunct, while 18591 is
the forked task to actually do the work.  So why do we miss lxc-init's
signal?

> This problem shouldn't appear with lxc-start.

It hasn't (that we know of), but then the init task in lxc-start takes
a lot longer.  I haven't tried it: what happens when you run

	lxc-start -n r1 -- lxc-init /bin/true

(my toy box isn't on right now, can't test tonight)

thanks,
-serge



