[lxc-devel] [PATCH] Add mechanism for container to notify host about end of boot

Mon Sep 17 09:26:20 UTC 2012

Hi,

> It is a bit weird to bind mount this fifo.

IMHO, a lot of low-level stuff appears weird at first glance, until you 
understand the reasoning behind it. Bind-mounting non-directories is 
actually used a lot with namespaces. For example, if you want to keep a 
network namespace around after the last process in that namespace exits, 
you will want to bind-mount /proc/$some_pid_in_net_namespace/ns/net to 
somewhere else - then the namespace is released only when the bind-mount 
has been removed AND the last pid in the namespace has died.
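
To make the namespace example concrete, here is a rough sketch of pinning a network namespace by bind-mounting its /proc handle onto a regular file. The names netns_path and pin_netns and the target path are hypothetical, picked for illustration only; the mount(2) call needs root, and this is not how lxc itself does it.

```python
import ctypes
import ctypes.util
import os

MS_BIND = 4096  # value of MS_BIND in <sys/mount.h> on Linux


def netns_path(pid):
    """Return the procfs handle for the network namespace of `pid`."""
    return "/proc/%d/ns/net" % pid


def pin_netns(pid, target):
    """Bind-mount the namespace handle of `pid` onto the file `target`.

    Needs root (CAP_SYS_ADMIN). `target` is a caller-chosen, hypothetical path.
    """
    # The target of a non-directory bind-mount must be an existing file.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT, 0o644)
    os.close(fd)

    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_ulong, ctypes.c_void_p]
    ret = libc.mount(netns_path(pid).encode(), target.encode(),
                     None, MS_BIND, None)
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

As long as that bind-mount exists, the namespace stays alive even with no process left inside it; unmounting the target drops the last reference.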

> Furthermore, I would suggest
> to prevent using a fifo it is prone to problems and could hang the
> supervisor process (aka lxc-start).

Why is it by itself prone to problems and/or hangs? The file descriptor 
is opened in non-blocking mode and added to the epoll logic in the 
mainloop. I don't see how that in and by itself could cause a 
hang. You should also keep in mind that the current mainloop code also 
does the user socket stuff (and since it's abstract, any user from the 
outside can connect and send at least one message before the code 
notices via SCM_CREDENTIALS that the credentials are wrong) and the console 
stuff (arbitrary data from inside the container is manually piped to the 
log file, if present). I don't see how one fifo / socket that the 
container's root can write anything to would make any qualitative 
difference. It's just one more fd to take care of.
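
To illustrate the non-blocking argument, here is a minimal sketch (not the patch itself) of a fifo read end opened with O_NONBLOCK and registered with epoll. The function names and the temporary path are made up for the example; it is Linux-only because of select.epoll.

```python
import os
import select
import tempfile


def open_notify_fifo(path):
    """Create the fifo at `path` and open its read end without blocking."""
    os.mkfifo(path, 0o600)
    # O_NONBLOCK makes open() return at once even if no writer exists yet.
    return os.open(path, os.O_RDONLY | os.O_NONBLOCK)


def poll_once(ep, fd, timeout):
    """Wait up to `timeout` seconds and return whatever bytes are readable."""
    data = b""
    for ready_fd, events in ep.poll(timeout):
        if ready_fd == fd and events & select.EPOLLIN:
            try:
                data += os.read(fd, 4096)
            except BlockingIOError:
                pass  # nothing to read after all; the loop still does not block
    return data


def demo():
    """Run one idle poll and one poll after a writer sent a status line."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "lxc-notify")  # hypothetical location
    fd = open_notify_fifo(path)
    ep = select.epoll()
    ep.register(fd, select.EPOLLIN)

    idle = poll_once(ep, fd, 0.1)   # no writer: times out instead of hanging

    wfd = os.open(path, os.O_WRONLY)  # stands in for the container side
    os.write(wfd, b"RUNNING\n")
    got = poll_once(ep, fd, 1.0)

    os.close(wfd)
    ep.close()
    os.close(fd)
    os.unlink(path)
    os.rmdir(tmpdir)
    return idle, got
```

The idle poll simply returns after its timeout, which is the point: the supervisor only touches the fd when epoll reports it readable.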

> Maybe here a simple file in the rootfs let's say
> rootfs/var/run/lxc-notify would be sufficient.
> From lxc-start monitor this file and when it is created or modified or
> whatever, the system running the container is booted.

How would that work if a block device is mounted as rootfs solely in 
the container namespace? Or, even if you leave the rootfs to be 
accessible from the outside, most distributions mount /run (/var/run is 
usually a symlink to /run) as a tmpfs nowadays - which is done inside 
the mount namespace of the container. You won't be able to see that from 
lxc-start. That's why I want to bind-mount something into the container, 
it's the only reliable way I can see to make sure there is a backchannel 
from the container.

> I suggest to decorrelate the states sent by lxc-start to lxc-info and so
> from this notification mechanism.

I don't quite understand what you mean by 'decorrelate' here.

>> Inside the container one may execute 'echo RUNNING > /dev/lxc-notify'
>> or an equivalent command to notify lxc-start that the container has now
>> booted. Similarly, 'echo STOPPING > /dev/lxc-notify' will change the
>> status to STOPPING, which may be done on shutdown. Currently, only
>> RUNNING and STOPPING are allowed, other states are ignored.
>
> How the process writing the "STOPPING" string can know the container is
> shutting down ?

Let's say root inside a container runs 'shutdown -h now' - then the 
container is technically still running, even though init will exit 
really soon. I think that does qualify as STOPPING. If you think 
STOPPING is not completely accurate, we may introduce another status 
such as SHUTDOWN, but I think the principle applies.
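
For reference, the "only RUNNING and STOPPING are allowed, other states are ignored" rule quoted above could look roughly like the sketch below. It assumes one status word per newline-terminated line; parse_notifications and ALLOWED_STATES are illustrative names, not identifiers from the patch.

```python
ALLOWED_STATES = ("RUNNING", "STOPPING")


def parse_notifications(buf):
    """Split `buf` into complete lines; return (accepted_states, remainder).

    The remainder is an unterminated trailing fragment that the caller
    should keep and prepend to the next chunk read from the fd.
    """
    *lines, rest = buf.split(b"\n")
    states = []
    for line in lines:
        word = line.strip().decode("ascii", errors="replace")
        if word in ALLOWED_STATES:
            states.append(word)  # anything else is silently ignored
    return states, rest
```

Adding another status such as SHUTDOWN would then only mean extending the tuple of allowed words.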

That all said, in this thread quite a few people have said that they'd 
prefer a socket instead of a fifo, so if you are agreeable to the basic 
principle, the next version of my patch will use a socket instead.

Christian