[lxc-devel] [PATCH] Add mechanism for container to notify host about end of boot
Christian Seiler
christian at iwakd.de
Mon Sep 17 09:26:20 UTC 2012
Hi,
> It is a bit weird to bind mount this fifo.
IMHO, a lot of low-level stuff appears weird at first glance, until you
understand the reasoning behind it. Bind-mounting non-directories is
actually used a lot with namespaces. For example, if you want to keep a
network namespace around after the last process in that namespace exits,
you will want to bind-mount /proc/$some_pid_in_net_namespace/ns/net to
somewhere else - then, only when the bind-mount is removed AND the last
pid in the namespace has died is the namespace released.
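To make the network namespace example concrete, here is a minimal Python sketch of such a bind-mount. It is only an illustration: the helper names and the target path are hypothetical, the call needs root, and it simply shells out to mount(8) with --bind.

```python
import subprocess
from pathlib import Path


def netns_bind_argv(pid, target):
    """Build the mount(8) command that bind-mounts a process's netns file."""
    return ["mount", "--bind", f"/proc/{pid}/ns/net", str(target)]


def pin_netns(pid, target):
    """Keep the namespace of `pid` alive at `target` (hypothetical path, needs root)."""
    # A bind mount of a regular file needs an existing file as its mount point.
    Path(target).touch()
    subprocess.run(netns_bind_argv(pid, target), check=True)
```

Unmounting the target again (umount on the same path) drops the extra reference, so the namespace goes away once its last process has exited as well.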
> Furthermore, I would suggest
> to prevent using a fifo it is prone to problems and could hang the
> supervisor process (aka lxc-start).
Why is it by itself prone to problems and/or hangs? The file descriptor
is opened in nonblocking mode and added to the epoll logic in the
mainloop. I don't see any way that this in and of itself could cause a
hang. You should also keep in mind that the current mainloop code also
does the user socket stuff (and since it's abstract, any user from the
outside can connect and send at least one message before the code
notices via SCM_CREDENTIALS that the credentials are wrong) and the
console stuff (arbitrary data from inside the container is manually
piped to the log file, if present). I don't see how one fifo / socket
that the container's root can write anything to would make any
qualitative difference. It's just one more fd to take care of.
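As a rough illustration of why a nonblocking fifo in an epoll loop does not hang the reader, here is a small Python sketch. It is not the code from the patch (which is C inside lxc-start); the function names, the timeout and the temporary path are assumptions made for the example, and it only runs on Linux.

```python
import os
import select
import tempfile


def read_notifications(fifo_path, timeout=1.0):
    """Return the words currently readable from the fifo, waiting at most `timeout`."""
    # O_NONBLOCK lets open() return at once even if no writer exists yet.
    fd = os.open(fifo_path, os.O_RDONLY | os.O_NONBLOCK)
    ep = select.epoll()
    ep.register(fd, select.EPOLLIN)
    data = b""
    try:
        # poll() returns an empty list on timeout, so the caller never hangs.
        for _fd, events in ep.poll(timeout):
            if events & select.EPOLLIN:
                data += os.read(fd, 4096)
    finally:
        ep.close()
        os.close(fd)
    return data.decode(errors="replace").split()


def demo():
    """Read once from an idle fifo, then once after a writer sent RUNNING."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "lxc-notify")
        os.mkfifo(path)
        idle = read_notifications(path, timeout=0.1)
        # O_RDWR is used here only so the demo can write without a second process.
        writer = os.open(path, os.O_RDWR | os.O_NONBLOCK)
        try:
            os.write(writer, b"RUNNING\n")
            got = read_notifications(path)
        finally:
            os.close(writer)
        return idle, got
```

The idle read simply times out, which is the point: whatever the container does or does not write, the supervisor gets control back.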
> Maybe here a simple file in the rootfs let's say
> rootfs/var/run/lxc-notify would be sufficient.
> From lxc-start monitor this file and when it is created or modified or
> whatever, the system running the container is booted.
How would that work if a block device is mounted as rootfs solely in
the container namespace? Or, even if you leave the rootfs to be
accessible from the outside, most distributions mount /run (/var/run is
usually a symlink to /run) as a tmpfs nowadays - which is done inside
the mount namespace of the container. You won't be able to see that from
lxc-start. That's why I want to bind-mount something into the container,
it's the only reliable way I can see to make sure there is a backchannel
from the container.
> I suggest to decorrelate the states sent by lxc-start to lxc-info and so
> from this notification mechanism.
I don't quite understand what you mean by 'decorrelate' here. Could you
elaborate?
>> Inside the container one may execute 'echo RUNNING > /dev/lxc-notify'
>> or an equivalent command to notify lxc-start that the container has
>> now booted. Similarly, 'echo STOPPING > /dev/lxc-notify' will change
>> the status to STOPPING, which may be done on shutdown. Currently, only
>> RUNNING and STOPPING are allowed, other states are ignored.
>
> How the process writing the "STOPPING" string can know the container
> is shutting down?
Let's say root inside a container runs 'shutdown -h now' - then the
container is technically still running, even though init will exit
really soon. I think that does qualify as STOPPING. If you think
STOPPING is not completely accurate, we may introduce another status
such as SHUTDOWN, but I think the principle applies.
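The state handling described above (only RUNNING and STOPPING are accepted, anything else is ignored) could be sketched roughly like this in Python. The names are made up for the example and this is not the patch's actual C code.

```python
# States the container is allowed to report; everything else is ignored.
ALLOWED_STATES = ("RUNNING", "STOPPING")


def apply_notification(current_state, message):
    """Return the new state after a message from the container."""
    state = message.strip()
    if state in ALLOWED_STATES:
        return state
    # Unknown or malformed input leaves the state untouched.
    return current_state
```

Adding another status such as SHUTDOWN would then only mean extending the tuple of allowed states.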
That all said, in this thread quite a few people have said that they'd
prefer a socket instead of a fifo, so if you are agreeable to the basic
principle, the next version of my patch will use a socket instead.
Christian