[lxc-devel] Shutting down containers properly

Fri May 25 15:00:38 UTC 2012

On 05/25/2012 07:56 AM, Christian Seiler wrote:
> Hi,
> 
> Currently, lxc-stop sends SIGKILL to the init process of the container,
> which causes all the other processes in the container to also receive
> a SIGKILL. I don't think that is a good course of action, since sending
> SIGKILL to for example a database server can lead to potential data
> loss.
> 
> A much better way of stopping containers would be in my opinion to
> first send the container a shutdown signal - and then wait for a
> specified amount of time before really killing the container with a
> KILL signal.
> 
> Unfortunately, no init system will react to SIGTERM and shut down the
> container, so it is not quite as easy. I've looked a bit at different
> init systems to see how to properly shut them down:
> 
>   - lxc application containers (lxc-execute): lxc-init will do a
>     kill(-1, SIGTERM) if it receives a SIGTERM itself, so sending
>     it a SIGTERM is sufficient to initiate a proper shutdown
> 
>   - sysvinit: open /run/initctl (newer Debian) or /dev/initctl (older
>     Debian and other distros) and send them a binary message to switch
>     to runlevel 0
> 
>   - upstart: connect to DBus and tell it to switch to runlevel 0
> 
>   - systemd: either connect to DBus and tell it to switch runlevel or
>     send SIGRTMIN + 4, that will also cause a shutdown
> 
>   - sysvinit + upstart + systemd also all provide a 'telinit' binary,
>     where calling 'telinit 0' will initiate a shutdown
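To make the sysvinit bullet concrete, here is a minimal C sketch of the binary "switch to runlevel 0" request written to the initctl FIFO. The constants and the 384-byte layout are re-declared from sysvinit's initreq.h as an assumption (renamed with a sketch_ prefix so no sysvinit headers are needed); request_runlevel0() is a hypothetical helper, not part of lxc or sysvinit.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Constants and layout mirror sysvinit's <initreq.h>; re-declared here
 * under sketch_ names (an assumption) so this compiles without it. */
#define SKETCH_INIT_MAGIC      0x03091969
#define SKETCH_INIT_CMD_RUNLVL 1

struct sketch_init_request {
    int  magic;     /* must be SKETCH_INIT_MAGIC */
    int  cmd;       /* SKETCH_INIT_CMD_RUNLVL: change runlevel */
    int  runlevel;  /* target runlevel as a character, e.g. '0' */
    int  sleeptime; /* seconds init waits between SIGTERM and SIGKILL */
    char data[368]; /* stands in for the union in initreq.h (384 bytes total) */
};

/* Write a "switch to runlevel 0" request to an initctl FIFO such as
 * /run/initctl or /dev/initctl. Opening a FIFO for writing blocks until
 * a reader (init) has it open. Returns 0 on success, -1 on error. */
static int request_runlevel0(const char *fifo_path, int sleeptime)
{
    struct sketch_init_request req;

    memset(&req, 0, sizeof(req));
    req.magic = SKETCH_INIT_MAGIC;
    req.cmd = SKETCH_INIT_CMD_RUNLVL;
    req.runlevel = '0';
    req.sleeptime = sleeptime;

    int fd = open(fifo_path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, &req, sizeof(req));
    close(fd);
    return n == (ssize_t)sizeof(req) ? 0 : -1;
}
```

A single fixed-size write is used because init reads the request as one struct from the FIFO.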
> 
> My proposal would be the following:
> 
> lxc-stop first sends a new SHUTDOWN command (instead of the current
> STOP command), which initiates the shutdown and returns immediately.
> The command handler in lxc-start will then initiate a shutdown of the
> container (see below). lxc-stop will wait for a given number of seconds
> and if the container is not stopped by then, it will send the current
> STOP command to actually kill the container with SIGKILL.
> 
> On the other hand, add a --force option that will make lxc-stop still
> be able to kill all processes immediately.
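A rough C sketch of the SHUTDOWN-then-STOP flow. It assumes the logic runs in the parent of the container's init (as lxc-start is), so it can poll with waitpid(); in the proposal lxc-stop would instead send commands and wait on the container state. stop_with_timeout() and its millisecond timeout are hypothetical, and a timeout of 0 stands in for --force.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper modelling the proposed lxc-stop flow.
 * Assumes the caller is the parent of init_pid, so it can reap it.
 * timeout_ms == 0 models --force: skip straight to SIGKILL.
 * Returns 0 if init exited within the timeout, 1 if it had to be
 * killed, -1 on error. */
static int stop_with_timeout(pid_t init_pid, int shutdown_sig, int timeout_ms)
{
    const struct timespec step = { 0, 100L * 1000 * 1000 }; /* 100 ms */
    int status;

    if (timeout_ms > 0) {
        /* SHUTDOWN: ask init to shut the container down cleanly. */
        if (kill(init_pid, shutdown_sig) < 0)
            return -1;
        for (int waited = 0; waited < timeout_ms; waited += 100) {
            pid_t r = waitpid(init_pid, &status, WNOHANG);
            if (r == init_pid)
                return 0;
            if (r < 0)
                return -1;
            nanosleep(&step, NULL);
        }
    }

    /* STOP: current behaviour, SIGKILL init (and with it the container). */
    if (kill(init_pid, SIGKILL) < 0 && errno != ESRCH)
        return -1;
    if (waitpid(init_pid, &status, 0) < 0)
        return -1;
    return 1;
}
```

Polling in 100 ms steps keeps the sketch simple; a real implementation could instead wait on a state-change notification from lxc-start.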
> 
> Now how to shut down the container? In lxc.conf there should be a new
> configuration option, lxc.shutdown_method, which can carry the
> following values: "application", "sysvinit", "systemd" and "exec".
> For application containers started with lxc-execute, it will default
> to "application", for system containers started with lxc-start, it will
> default to "sysvinit".
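One way the proposed option could be parsed, sketched in C. The enum and function names are hypothetical; only the four accepted values and the two defaults come from the proposal.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the values proposed for lxc.shutdown_method. */
enum shutdown_method {
    SHUTDOWN_INVALID = -1,
    SHUTDOWN_APPLICATION,
    SHUTDOWN_SYSVINIT,
    SHUTDOWN_SYSTEMD,
    SHUTDOWN_EXEC,
};

static const struct {
    const char *name;
    enum shutdown_method method;
} shutdown_methods[] = {
    { "application", SHUTDOWN_APPLICATION },
    { "sysvinit",    SHUTDOWN_SYSVINIT },
    { "systemd",     SHUTDOWN_SYSTEMD },
    { "exec",        SHUTDOWN_EXEC },
};

/* value is the configured string, or NULL if lxc.conf does not set the
 * option; is_application is true for lxc-execute, false for lxc-start. */
static enum shutdown_method parse_shutdown_method(const char *value, bool is_application)
{
    if (value == NULL)
        return is_application ? SHUTDOWN_APPLICATION : SHUTDOWN_SYSVINIT;

    for (size_t i = 0; i < sizeof(shutdown_methods) / sizeof(shutdown_methods[0]); i++)
        if (strcmp(value, shutdown_methods[i].name) == 0)
            return shutdown_methods[i].method;

    return SHUTDOWN_INVALID; /* unknown value: reject the configuration */
}
```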
> 
> The following actions will be performed:
> 
> "application": send SIGTERM to init process of the container
> 
> "sysvinit": fork(), child process does setns() for mount namespace,
>              tries to send signal to /run/initctl and /dev/initctl
>              (whichever exists), but first checks whether st_dev and
>              st_ino entries do NOT match those of the host's files,
>              so we don't accidentally shut down the host (if the
>              container shares a filesystem with the host)
> 
> "systemd": send SIGRTMIN + 4 to init process of the container
> 
> "exec": run lxc-attach for the container with the contents of the
>          new option lxc.shutdown_command as parameter
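The two signal-only methods above ("application" and "systemd") reduce to a single kill() on the container's init. A minimal C sketch with hypothetical names; sysvinit and exec are deliberately left out because they need a FIFO write or lxc-attach rather than a signal.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical names for the proposed lxc.shutdown_method values. */
enum shutdown_method { SHUTDOWN_APPLICATION, SHUTDOWN_SYSVINIT, SHUTDOWN_SYSTEMD, SHUTDOWN_EXEC };

/* Signal to send to the container's init for the signal-only methods,
 * or 0 when the method needs more than a signal. */
static int shutdown_signal_for(enum shutdown_method m)
{
    switch (m) {
    case SHUTDOWN_APPLICATION:
        return SIGTERM;      /* lxc-init relays it as kill(-1, SIGTERM) */
    case SHUTDOWN_SYSTEMD:
        return SIGRTMIN + 4; /* systemd treats this as a poweroff request */
    default:
        return 0;            /* sysvinit: initctl FIFO; exec: lxc-attach */
    }
}

/* Returns 0 if the signal was sent, -1 otherwise. */
static int send_shutdown_signal(pid_t init_pid, enum shutdown_method m)
{
    int sig = shutdown_signal_for(m);

    if (sig == 0)
        return -1;
    return kill(init_pid, sig);
}
```

SIGRTMIN is not a compile-time constant in glibc, so it is returned from the default path of a switch rather than used as a case label.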
> 
> I haven't included any explicit method for shutting down upstart, so
> containers running upstart inside (assuming that's even possible; I
> don't know much about upstart) should probably use method exec and
> execute telinit 0 inside the container. Sending simple signals to the
> init process as in application / systemd or opening a FIFO and writing
> some bytes for sysvinit is still quite trivial, but implementing DBus
> (esp. across container boundaries) - which would be required for native
> upstart shutdown support - seems like overkill to me.
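For the exec method, a hedged C sketch of how lxc-start might invoke lxc-attach with lxc.shutdown_command (e.g. "telinit 0"). Running the command via /bin/sh -c is an assumption made here to avoid re-implementing word splitting, and both helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define ATTACH_ARGC 7

/* Hypothetical: builds
 *   lxc-attach -n <name> -- /bin/sh -c <shutdown_command>
 * assuming the container provides /bin/sh. */
static void build_attach_argv(char *argv[ATTACH_ARGC + 1], char *name, char *shutdown_command)
{
    argv[0] = "lxc-attach";
    argv[1] = "-n";
    argv[2] = name;
    argv[3] = "--";
    argv[4] = "/bin/sh";
    argv[5] = "-c";
    argv[6] = shutdown_command;
    argv[7] = NULL;
}

/* Runs the command and returns its exit status, or -1 on error
 * (127 means lxc-attach itself could not be executed). */
static int run_shutdown_command(char *name, char *shutdown_command)
{
    char *argv[ATTACH_ARGC + 1];
    int status;

    build_attach_argv(argv, name, shutdown_command);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```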
> 
> On the other hand, I do want to explicitly implement the sysvinit way,
> since there we can check that we're definitely not going to shut down
> the host accidentally (by checking the device/inode numbers of the
> initctl FIFOs), which we can't be 100% sure of with exec.
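The device/inode safety check could look roughly like this C sketch. It assumes the host's stat() results for the candidate paths (/run/initctl, /dev/initctl) are captured before setns() and passed in; pick_container_initctl() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static bool same_file(const struct stat *a, const struct stat *b)
{
    return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}

/* host[i] holds stat(paths[i]) taken in the host's mount namespace before
 * setns(); host_ok[i] is false where that stat failed. Called after
 * setns(), this returns the first candidate that exists, is a FIFO, and
 * is not any of the host's FIFOs, or NULL if none qualifies. */
static const char *pick_container_initctl(const char *const paths[], size_t n,
                                          const struct stat host[], const bool host_ok[])
{
    for (size_t i = 0; i < n; i++) {
        struct stat st;
        bool is_host = false;

        if (stat(paths[i], &st) < 0 || !S_ISFIFO(st.st_mode))
            continue;
        for (size_t j = 0; j < n; j++)
            if (host_ok[j] && same_file(&st, &host[j]))
                is_host = true;
        if (!is_host)
            return paths[i];
    }
    return NULL;
}
```

Each container candidate is compared against every host FIFO, not only the one with the same path, so a bind mount of the host's /run/initctl onto the container's /dev/initctl is also caught. A NULL result means no safe FIFO was found, leaving the SIGKILL fallback described in the caveats.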
> 
> Caveats:
> 
> 1. application / systemd methods should always work, since we just send
> a signal to the init process; sysvinit will only work if attaching
> mount namespaces is implemented in the kernel and exec only if full
> lxc-attach works (so all namespaces). But the worst case scenario here
> is that we still kill all processes in the container with lxc-stop if
> the kernel doesn't support attach, so there is no loss for current
> users.
> 
> 2. If the container is frozen, the current logic first sends the KILL
> signal and then unfreezes it, so the container immediately goes away.
> However, how should we react if we just want to shut it down? Unfreeze
> it and send the shutdown signal? Or just kill it immediately? Or do
> nothing and report an error?
> 
> Thoughts?
> 
> (Note: I'd be willing to implement this feature, once a consensus is
> reached on how to proceed.)
> 
> Regards,
> Christian

I only very quickly read through your e-mail, so I apologize if what I'm
saying below is already covered in it.

Have you looked at the lxc-shutdown script we have in Ubuntu and the
integration we have with upstart?

lxc-shutdown sends two different signals:
 reboot => SIGINT
 shutdown => SIGPWR

These are caught by upstart and trigger a clean reboot or shutdown of
the container. This is what happens to running containers when the host
shuts down in 12.04 LTS.
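In C terms, the lxc-shutdown convention described here boils down to the following sketch. The helper names are hypothetical, this is not lxc-shutdown's own code, and how the host-side PID of the container's init is obtained is left out.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Signal lxc-shutdown sends to the container's init:
 * SIGINT requests a reboot, SIGPWR a shutdown. */
static int lxc_shutdown_signal(bool reboot)
{
    return reboot ? SIGINT : SIGPWR;
}

/* init_pid: host-side PID of the container's init (lookup not shown). */
static int request_container_shutdown(pid_t init_pid, bool reboot)
{
    return kill(init_pid, lxc_shutdown_signal(reboot));
}
```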

lxc-shutdown also allows for a timeout to be set that when hit will
forcibly kill the container.

Apparently lxc-shutdown hasn't made it into the upstream git branch yet,
but I poked Serge about it on IRC, so it should be pushed soon.

-- 
Stéphane Graber
Ubuntu developer
http://www.ubuntu.com
