[Lxc-users] On clean shutdown of Ubuntu 10.04 containers
Trent W. Buck
twb at cybersource.com.au
Mon Dec 6 07:42:51 UTC 2010
This post describes my attempts to get "clean" shutdown of Ubuntu 10.04
containers. The goal here is that a "shutdown -h now" of the dom0
should not result in a potentially inconsistent domU postgres database,
cf. a naive lxc-stop.
As at Ubuntu 10.04 with lxc 0.7.2, lxc-start detects that a container
has halted by 1) seeing a reboot event in <container>/var/run/utmp; or
2) seeing <container>'s PID 1 terminate.
Ubuntu 10.04 simply REQUIRES /var/run to be a tmpfs; this is hard-coded
into mountall's (upstart's) /lib/init/fstab. Without it, the most
immediate issue is that /var/run/ifstate isn't reaped on reboot, ifup(8)
thinks lo (at least) is already configured, and the boot process hangs
waiting for the network.
Unfortunately, lxc 0.7's utmp detection requires /var/run to NOT be a
tmpfs. The shipped lxc-ubuntu script works around this by deleting the
ifstate file and not mounting a tmpfs on /var/run, but to me that is
simply waiting for something else to assume /var/run is empty. It also
doesn't cope with a mountall upgrade rewriting /lib/init/fstab.
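To see which of the two situations a given container is in, one can look at its mount table. The following is only a sketch: is_tmpfs is a hypothetical helper name, and it assumes the usual /proc/mounts layout (device, mount point, filesystem type, ...). The optional second argument exists only so the function can be tried against a saved copy of the table.

```shell
#!/bin/sh
# is_tmpfs <mountpoint> [mounts-file]
# Succeeds if the last entry for <mountpoint> has filesystem type tmpfs.
# (Hypothetical helper; the mounts file defaults to /proc/mounts.)
is_tmpfs() {
    awk -v mp="$1" '
        $2 == mp { fs = $3 }           # later mounts shadow earlier ones
        END      { exit !(fs == "tmpfs") }
    ' "${2:-/proc/mounts}"
}
```

Run inside the container, "is_tmpfs /var/run && echo tmpfs" should show whether mountall's tmpfs is in place, and therefore whether the utmp-based detection can be expected to work.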
More or less by accident, I discovered that I can tell lxc-start that
the container is ready to halt by "crashing" upstart:
container# kill -SEGV 1
Likewise I can spoof a ctrl-alt-delete event in the container with:
dom0# pkill -INT lxc-start
I automate the former signalling at the end of shutdowns thusly:
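# Divert the packaged binary to /sbin/reboot.distrib, then install a wrapper.
chroot $template_dir dpkg-divert --quiet --rename /sbin/reboot
chroot $template_dir tee >/dev/null /sbin/reboot <<-EOF
#!/bin/bash
# A forced call (-f) comes at the end of shutdown: crash init so lxc-start notices.
while getopts nwdfiph opt
do [[ f = \$opt ]] && exec kill -SEGV 1
done
# Otherwise run the diverted binary under the name we were invoked as.
exec -a "\$0" "\$0.distrib" "\$@"
EOF
chroot $template_dir chmod +x /sbin/reboot
# \$0.distrib must also resolve when the wrapper runs as halt or poweroff.
chroot $template_dir ln -s reboot.distrib /sbin/halt.distrib
chroot $template_dir ln -s reboot.distrib /sbin/poweroff.distrib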
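The option check in the wrapper can be tried outside a container with a dry run that reports the decision instead of acting on it. This is a sketch, not part of the setup above: decide is a hypothetical name, and the words "signal" and "passthrough" are just labels for the two branches. It is written for plain sh rather than bash so it runs anywhere.

```shell
#!/bin/sh
# Dry run of the wrapper's option check: print the action instead of taking it.
decide() {
    OPTIND=1                    # restart option parsing on every call
    while getopts nwdfiph opt "$@"
    do
        if [ f = "$opt" ]; then
            echo signal         # the real wrapper would: exec kill -SEGV 1
            return 0
        fi
    done
    echo passthrough            # the real wrapper would exec the .distrib binary
}
```

For example, "decide -d -f -i" prints signal, while "decide -w" and a bare "decide" print passthrough, so only forced calls are intercepted.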
I use the latter (the pkill -INT) in my customized /etc/init.d/lxc stop
rule. Note that the lxc-wait calls SHOULD be parallelized, but this is
not possible as at lxc 0.7.2 :-( Because the waits run one after
another, the nth container theoretically gets up to n×10min to halt,
although in practice I find most containers go down in a decisecond or two.
case "$1" in
    ...
    stop)
        log_daemon_msg "Stopping $DESC"
        pkill -INT lxc-start
        for name in $(lxc-ls)
        do
            if timeout 10m lxc-wait -n $name -s STOPPED
            then
                log_progress_msg $name
            else
                lxc-stop -n $name
                log_progress_msg "$name (killed)"
            fi
        done
        wait
        log_end_msg 0
        ;;
esac
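One way to cap the total wait without parallelizing is to give all the sequential lxc-wait calls a single shared deadline. The following is a sketch under assumptions, not what the stop rule above does: remaining and stop_all are hypothetical names, 600 seconds stands in for the 10m above, and plain echo replaces log_progress_msg. Note the guard on a zero budget, since GNU timeout treats a duration of 0 as "no timeout".

```shell
#!/bin/sh
# remaining <deadline_epoch> <now_epoch>: seconds left, never negative.
remaining() {
    left=$(( $1 - $2 ))
    [ "$left" -gt 0 ] || left=0
    echo "$left"
}

# Hypothetical variant of the stop rule: one 10-minute budget for everyone.
stop_all() {
    deadline=$(( $(date +%s) + 600 ))
    pkill -INT lxc-start                     # ctrl-alt-delete to all containers
    for name in $(lxc-ls)
    do
        left=$(remaining "$deadline" "$(date +%s)")
        # "timeout 0" would wait forever, so skip straight to lxc-stop instead.
        if [ "$left" -gt 0 ] && timeout "$left" lxc-wait -n "$name" -s STOPPED
        then
            echo "$name"
        else
            lxc-stop -n "$name"
            echo "$name (killed)"
        fi
    done
}
```

With this, containers that have already halted still return from lxc-wait immediately, and a single stuck container can delay the host's shutdown by at most the shared budget rather than by its own 10 minutes on top of everyone else's.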