[Lxc-users] On clean shutdown of Ubuntu 10.04 containers

Mon Dec 6 20:45:46 UTC 2010

On Mon, 2010-12-06 at 18:42 +1100, Trent W. Buck wrote: 
> This post describes my attempts to get "clean" shutdown of Ubuntu 10.04
> containers.  The goal here is that a "shutdown -h now" of the dom0
> should not result in a potentially inconsistent domU postgres database,
> cf. a naive lxc-stop.
> 
> As at Ubuntu 10.04 with lxc 0.7.2, lxc-start detects that a container
> has halted by 1) seeing a reboot event in <container>/var/run/utmp; or
> 2) seeing <container>'s PID 1 terminate.
> 
> Ubuntu 10.04 simply REQUIRES /var/run to be a tmpfs; this is hard-coded
> into mountall's (upstart's) /lib/init/fstab.

Are you absolutely SURE about this?  I was under the impression this was
controlled by the RAMRUN option in /etc/default/rcS.  I set both that and
RAMLOCK to "no" and didn't notice any problems, but I'm not sure whether
I was testing with a 10.04 container at the time.  I'll have to reverify
to see if they've changed that.  If true, that should really be
considered a bug.  Nothing should require something to be on tmpfs.
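
For what it's worth, here is a rough sketch of how one might check those
two settings from inside a container.  The function name rcs_setting is
made up for illustration, and it assumes /etc/default/rcS is nothing but
plain shell variable assignments.  It only reports what the file says;
whether mountall on 10.04 actually honours the values is the open
question.

```shell
#!/bin/sh
# Print the value a defaults file assigns to NAME, or "unset" if none.
# Usage: rcs_setting NAME [FILE]   (FILE defaults to /etc/default/rcS)
# rcs_setting is a hypothetical helper, not part of any package.
rcs_setting() {
    name=$1
    file=${2:-/etc/default/rcS}
    # Source the file in a subshell so its variables don't leak out.
    (
        unset "$name"
        [ -r "$file" ] && . "$file"
        eval "printf '%s\n' \"\${$name:-unset}\""
    )
}

# Report the two options discussed above.
for opt in RAMRUN RAMLOCK; do
    echo "$opt=$(rcs_setting "$opt")"
done
```

Run inside the container, this prints one line per option.  A value of
"unset" means the file does not assign that option at all.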

> Without it, the most
> immediate issue is that /var/run/ifstate isn't reaped on reboot, ifup(8)
> thinks lo (at least) is already configured, and the boot process hangs
> waiting for the network.
> 
> Unfortunately, lxc 0.7's utmp detect requires /var/run to NOT be a
> tmpfs.  The shipped lxc-ubuntu script works around this by deleting the
> ifstate file and not mounting a tmpfs on /var/run, but to me that is
> simply waiting for something else to assume /var/run is empty.  It also
> doesn't cope with a mountall upgrade rewriting /lib/init/fstab.
> 
> More or less by accident, I discovered that I can tell lxc-start that
> the container is ready to halt by "crashing" upstart:
> 
>     container# kill -SEGV 1
> 
> Likewise I can spoof a ctrl-alt-delete event in the container with:
> 
>     dom0# pkill -INT lxc-start
> 
> I automate the former signalling at the end of shutdowns thusly:
> 
>     chroot $template_dir dpkg-divert --quiet --rename /sbin/reboot
>     chroot $template_dir tee >/dev/null /sbin/reboot <<-EOF
>     	#!/bin/bash
>     	while getopts nwdfiph opt
>     	do [[ f = \$opt ]] && exec kill -SEGV 1
>     	done
>     	exec -a "\$0" "\$0.distrib" "\$@"
>     	EOF
>     chroot $template_dir chmod +x /sbin/reboot
>     chroot $template_dir ln -s reboot.distrib /sbin/halt.distrib
>     chroot $template_dir ln -s reboot.distrib /sbin/poweroff.distrib
> 
> I use the latter in my customized /etc/init.d/lxc stop rule.
> Note that the lxc-wait's SHOULD be parallelized, but this is not
> possible as at lxc 0.7.2 :-(  This means that theoretically the nth
> container gets n×10min to halt, although in practice I find most
> containers go down in a decisecond or two.
> 
>     case "$1" in
>         ...
> 
>         stop)
>           log_daemon_msg "Stopping $DESC"
>           pkill -INT lxc-start
>           for name in $(lxc-ls)
>           do  if timeout 10m lxc-wait -n $name -s STOPPED
>               then
>                   log_progress_msg $name
>               else
>                   lxc-stop -n $name
>                   log_progress_msg "$name (killed)"
>               fi
>           done
>           wait
>           log_end_msg 0
>           ;;
>     esac

Mike
-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!