[Lxc-users] On clean shutdown of Ubuntu 10.04 containers

Mon Dec 6 20:34:24 UTC 2010

On Mon, 2010-12-06 at 12:38 -0500, Brian K. White wrote: 
> On 12/6/2010 2:42 AM, Trent W. Buck wrote:
> > This post describes my attempts to get "clean" shutdown of Ubuntu 10.04
> > containers.  The goal here is that a "shutdown -h now" of the dom0
> > should not result in a potentially inconsistent domU postgres database,
> > cf. a naive lxc-stop.
> >
> > As at Ubuntu 10.04 with lxc 0.7.2, lxc-start detects that a container
> > has halted by 1) seeing a reboot event in <container>/var/run/utmp; or
> > 2) seeing <container>'s PID 1 terminate.
> >
> > Ubuntu 10.04 simply REQUIRES /var/run to be a tmpfs; this is hard-coded
> > into mountall's (upstart's) /lib/init/fstab.  Without it, the most
> > immediate issue is that /var/run/ifstate isn't reaped on reboot, ifup(8)
> > thinks lo (at least) is already configured, and the boot process hangs
> > waiting for the network.
> >
> > Unfortunately, lxc 0.7's utmp detect requires /var/run to NOT be a
> > tmpfs.  The shipped lxc-ubuntu script works around this by deleting the
> > ifstate file and not mounting a tmpfs on /var/run, but to me that is
> > simply waiting for something else to assume /var/run is empty.  It also
> > doesn't cope with a mountall upgrade rewriting /lib/init/fstab.
> >
> > More or less by accident, I discovered that I can tell lxc-start that
> > the container is ready to halt by "crashing" upstart:
> >
> >      container# kill -SEGV 1
> >
> > Likewise I can spoof a ctrl-alt-delete event in the container with:
> >
> >      dom0# pkill -INT lxc-start
> >
> > I automate the former signalling at the end of shutdowns thusly:
> >
> >      chroot $template_dir dpkg-divert --quiet --rename /sbin/reboot
> >      chroot $template_dir tee >/dev/null /sbin/reboot <<-EOF
> >      	#!/bin/bash
> >      	while getopts nwdfiph opt
> >      	do [[ f = \$opt ]] && exec kill -SEGV 1
> >      	done
> >      	exec -a "\$0" "\$0.distrib" "\$@"
> >      	EOF
> >      chroot $template_dir chmod +x /sbin/reboot
> >      chroot $template_dir ln -s reboot.distrib /sbin/halt.distrib
> >      chroot $template_dir ln -s reboot.distrib /sbin/poweroff.distrib
> >
> > I use the latter in my customized /etc/init.d/lxc stop rule.
> > Note that the lxc-wait's SHOULD be parallelized, but this is not
> > possible as at lxc 0.7.2 :-(
> 
> Sure it is.
> I parallelize the shutdowns (in any version, including 0.7.2) by doing 
> all the lxc-stop in parallel without looking or waiting, then in a 
> separate following step do a loop that waits for no containers running.
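
For anyone following along, the two-phase approach described above might
look roughly like the sketch below.  This is not Brian's script: stop_all,
LXC_STOP, LXC_INFO and LXC_STOP_TIMEOUT are names made up for
illustration, and it assumes lxc-info prints a state line containing
RUNNING while a container is up.

```shell
# Sketch only: stop every container in parallel, then wait for all of them.
# Assumption: container names are passed as arguments.
LXC_STOP=${LXC_STOP:-lxc-stop}   # hypothetical override hooks, handy for testing
LXC_INFO=${LXC_INFO:-lxc-info}

container_running() {
    # Assumes the state line contains the word RUNNING while the container is up.
    $LXC_INFO -n "$1" 2>/dev/null | grep -q RUNNING
}

stop_all() {
    # Phase 1: fire off every stop in the background without waiting.
    for name in "$@"; do
        $LXC_STOP -n "$name" &
    done

    # Phase 2: poll until nothing is running or the timeout expires.
    remaining=${LXC_STOP_TIMEOUT:-60}
    while [ "$remaining" -gt 0 ]; do
        still_up=0
        for name in "$@"; do
            if container_running "$name"; then
                still_up=1
            fi
        done
        if [ "$still_up" -eq 0 ]; then
            return 0
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 1    # at least one container was still running at the deadline
}
```

Called as "stop_all web db mail", the total wait is bounded by the
slowest container rather than the sum of all of them.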

> Here is my openSUSE init.d/lxc:
> https://build.opensuse.org/package/files?package=lxc&project=home:aljex
> And the packages:
> http://download.opensuse.org/repositories/home:/aljex/*/lxc-0.7.2*.rpm

> It makes assumptions that are wrong for ubuntu and is more limited than 
> you may want in terms of what it even tries to handle. But that's beside 
> the point of parallel shutdowns.

Also wrong for Fedora.

> * cgroup handling includes a particular stack of override logic for 
> possible cgroup mount points that makes sense to me.
> - start with built-in default /var/run/lxc/cgroup, and name it "lxc" so 
> as not to conflict with any other cgroup setup by default.
> - if you defined something in $LXC_CONF, prefer it over default
> - if kernel is providing /sys/fs/cgroup automatically, prefer that over 
> either default or $LXC_CONF
> - if a cgroup named "lxc" is already mounted, prefer that over all else

I'm not quite sure if I would put those last two in that order.
Especially after the last little discussion over on LKML about per-tty
cgroups in the kernel vs. in user space, I think I would let the
kernel defined /sys/fs/cgroup trump all else if it exists.  Something
that's been mounted may not have been mounted with all the options you
may want, but I'm not sure how much difference that's going to make.  I
would think the kernel definition would be preferable.  Is there
something specific you had in mind that would lead you to want to
override that?
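
To make sure I'm reading the list right, here is the precedence as I
understand it, as a rough sketch.  This is not lifted from your script:
pick_cgroup_mount, LXC_CGROUP, SYS_CGROUP_DIR and MOUNTS_FILE are names I
made up, and I'm assuming a cgroup named "lxc" shows up in /proc/mounts
with "lxc" in the device field.

```shell
# Sketch only: choose a cgroup mount point, lowest priority first,
# with each later test overriding the earlier choice.
pick_cgroup_mount() {
    mounts=${MOUNTS_FILE:-/proc/mounts}           # hypothetical override for testing
    sysfs_dir=${SYS_CGROUP_DIR:-/sys/fs/cgroup}   # hypothetical override for testing

    choice=/var/run/lxc/cgroup                    # built-in default

    # A value set in $LXC_CONF (variable name LXC_CGROUP is assumed) beats the default.
    if [ -n "${LXC_CGROUP:-}" ]; then
        choice=$LXC_CGROUP
    fi

    # A kernel-provided /sys/fs/cgroup beats both of the above.
    if [ -d "$sysfs_dir" ]; then
        choice=$sysfs_dir
    fi

    # An already mounted cgroup named "lxc" beats everything.
    mounted=$(awk '$1 == "lxc" && $3 == "cgroup" { print $2; exit }' "$mounts")
    if [ -n "$mounted" ]; then
        choice=$mounted
    fi

    echo "$choice"
}
```

Moving the /sys/fs/cgroup test below the mounted-"lxc" test gives the
order I would lean towards.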

> * assumes lxc 0.7.2 because the script is part of a lxc-0.7.2 rpm
> - removes the shutdown/reboot watchdog functions that were needed in 
> 0.6.5 but are built in to 0.7.2 now.

> * only starts containers that are defined by $LXC_ETC/*/config

Yeah, that's something where I wish we had an "onboot" and/or "disabled"
config file like OpenVZ does.  That way you could have some containers
configured that don't autoboot when you boot the system.  As it stands,
you would have to rename or remove the config file.  :-P
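
A poor man's version could be a marker file next to the config.  A
sketch, assuming a hypothetical "disabled" file name and guessing
/etc/lxc as the default for $LXC_ETC (neither is an existing lxc
convention):

```shell
# Sketch only: print the containers that should autostart, i.e. those
# with a config file and no "disabled" marker beside it.
autostart_containers() {
    etc=${LXC_ETC:-/etc/lxc}            # default path is a guess
    for cfg in "$etc"/*/config; do
        # If the glob matched nothing, $cfg is the literal pattern; skip it.
        if [ ! -f "$cfg" ]; then
            continue
        fi
        dir=${cfg%/config}
        # "disabled" is a made-up marker name, not something lxc knows about.
        if [ -e "$dir/disabled" ]; then
            continue
        fi
        echo "${dir##*/}"
    done
}
```

Then "touch $LXC_ETC/foo/disabled" would keep foo configured without
booting it.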

> * only shuts down containers that it started

I don't quite see that as happening literally as described.  Looks like
it's going to shut down any container for which it can find a powerfail
init, even if it was started by some other means, say manually.  It
doesn't seem to be actually tracking which ones it started.  Granted,
during normal operation, you're going to try to start everything with a
config, but it looks like it will shut down manually started containers
as well, even if they are not listed with configs and would not even
show up in your status commands.  I don't even see a check in that
routine for the config file.
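
If you did want the literal "only what I started" behavior, one way is to
leave a marker behind at start time.  A sketch only; the state directory
and function names are invented here, not taken from the script:

```shell
# Sketch only: remember which containers this script started.
# The directory is hypothetical; keeping it under /var/run means a host
# reboot clears it when /var/run is a tmpfs.
LXC_STATE_DIR=${LXC_STATE_DIR:-/var/run/lxc/started}

mark_started() {
    mkdir -p "$LXC_STATE_DIR" && : > "$LXC_STATE_DIR/$1"
}

mark_stopped() {
    rm -f "$LXC_STATE_DIR/$1"
}

# Print one container name per line for every marker that exists.
started_by_us() {
    for marker in "$LXC_STATE_DIR"/*; do
        if [ -e "$marker" ]; then
            echo "${marker##*/}"
        fi
    done
}
```

The stop routine would then loop over the output of started_by_us
instead of over every running init it can find.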

> * the stop function greps for /sbin/init in container inittab instead of 
> trying to allow for any random container pid #1

Yeah, that little trick won't work with Ubuntu or Fedora using upstart.

That gives you a peculiar situation: if you have a modern Fedora or
Ubuntu container, your script (even on an openSUSE system) would start
it but could not stop it.  Meanwhile, it would not start a container
without a config, but would stop a running SUSE one regardless of
whether it has a config.

> * no provision for application/service containers, just whole systems 
> started with /sbin/init

That only makes sense.

> * starts containers in screen
> - I have not figured out what it would take to get nice behavior out of 
> lxc-console yet and screen is both easy and standard.

I usually start them with logging enabled and redirected with the -o
option.  To each his own.
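
For the record, what I mean is roughly the following.  The wrapper
function, LXC_START and the log directory are my own invention for
illustration; -n, -d, -o and -l are the lxc-start options for name,
daemonize, log file and log priority:

```shell
# Sketch only: start a container in the background with its lxc log
# written to a per-container file.
start_logged() {
    name=$1
    logdir=${LXC_LOG_DIR:-/var/log/lxc}     # hypothetical location
    mkdir -p "$logdir" || return 1
    # LXC_START is a made-up override hook; by default this runs lxc-start.
    ${LXC_START:-lxc-start} -n "$name" -d -o "$logdir/$name.log" -l INFO
}
```

Note that -o captures lxc's own log messages, not the container's
console, so it does not replace screen if you want to watch the boot.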

> The $LXC_CONF (/etc/lxc/lxc.conf) referenced at the top does not exist 
> usually so everything that happens is visible right in the script.

That's a real good practice.

> I'm using this in production. So far so good.

> typical usage:
> 
> nj10:~ # rclxc status

Of what is this rclxc command of which you speaketh?  Oh, I see from the
script that rclxc is a symlink to /etc/init.d/lxc.  But what are the rc_*
things in there?  Something you sourced out of rc.status?  Is that
something SUSE-specific?

Nice thoughts in there.  Could be adaptable.  :-)=)

> Checking for LXC containers... 
>                                                   running

: - Snip

Mike
-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!