[Lxc-users] On clean shutdown of Ubuntu 10.04 containers

Michael H. Warfield mhw at WittsEnd.com
Mon Dec 6 23:03:29 UTC 2010


On Mon, 2010-12-06 at 17:38 -0500, Brian K. White wrote: 
> On 12/6/2010 3:34 PM, Michael H. Warfield wrote:
> > On Mon, 2010-12-06 at 12:38 -0500, Brian K. White wrote:
> >> On 12/6/2010 2:42 AM, Trent W. Buck wrote:
> >>> This post describes my attempts to get "clean" shutdown of Ubuntu 10.04
> >>> containers.  The goal here is that a "shutdown -h now" of the dom0
> >>> should not result in a potentially inconsistent domU postgres database,
> >>> cf. a naive lxc-stop.
> >>>
> >>> As at Ubuntu 10.04 with lxc 0.7.2, lxc-start detects that a container
> >>> has halted by 1) seeing a reboot event in <container>/var/run/utmp; or
> >>> 2) seeing <container>'s PID 1 terminate.
> >>>
> >>> Ubuntu 10.04 simply REQUIRES /var/run to be a tmpfs; this is hard-coded
> >>> into mountall's (upstart's) /lib/init/fstab.  Without it, the most
> >>> immediate issue is that /var/run/ifstate isn't reaped on reboot, ifup(8)
> >>> thinks lo (at least) is already configured, and the boot process hangs
> >>> waiting for the network.
> >>>
> >>> Unfortunately, lxc 0.7's utmp detect requires /var/run to NOT be a
> >>> tmpfs.  The shipped lxc-ubuntu script works around this by deleting the
> >>> ifstate file and not mounting a tmpfs on /var/run, but to me that is
> >>> simply waiting for something else to assume /var/run is empty.  It also
> >>> doesn't cope with a mountall upgrade rewriting /lib/init/fstab.
> >>>
> >>> More or less by accident, I discovered that I can tell lxc-start that
> >>> the container is ready to halt by "crashing" upstart:
> >>>
> >>>       container# kill -SEGV 1
> >>>
> >>> Likewise I can spoof a ctrl-alt-delete event in the container with:
> >>>
> >>>       dom0# pkill -INT lxc-start
> >>>
> >>> I automate the former signalling at the end of shutdowns thusly:
> >>>
> >>>       chroot $template_dir dpkg-divert --quiet --rename /sbin/reboot
> >>>       chroot $template_dir tee >/dev/null /sbin/reboot <<-EOF
> >>>       	#!/bin/bash
> >>>       	while getopts nwdfiph opt
> >>>       	do [[ f = \$opt ]] && exec kill -SEGV 1
> >>>       	done
> >>>       	exec -a "\$0" "\$0.distrib" "\$@"
> >>>       	EOF
> >>>       chroot $template_dir chmod +x /sbin/reboot
> >>>       chroot $template_dir ln -s reboot.distrib /sbin/halt.distrib
> >>>       chroot $template_dir ln -s reboot.distrib /sbin/poweroff.distrib
> >>>
> >>> I use the latter in my customized /etc/init.d/lxc stop rule.
> >>> Note that the lxc-wait calls SHOULD be parallelized, but this is not
> >>> possible as at lxc 0.7.2 :-(
> >>
> >> Sure it is.
> >> I parallelize the shutdowns (in any version, including 0.7.2) by doing
> >> all the lxc-stop in parallel without looking or waiting, then in a
> >> separate following step do a loop that waits for no containers running.
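
(In outline, that pattern looks like the sketch below.  This is my
paraphrase, not Brian's actual script: the LXC_STOP / LXC_RUNNING
indirections are made up so the shape is visible on its own; in real
use they'd be lxc-stop and an lxc-info "RUNNING" check.)

```shell
#!/bin/sh
# Sketch of the parallel stop-then-wait pattern, not the actual init
# script.  LXC_STOP / LXC_RUNNING are hypothetical indirections; in
# real use they'd be "lxc-stop -n" and a check on lxc-info output.
: "${LXC_STOP:=lxc-stop -n}"
: "${LXC_RUNNING:=container_is_running}"

# Step 1: fire off every stop in the background, waiting on none of them.
stop_all_parallel() {
    for c in "$@"; do
        $LXC_STOP "$c" &
    done
    wait    # waits only for the stop commands to return, not the halts
}

# Step 2, as a separate pass: poll until no container reports running.
wait_until_all_stopped() {
    for c in "$@"; do
        while $LXC_RUNNING "$c"; do
            sleep 1
        done
    done
}
```
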
> >
> >> Here is my openSUSE init.d/lxc:
> >> https://build.opensuse.org/package/files?package=lxc&project=home:aljex
> >> And the packages:
> >> http://download.opensuse.org/repositories/home:/aljex/*/lxc-0.7.2*.rpm
> >
> >> It makes assumptions that are wrong for ubuntu and is more limited than
> >> you may want in terms of what it even tries to handle. But that's beside
> >> the point of parallel shutdowns.
> >
> > Also wrong for Fedora.
> >
> >> * cgroup handling includes a particular stack of override logic for
> >> possible cgroup mount points that makes sense to me.
> >> - start with built-in default /var/run/lxc/cgroup, and name it "lxc" so
> >> as not to conflict with any other cgroup setup by default.
> >> - if you defined something in $LXC_CONF, prefer it over default
> >> - if kernel is providing /sys/fs/cgroup automatically, prefer that over
> >> either default or $LXC_CONF
> >> - if a cgroup named "lxc" is already mounted, prefer that over all else
> >
> > I'm not quite sure if I would put those last two in that order.
> > Especially after the last little discussion over on LKML over the per
> > tty cgroups in the kernel vs in user space, I think I would let the
> > kernel defined /sys/fs/cgroup trump all else if it exists.  Something
> > that's been mounted may not have been mounted with all the options you
> > may want, but I'm not sure how much difference that's going to make.  I
> > would think the kernel definition would be preferable.  Is there
> > something specific you had in mind that would lead you to want to
> > override that?
> >
> >> * assumes lxc 0.7.2 because the script is part of a lxc-0.7.2 rpm
> >> - removes the shutdown/reboot watchdog functions that were needed in
> >> 0.6.5 but are built in to 0.7.2 now.
> >
> >> * only starts containers that are defined by $LXC_ETC/*/config
> >
> > Yeah, that's something where I wish we had an "onboot" and/or "disabled"
> > config file like OpenVZ does.  So you can have some containers configured
> > that don't autoboot when you boot the system.  As it stands, you would
> > have to rename or remove the config file.  :-P
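
(What I mean is something like this hypothetical convention -- a
"disabled" flag file beside the config.  To be clear, lxc 0.7.2 has no
such feature; this is just a sketch of the idea, with echo standing in
for lxc-start:)

```shell
#!/bin/sh
# Hypothetical sketch only -- lxc 0.7.2 has no such feature.  Skip any
# configured container that has a "disabled" flag file next to its
# config, OpenVZ-style.  $1 is the config root (normally /etc/lxc);
# echo stands in for the real lxc-start invocation.
autostart_containers() {
    etc=${1:-/etc/lxc}
    for conf in "$etc"/*/config; do
        [ -f "$conf" ] || continue           # no config => skip directory
        dir=$(dirname "$conf")
        [ -e "$dir/disabled" ] && continue   # configured, but opted out
        echo "would start: $(basename "$dir")"
    done
}
```
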
> >
> >> * only shuts down containers that it started
> >
> > I don't quite see that as happening literally as described.  Looks like
> > it's going to shut down any container for which it can find a powerfail
> > init, even if it was started by some other means, say manually.  It
> > doesn't seem to be actually tracking which ones it started.  Granted,
> > during normal operation, you're going to try to start everything with a
> > config but it looks like it will shut down manually started containers
> > as well, even if they are not listed with configs and would not even
> > show up in your status commands.  I don't even see a check in that
> > routine for the config file.
> 
> The "already-mounted" only trumps if it's also named "lxc", not just any 
> old mounted cgroup fs. This is to handle the cases of:
> * upgrading the kernel
> * upgrading the lxc package (including while containers are running 
> without shutting them down)
> * editing/creating the optional /etc/lxc/lxc.conf which may define the 
> cgroup mount point, while containers are running.
> * stopping/starting individual containers, creating new individual 
> containers and starting them up, without shutting down others that were 
> started at boot and you never want to shut down.

Got it.  Ok
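
(For my own notes, that precedence stack boils down to something like
the function below.  The paths are the ones you named; the conf,
mounts-table, and sysfs arguments are my own parameterization so the
logic reads in isolation, not how your script is actually factored:)

```shell
#!/bin/sh
# Sketch of the described cgroup mount-point precedence, not the actual
# init script.  Arguments exist only to make the logic self-contained:
#   $1 = optional lxc.conf (may set CGROUP_DIR)
#   $2 = mounts table (normally /proc/mounts)
#   $3 = kernel-provided dir (normally /sys/fs/cgroup)
pick_cgroup_dir() {
    conf=$1; mounts=${2:-/proc/mounts}; sysdir=${3:-/sys/fs/cgroup}
    dir=/var/run/lxc/cgroup                  # 1. built-in default, named "lxc"
    if [ -n "$conf" ] && [ -r "$conf" ]; then
        . "$conf"                            # 2. lxc.conf may override it
        [ -n "${CGROUP_DIR:-}" ] && dir=$CGROUP_DIR
    fi
    [ -d "$sysdir" ] && dir=$sysdir          # 3. kernel-provided wins over 1-2
    # 4. a cgroup fs already mounted under the name "lxc" trumps everything
    m=$(awk '$1 == "lxc" && $3 == "cgroup" { print $2; exit }' "$mounts")
    [ -n "$m" ] && dir=$m
    printf '%s\n' "$dir"
}
```
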

> It's optimal for several reasons and is doing exactly what I want in 
> various cases you probably didn't think of. Remember I'm actually using 
> it in production so I'm handling problems I actually ran into and 
> allowing for actions I actually need to allow for.

Oh, I have about three dozen VMs in production right now myself on 4
production servers, running a mix of Fedora and CentOS with an
occasional Ubuntu thrown in there for testing, experimentation, and
giggles.  You and I seem to be working along the same lines: deploying
it, smoke-testing it, and fine-tuning based on the debris and the
experience, yeah.

> Some of it is driven 
> by the fact that lxc is still very new and the components both in kernel 
> and in userspace are still changing a lot from month to month, and the 
> stock distribution can be quite different from the current kernel and 
> current lxc, yet I want the package to work on the stock system as well 
> as the current parts, and any likely mix between the two, with the 
> changes happening at any time, while keeping running containers running 
> if possible. And it is possible, since that little bit of shell logic is 
> handling it just fine and costs nothing.

> As for fedora, the only reason I mentioned it was wrong for ubuntu was 
> because the OP is talking about ubuntu so I wanted to make it clear that 
> the only reason I'm showing this script is because it does show how to 
> do the two things he thinks can't be done, not that it'll actually work 
> as-is on his box. It's an opensuse script in an opensuse rpm. I don't 
> even want to try to make it handle any of the redhat line. Let a redhat 
> packager do that in their rpm, which might be me if I switch to centos 
> and if they don't already have something better, but it's not me today.

And conversely, I keep intending to fire up a couple of SUSE VMs as well
for that experience and I haven't done it yet.

> As for not looking for any config file, what do you mean?
> It's looking for an optional /etc/lxc/lxc.conf

Sorry, I think I may have been digressing a bit there because the
main-line lxc scripts don't look for /etc/lxc/lxc.conf and I think they
should.  I believe I was making an oblique reference to them.

> Which is just a place to put variables that would affect the behaviour 
> of the init script or the various lxc-* tools. Or you could even put 
> fancier code in there to do anything since it's just being sourced by a 
> bash instance.
> Then for each container it's looking for /etc/lxc/*/config

Yeah, that's good.

> It's true that it's not doing any sort of validation of their contents, 
> just requiring that the file exists else the directory is skipped with 
> no action.
> But that's exactly as I want. Validating the contents of config files 
> lies somewhere between "not the job of an init script" or "sure it would 
> be nice just like the nine million other possible enhancements that 
> could be developed over time, but it's a complete luxury, not at all 
> necessary". Other services' init scripts do not try to evaluate the 
> contents of their config files.

> -- 
> bkw
> 
> >
> >> * the stop function greps for /sbin/init in container inittab instead of
> >> trying to allow for any random container pid #1
> >
> > Yeah, that little trick won't work with Ubuntu or Fedora using upstart.
> >
> > That gives you the peculiar situation that, if you have a modern
> > Fedora or Ubuntu container, your script even on an OpenSUSE system
> > would start it but could not stop it, while it would not start a
> > container without a config but would stop a running SUSE one
> > regardless of a config.
> >
> >> * no provision for application/service containers, just whole systems
> >> started with /sbin/init
> >
> > That only makes sense.
> >
> >> * starts containers in screen
> >> - I have not figured out what it would take to get nice behavior out of
> >> lxc-console yet and screen is both easy and standard.
> >
> > I usually start them with logging enabled and redirected with the -o
> > option.  To each his own.
> >
> >> The $LXC_CONF (/etc/lxc/lxc.conf) referenced at the top does not exist
> >> usually so everything that happens is visible right in the script.
> >
> > That's a real good practice.
> >
> >> I'm using this in production. So far so good.
> >
> >> typical usage:
> >>
> >> nj10:~ # rclxc status
> >
> > What is this rclxc command of which you speaketh?  Oh, I see from the
> > script that rclxc is a symlink to /etc/init.d/lxc.  But what's the rc_*
> > things in there?  Something you sourced out of rc.status?  That's
> > something SUSE?

> It's an opensuse init script. It conforms to the standard for opensuse 
> init scripts which includes using certain system functions that all 
> well-behaved opensuse init scripts are supposed to use. This is defined 
> in /etc/init.d/skeleton on any opensuse box and probably somewhere in 
> wiki.opensuse.org

> You are right, I _did_ say all actions & behavior of the script are 
> visible within it, meaning that nothing is in the config file modifying 
> those variables being used throughout, yet as you point out there is a 
> file being sourced which you cannot see (unless you are also an 
> opensuse user). Not to worry. That file and the functions it provides 
> are not affecting the running or stopping of the containers. It does 
> provide functions for running/killing daemons, but this script only uses 
> functions that merely inform the system about the current state of the 
> "lxc service". For the case of copying the script logic, you can just 
> ignore them all.

> You're right too, that with this you can't run ubuntu on opensuse. It's 
> just assuming opensuse on opensuse. That's actually a limitation I don't 
> want so I guess this is actually my problem too even though I didn't 
> think so until you just pointed that out.

> I also want to support application/service containers. I don't think 
> there is any inherent reason it can't be done reliably, just that I have 
> not yet used containers that way myself so I can't yet write a script to 
> support it.

> Ideally libvirt should be the tool to make this all handled by the same 
> init script and other tools that handle xen and kvm so well already, but 
> I couldn't get libvirt to work at all to manage containers. I think not 
> enough people are even trying yet and so support is still pre-alpha, but 
> it's just not _labeled_ that way and the docs lead you to believe it's 
> supposed to actually work.

Oh, now that's interesting.  I'll have to revisit that.  I was thinking
that was in better shape than you seem to indicate.

Now, understand too that libvirt handles containers very differently
from the lxc project.  Both refer to Linux Containers as "lxc", but
libvirt is independent of this project.  I was confused by that at
first.

> -- 
> bkw

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!