[Lxc-users] On clean shutdown of Ubuntu 10.04 containers

Mon Dec 6 22:38:09 UTC 2010

On 12/6/2010 3:34 PM, Michael H. Warfield wrote:
> On Mon, 2010-12-06 at 12:38 -0500, Brian K. White wrote:
>> On 12/6/2010 2:42 AM, Trent W. Buck wrote:
>>> This post describes my attempts to get "clean" shutdown of Ubuntu 10.04
>>> containers.  The goal here is that a "shutdown -h now" of the dom0
>>> should not result in a potentially inconsistent domU postgres database,
>>> cf. a naive lxc-stop.
>>>
>>> As at Ubuntu 10.04 with lxc 0.7.2, lxc-start detects that a container
>>> has halted by 1) seeing a reboot event in <container>/var/run/utmp; or
>>> 2) seeing <container>'s PID 1 terminate.
>>>
>>> Ubuntu 10.04 simply REQUIRES /var/run to be a tmpfs; this is hard-coded
>>> into mountall's (upstart's) /lib/init/fstab.  Without it, the most
>>> immediate issue is that /var/run/ifstate isn't reaped on reboot, ifup(8)
>>> thinks lo (at least) is already configured, and the boot process hangs
>>> waiting for the network.
>>>
>>> Unfortunately, lxc 0.7's utmp detect requires /var/run to NOT be a
>>> tmpfs.  The shipped lxc-ubuntu script works around this by deleting the
>>> ifstate file and not mounting a tmpfs on /var/run, but to me that is
>>> simply waiting for something else to assume /var/run is empty.  It also
>>> doesn't cope with a mountall upgrade rewriting /lib/init/fstab.
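Whether a given container trips over this depends on what its /lib/init/fstab says about /var/run. Below is a minimal bash sketch for checking that from the host. It assumes the usual fstab field layout (field 2 is the mount point, field 3 the type). The function name is hypothetical, and it returns 2 when the rootfs has no such file.

```shell
#!/usr/bin/env bash
# Sketch: does a container rootfs's upstart fstab mount tmpfs on /var/run?
# Assumes standard fstab fields: $2 = mount point, $3 = filesystem type.
# Returns 0 = yes, 1 = no, 2 = no /lib/init/fstab to inspect.
var_run_is_tmpfs() {
    local fstab="$1/lib/init/fstab"
    [ -r "$fstab" ] || return 2
    # Ignore comment lines; succeed if any entry puts tmpfs on /var/run.
    awk '$1 !~ /^#/ && $2 == "/var/run" && $3 == "tmpfs" { found = 1 }
         END { exit !found }' "$fstab"
}
```

For example, `var_run_is_tmpfs /var/lib/lxc/web/rootfs` (a hypothetical path) tells you which side of the utmp/tmpfs conflict that container is on.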
>>>
>>> More or less by accident, I discovered that I can tell lxc-start that
>>> the container is ready to halt by "crashing" upstart:
>>>
>>>       container# kill -SEGV 1
>>>
>>> Likewise I can spoof a ctrl-alt-delete event in the container with:
>>>
>>>       dom0# pkill -INT lxc-start
>>>
>>> I automate the former signalling at the end of shutdowns thusly:
>>>
>>>       chroot $template_dir dpkg-divert --quiet --rename /sbin/reboot
>>>       chroot $template_dir tee >/dev/null /sbin/reboot <<-EOF
>>>       	#!/bin/bash
>>>       	while getopts nwdfiph opt
>>>       	do [[ f = \$opt ]] && exec kill -SEGV 1
>>>       	done
>>>       	exec -a "\$0" "\$0.distrib" "\$@"
>>>       	EOF
>>>       chroot $template_dir chmod +x /sbin/reboot
>>>       chroot $template_dir ln -s reboot.distrib /sbin/halt.distrib
>>>       chroot $template_dir ln -s reboot.distrib /sbin/poweroff.distrib
>>>
>>> I use the latter in my customized /etc/init.d/lxc stop rule.
>>> Note that the lxc-waits SHOULD be parallelized, but this is not
>>> possible as at lxc 0.7.2 :-(
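The wrapper's decision (SEGV upstart only when -f is among reboot's options) can be pulled out and tested outside a container. Below is a sketch; the function name is hypothetical. It uses getopts in silent mode (leading colon) so an unexpected option is skipped instead of reported.

```shell
#!/usr/bin/env bash
# Sketch: the same option test the /sbin/reboot wrapper above performs.
# Returns 0 if -f appears among reboot's flags (n w d f i p h), else 1.
wants_forced_halt() {
    local opt OPTIND=1            # local OPTIND so repeated calls start fresh
    while getopts :nwdfiph opt; do
        [[ $opt = f ]] && return 0
    done
    return 1
}
```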
>>
>> Sure it is.
>> I parallelize the shutdowns (in any version, including 0.7.2) by doing
>> all the lxc-stop in parallel without looking or waiting, then in a
>> separate following step do a loop that waits for no containers running.
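Below is a minimal bash sketch of that two-phase approach. It assumes two hypothetical helpers that are not lxc commands. `stop_container NAME` asks one container to shut down and returns without waiting. `container_running NAME` succeeds while the container is still up.

```shell
#!/usr/bin/env bash
# Sketch: parallel shutdown. Phase 1 sends every stop request at once,
# phase 2 polls until none of the named containers is running.
# stop_container and container_running are hypothetical helpers.
stop_all_parallel() {
    local timeout=$1; shift
    local name waited=0
    local -a remaining

    # Phase 1: fire off all stop requests in the background.
    for name in "$@"; do
        stop_container "$name" &
    done
    wait    # waits for the request commands only, not the containers

    # Phase 2: loop until nothing is left running or the timeout passes.
    while :; do
        remaining=()
        for name in "$@"; do
            container_running "$name" && remaining+=("$name")
        done
        [ ${#remaining[@]} -eq 0 ] && return 0
        if [ "$waited" -ge "$timeout" ]; then
            echo "still running: ${remaining[*]}" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

The total wait is bounded by the slowest container, not by the sum of all of them, which is the point of splitting the stop and the wait.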
>
>> Here is my openSUSE init.d/lxc:
>> https://build.opensuse.org/package/files?package=lxc&project=home:aljex
>> And the packages:
>> http://download.opensuse.org/repositories/home:/aljex/*/lxc-0.7.2*.rpm
>
>> It makes assumptions that are wrong for ubuntu and is more limited than
>> you may want in terms of what it even tries to handle. But that's beside
>> the point of parallel shutdowns.
>
> Also wrong for Fedora.
>
>> * cgroup handling includes a particular stack of override logic for
>> possible cgroup mount points that makes sense to me.
>> - start with built-in default /var/run/lxc/cgroup, and name it "lxc" so
>> as not to conflict with any other cgroup set up by default.
>> - if you defined something in $LXC_CONF, prefer it over default
>> - if kernel is providing /sys/fs/cgroup automatically, prefer that over
>> either default or $LXC_CONF
>> - if a cgroup named "lxc" is already mounted, prefer that over all else
>
> I'm not quite sure if I would put those last two in that order.
> Especially after the last little discussion over on LKML about the per-tty
> cgroups in the kernel vs. in user space, I think I would let the
> kernel defined /sys/fs/cgroup trump all else if it exists.  Something
> that's been mounted may not have been mounted with all the options you
> may want, but I'm not sure how much difference that's going to make.  I
> would think the kernel definition would be preferable.  Is there
> something specific you had in mind that would lead you to want to
> override that?
>
>> * assumes lxc 0.7.2 because the script is part of a lxc-0.7.2 rpm
>> - removes the shutdown/reboot watchdog functions that were needed in
>> 0.6.5 but are built into 0.7.2 now.
>
>> * only starts containers that are defined by $LXC_ETC/*/config
>
> Yeah, that's something where I wish we had an "onboot" and/or "disabled"
> config file like OpenVZ does.  So you can have some configured but that
> don't autoboot when you boot the system.  As that stands, you would have
> to rename or remove the config file.  :-P
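Below is a sketch of how such a switch could sit next to the existing "$LXC_ETC/*/config" rule. The `disabled` marker file is hypothetical, modelled on the OpenVZ idea above, and lxc 0.7.2 has nothing like it. The function name is also illustrative.

```shell
#!/usr/bin/env bash
# Sketch: containers to autostart = dirs under LXC_ETC that have a config
# file and do NOT contain a (hypothetical) "disabled" marker file.
autoboot_containers() {
    local etc=${1:-/etc/lxc} dir
    for dir in "$etc"/*/; do
        [ -f "${dir}config" ] || continue     # no config: skip the dir
        [ -e "${dir}disabled" ] && continue   # hypothetical opt-out marker
        basename "$dir"
    done
}
```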
>
>> * only shuts down containers that it started
>
> I don't quite see that as happening literally as described.  Looks like
> it's going to shut down any container for which it can find a powerfail
> init, even if it was started by some other means, say manually.  It
> doesn't seem to be actually tracking what ones it started.  Granted,
> during normal operation, you're going to try to start everything with a
> config but it looks like it will shut down manually started containers
> as well, even if they are not listed with configs and would not even
> show up in your status commands.  I don't even see a check in that
> routine for the config file.
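To make "only shuts down containers that it started" literally true, the script would have to remember what it started. Below is one minimal sketch that does this with a state file. The file location and function names are assumptions, not an lxc convention.

```shell
#!/usr/bin/env bash
# Sketch: remember which containers this init script started, and stop
# only those. STATE_FILE's path is an assumption.
STATE_FILE=${STATE_FILE:-/var/run/lxc-initscript.started}

record_started() {    # call after a successful start; no duplicates
    grep -qxF "$1" "$STATE_FILE" 2>/dev/null || echo "$1" >> "$STATE_FILE"
}

started_by_us() {     # the names to stop at shutdown
    [ -r "$STATE_FILE" ] && cat "$STATE_FILE"
}

forget_started() {    # call once a container has stopped
    [ -r "$STATE_FILE" ] || return 0
    grep -vxF "$1" "$STATE_FILE" > "$STATE_FILE.tmp" || true
    mv "$STATE_FILE.tmp" "$STATE_FILE"
}
```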

The "already-mounted" only trumps if it's also named "lxc", not just any 
old mounted cgroup fs. This is to handle the cases of:
* upgrading the kernel
* upgrading the lxc package (including while containers are running 
without shutting them down)
* editing/creating the optional /etc/lxc/lxc.conf which may define the 
cgroup mount point, while containers are running.
* stopping/starting individual containers, creating new individual 
containers and starting them up, without shutting down others that were 
started at boot and you never want to shut down.

It's optimal for several reasons and is doing exactly what I want in 
various cases you probably didn't think of. Remember I'm actually using 
it in production so I'm handling problems I actually ran into and 
allowing for actions I actually need to allow for. Some of it is driven 
by the fact that lxc is still very new and the components, both in the 
kernel and in userspace, are still changing a lot from month to month. 
The stock distribution can be quite different from the current kernel 
and current lxc, yet I want the package to work on the stock system, on 
the current parts, and on any likely mix of the two, with the changes 
happening at any time, while keeping running containers running if 
possible. And it is possible: that little bit of shell logic handles it 
just fine and costs nothing.
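The precedence described earlier can be written so that each rule simply overrides the one before it. Below is a sketch under two assumptions. First, a cgroup "named lxc" shows up with `lxc` as the device field in /proc/mounts, as it would after `mount -t cgroup lxc DIR`. Second, `LXC_CGROUP` is a hypothetical variable that $LXC_CONF may set. Michael's preferred order would just swap steps 3 and 4.

```shell
#!/usr/bin/env bash
# Sketch: pick the cgroup mount point; later rules override earlier ones.
# Args (for testing): mounts file, kernel cgroup dir.
pick_cgroup_dir() {
    local mounts=${1:-/proc/mounts} sysfs=${2:-/sys/fs/cgroup}
    local dir=/var/run/lxc/cgroup                      # 1. built-in default
    local dev mnt type rest

    [ -n "${LXC_CGROUP:-}" ] && dir=$LXC_CGROUP        # 2. from $LXC_CONF
    [ -d "$sysfs" ] && dir=$sysfs                      # 3. kernel-provided

    # 4. an already-mounted cgroup whose device name is "lxc" wins.
    while read -r dev mnt type rest; do
        if [ "$type" = cgroup ] && [ "$dev" = lxc ]; then
            dir=$mnt
            break
        fi
    done < "$mounts"

    echo "$dir"
}
```

Because step 4 looks for the name rather than for "any cgroup", a cgroup hierarchy mounted by something else is not hijacked. Meanwhile, containers started under an earlier package version or config keep using the hierarchy they are already in.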

As for fedora, I only mentioned that it was wrong for ubuntu because 
the OP is talking about ubuntu, so I wanted to make it clear that I'm 
showing this script only because it shows how to do the two things he 
thinks can't be done, not because it'll actually work as-is on his box. 
It's an opensuse script in an opensuse rpm. I don't 
even want to try to make it handle any of the redhat line. Let a redhat 
packager do that in their rpm, which might be me if I switch to centos 
and if they don't already have something better, but it's not me today.

As for not looking for any config file, what do you mean?
It's looking for an optional /etc/lxc/lxc.conf, which is just a place 
to put variables that would affect the behaviour of the init script or 
the various lxc-* tools. Or you could even put fancier code in there to 
do anything, since it's just being sourced by a bash instance.
Then, for each container, it's looking for /etc/lxc/*/config.

It's true that it's not doing any sort of validation of their contents, 
just requiring that the file exists else the directory is skipped with 
no action.
But that's exactly what I want. Validating the contents of config files 
lies somewhere between "not the job of an init script" and "sure it would 
be nice, just like the nine million other possible enhancements that 
could be developed over time, but it's a complete luxury, not at all 
necessary". Other services' init scripts do not try to evaluate the 
contents of their services' config files.
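The "optional, sourced" config pattern described above amounts to setting defaults in the script and then letting the file, if present, override them. Below is a minimal sketch. `LXC_CONF` and `LXC_ETC` are the names used in the script under discussion. `LXC_LOGDIR` is a hypothetical example setting.

```shell
#!/usr/bin/env bash
# Sketch: defaults first, then source the optional config file, which may
# override any variable (or run arbitrary shell code).
load_lxc_settings() {
    LXC_CONF=${LXC_CONF:-/etc/lxc/lxc.conf}
    LXC_ETC=/etc/lxc
    LXC_LOGDIR=/var/log/lxc      # hypothetical example setting
    # A missing file is the normal case: the defaults above then apply.
    if [ -r "$LXC_CONF" ]; then
        . "$LXC_CONF"
    fi
}
```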

-- 
bkw

>
>> * the stop function greps for /sbin/init in container inittab instead of
>> trying to allow for any random container pid #1
>
> Yeah, that little trick won't work with Ubuntu or Fedora using upstart.
>
> That gives you the peculiar situation here that if you have a modern
> Fedora or Ubuntu container, your script even on an OpenSUSE system would
> start them but could not stop them while it would not start a container
> without a config but would stop a running SUSE one regardless of a
> config.
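One way around the inittab assumption is to pick a shutdown method per container. Below is a sketch under stated assumptions. A SysV container is recognised by a power-failure action (powerwait, powerfail or powerfailnow) in its /etc/inittab, so SIGPWR to its init starts a shutdown. An upstart container is recognised by an /etc/init directory and would get the SIGINT-to-lxc-start trick from earlier in the thread. The function name and the returned labels are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: classify a container rootfs by how it can be asked to shut down.
# inittab lines are id:runlevels:action:process; comments start with '#'.
shutdown_method() {
    local rootfs=$1
    if [ -r "$rootfs/etc/inittab" ] &&
       grep -Eq '^[^#:]*:[^:]*:(powerwait|powerfail|powerfailnow):' \
            "$rootfs/etc/inittab"; then
        echo sigpwr       # SysV init with a power-failure entry
    elif [ -d "$rootfs/etc/init" ]; then
        echo sigint       # upstart: use the ctrl-alt-del style trick
    else
        echo unknown
    fi
}
```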
>
>> * no provision for application/service containers, just whole systems
>> started with /sbin/init
>
> That only makes sense.
>
>> * starts containers in screen
>> - I have not figured out what it would take to get nice behavior out of
>> lxc-console yet and screen is both easy and standard.
>
> I usually start them with logging enabled and redirected with the -o
> option.  To each his own.
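Below is a sketch of the two start styles side by side, with a dry-run switch so the command can be inspected instead of run. `screen -dmS NAME CMD` starts a detached, named screen session. lxc-start's `-d` (daemonize) and `-o` (log file) are believed to match lxc 0.7's options, but check `lxc-start --help` on your version. `LOGDIR`, `DRY_RUN` and the function name are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: start a container either inside screen or daemonized with a log.
# Set DRY_RUN=1 to print the command instead of running it.
start_container() {
    local name=$1 mode=${2:-screen} logdir=${LOGDIR:-/var/log/lxc}
    local -a cmd
    case $mode in
        screen) cmd=(screen -dmS "$name" lxc-start -n "$name") ;;
        log)    cmd=(lxc-start -n "$name" -d -o "$logdir/$name.log") ;;
        *)      echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    if [ -n "${DRY_RUN:-}" ]; then
        echo "${cmd[*]}"     # show what would be run
    else
        "${cmd[@]}"
    fi
}
```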
>
>> The $LXC_CONF (/etc/lxc/lxc.conf) referenced at the top does not exist
>> usually so everything that happens is visible right in the script.
>
> That's a real good practice.
>
>> I'm using this in production. So far so good.
>
>> typical usage:
>>
>> nj10:~ # rclxc status
>
> Of what is this rclxc command of which you speaketh?  Oh, I see from the
> script that rclxc is a symlink to /etc/init.d/lxc.  But what are the rc_*
> things in there?  Something you sourced out of rc.status?  That's
> something SUSE?

It's an opensuse init script. It conforms to the standard for opensuse 
init scripts which includes using certain system functions that all 
well-behaved opensuse init scripts are supposed to use. This is defined 
in /etc/init.d/skeleton on any opensuse box and probably somewhere on 
wiki.opensuse.org.

You are right, I _did_ say all actions & behavior of the script are 
visible within it, meaning that nothing in the config file modifies 
the variables used throughout, yet as you point out there is a 
file being sourced which you cannot see (unless you are also an 
opensuse user). Not to worry. That file and the functions it provides 
are not affecting the running or stopping of the containers. It does 
provide functions for running/killing daemons, but this script only uses 
functions that merely inform the system about the current state of the 
"lxc service". For the case of copying the script logic, you can just 
ignore them all.

You're right too, that with this you can't run ubuntu on opensuse. It's 
just assuming opensuse on opensuse. That's actually a limitation I don't 
want so I guess this is actually my problem too even though I didn't 
think so until you just pointed that out.

I also want to support application/service containers. I don't think 
there is any inherent reason it can't be done reliably, just that I have 
not yet used containers that way myself so I can't yet write a script to 
support it.

Ideally libvirt should be the tool to make this all handled by the same 
init script and other tools that handle xen and kvm so well already, but 
I couldn't get libvirt to work at all to manage containers. I think not 
enough people are even trying yet and so support is still pre-alpha, but 
it's just not _labeled_ that way and the docs lead you to believe it's 
supposed to actually work.

-- 
bkw