[Lxc-users] On clean shutdown of Ubuntu 10.04 containers
Brian K. White
brian at aljex.com
Mon Dec 6 17:38:16 UTC 2010
On 12/6/2010 2:42 AM, Trent W. Buck wrote:
> This post describes my attempts to get "clean" shutdown of Ubuntu 10.04
> containers. The goal here is that a "shutdown -h now" of the dom0
> should not result in a potentially inconsistent domU postgres database,
> cf. a naive lxc-stop.
>
> As at Ubuntu 10.04 with lxc 0.7.2, lxc-start detects that a container
> has halted by 1) seeing a reboot event in <container>/var/run/utmp; or
> 2) seeing <container>'s PID 1 terminate.
>
> Ubuntu 10.04 simply REQUIRES /var/run to be a tmpfs; this is hard-coded
> into mountall's (upstart's) /lib/init/fstab. Without it, the most
> immediate issue is that /var/run/ifstate isn't reaped on reboot, ifup(8)
> thinks lo (at least) is already configured, and the boot process hangs
> waiting for the network.
>
> Unfortunately, lxc 0.7's utmp detect requires /var/run to NOT be a
> tmpfs. The shipped lxc-ubuntu script works around this by deleting the
> ifstate file and not mounting a tmpfs on /var/run, but to me that is
> simply waiting for something else to assume /var/run is empty. It also
> doesn't cope with a mountall upgrade rewriting /lib/init/fstab.
>
> More or less by accident, I discovered that I can tell lxc-start that
> the container is ready to halt by "crashing" upstart:
>
> container# kill -SEGV 1
>
> Likewise I can spoof a ctrl-alt-delete event in the container with:
>
> dom0# pkill -INT lxc-start
>
> I automate the former signalling at the end of shutdowns thusly:
>
> chroot $template_dir dpkg-divert --quiet --rename /sbin/reboot
> chroot $template_dir tee>/dev/null /sbin/reboot<<-EOF
> #!/bin/bash
> while getopts nwdfiph opt
> do [[ f = \$opt ]]&& exec kill -SEGV 1
> done
> exec -a "$0" "\$0.distrib" "\$@"
> EOF
> chroot $template_dir chmod +x /sbin/reboot
> chroot $template_dir ln -s reboot.distrib /sbin/halt.distrib
> chroot $template_dir ln -s reboot.distrib /sbin/poweroff.distrib
>
> I use the latter in my customized /etc/init.d/lxc stop rule.
> Note that the lxc-wait's SHOULD be parallelized, but this is not
> possible as at lxc 0.7.2 :-(
Sure it is.
I parallelize the shutdowns (in any version, including 0.7.2) by issuing
all the lxc-stop commands in parallel without checking or waiting on any
of them, then, as a separate step, looping until no containers are running.
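A minimal sketch of that two-phase approach, for illustration only. The
function name stop_all, the LXC_STOP/LXC_INFO override variables, and
STOP_TIMEOUT are hypothetical and are not taken from the attached init
script. It also assumes that "lxc-info -n NAME" prints a line containing
RUNNING for a live container, as in the listings further down.

```shell
#!/bin/sh
# Hypothetical sketch, not the attached init script.
# The commands are overridable so the logic can be exercised without lxc.
LXC_STOP=${LXC_STOP:-lxc-stop}
LXC_INFO=${LXC_INFO:-lxc-info}

stop_all() {
    # Phase 1: request a stop for every container in the background,
    # without checking or waiting on any of them.
    for name in "$@"; do
        $LXC_STOP -n "$name" &
    done

    # Phase 2: poll until no container reports RUNNING, or give up
    # after STOP_TIMEOUT one-second polls (assumed default: 60).
    tries=${STOP_TIMEOUT:-60}
    while [ "$tries" -gt 0 ]; do
        running=0
        for name in "$@"; do
            case $($LXC_INFO -n "$name" 2>/dev/null) in
                *RUNNING*) running=$((running + 1)) ;;
            esac
        done
        [ "$running" -eq 0 ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}
```

Called as "stop_all vps001 vps002", this returns 0 once every named
container has stopped and 1 if the timeout expires first. The total wait
is then bounded by the slowest container rather than the sum of all of them.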
Here is my openSUSE init.d/lxc:
https://build.opensuse.org/package/files?package=lxc&project=home:aljex
And the packages:
http://download.opensuse.org/repositories/home:/aljex/*/lxc-0.7.2*.rpm
It makes assumptions that are wrong for Ubuntu and is more limited than
you may want in terms of what it even tries to handle, but that's beside
the point of parallel shutdowns.
* cgroup handling includes a particular stack of override logic for
  possible cgroup mount points that makes sense to me.
  - start with built-in default /var/run/lxc/cgroup, and name it "lxc" so
    as not to conflict with any other cgroup setup by default.
  - if you defined something in $LXC_CONF, prefer it over default
  - if kernel is providing /sys/fs/cgroup automatically, prefer that over
    either default or $LXC_CONF
  - if a cgroup named "lxc" is already mounted, prefer that over all else
* assumes lxc 0.7.2 because the script is part of a lxc-0.7.2 rpm
  - removes the shutdown/reboot watchdog functions that were needed in
    0.6.5 but are built in to 0.7.2 now.
* only starts containers that are defined by $LXC_ETC/*/config
* only shuts down containers that it started
* the stop function greps for /sbin/init in container inittab instead of
  trying to allow for any random container pid #1
* no provision for application/service containers, just whole systems
  started with /sbin/init
* starts containers in screen
  - I have not figured out what it would take to get nice behavior out of
    lxc-console yet and screen is both easy and standard.
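The cgroup mount point override order from the first bullet could be
sketched roughly as below. This is an illustration under assumptions,
not the logic of the attached script: the function name pick_cgroup_dir
and its arguments are hypothetical, and "a cgroup named lxc" is assumed
to mean a /proc/mounts entry whose device field is "lxc" and whose type
is "cgroup".

```shell
#!/bin/sh
# Hypothetical sketch of the override stack; later checks win.
# $1: directory configured via $LXC_CONF (may be empty)
# $2: mounts table to inspect (assumed default: /proc/mounts)
# $3: kernel-provided cgroup directory (assumed default: /sys/fs/cgroup)
pick_cgroup_dir() {
    conf_dir=$1
    mounts=${2:-/proc/mounts}
    sysfs=${3:-/sys/fs/cgroup}

    # 1. Built-in default.
    dir=/var/run/lxc/cgroup

    # 2. A configured directory beats the default.
    [ -n "$conf_dir" ] && dir=$conf_dir

    # 3. A kernel-provided directory beats both of the above.
    [ -d "$sysfs" ] && dir=$sysfs

    # 4. An already mounted cgroup named "lxc" beats everything else.
    mounted=$(awk '$1 == "lxc" && $3 == "cgroup" { print $2; exit }' "$mounts")
    [ -n "$mounted" ] && dir=$mounted

    echo "$dir"
}
```

Each step only overwrites the previous choice when its condition holds,
so the last matching rule decides the result.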
The $LXC_CONF (/etc/lxc/lxc.conf) referenced at the top usually does not
exist, so everything that happens is visible right in the script.
I'm using this in production. So far so good.
Typical usage:
nj10:~ # rclxc status
Checking for LXC containers...
running
nj10:~ # rclxc list
Listing LXC containers...
'vps001' is RUNNING
'vps002' is RUNNING
'vps003' is RUNNING
'vps004' is RUNNING
'vps005' is RUNNING
'vps006' is RUNNING
'vps007' is RUNNING
'vps008' is RUNNING
'vps009' is RUNNING
'vps011' is RUNNING
'vps012' is RUNNING
'vps013' is RUNNING
nj10:~ # rclxc stop vps008
Shutting down LXC containers...
done
nj10:~ # rclxc list
Listing LXC containers...
'vps001' is RUNNING
'vps002' is RUNNING
'vps003' is RUNNING
'vps004' is RUNNING
'vps005' is RUNNING
'vps006' is RUNNING
'vps007' is RUNNING
'vps008' is STOPPED
'vps009' is RUNNING
'vps011' is RUNNING
'vps012' is RUNNING
'vps013' is RUNNING
nj10:~ # rclxc status
Checking for LXC containers...
running
nj10:~ # rclxc stop
Shutting down LXC containers...
done
nj10:~ # rclxc status
Checking for LXC containers...
unused
nj10:~ # rclxc list
Listing LXC containers...
'vps001' is STOPPED
'vps002' is STOPPED
'vps003' is STOPPED
'vps004' is STOPPED
'vps005' is STOPPED
'vps006' is STOPPED
'vps007' is STOPPED
'vps008' is STOPPED
'vps009' is STOPPED
'vps011' is STOPPED
'vps012' is STOPPED
'vps013' is STOPPED
nj10:~ # time rclxc start
Starting LXC containers...
done
real 0m0.242s
user 0m0.012s
sys 0m0.000s
nj10:~ # rclxc list
Listing LXC containers...
'vps001' is RUNNING
'vps002' is RUNNING
'vps003' is RUNNING
'vps004' is RUNNING
'vps005' is RUNNING
'vps006' is RUNNING
'vps007' is RUNNING
'vps008' is RUNNING
'vps009' is RUNNING
'vps011' is RUNNING
'vps012' is RUNNING
'vps013' is RUNNING
nj10:~ # screen -r vps013
INIT: version 2.88 booting
INIT: Entering runlevel: 3
blogd: can not set console device to /dev/pts/34: Device or resource busy
Master Resource Control: previous runlevel: N, switching to runlevel:3
Initializing random number generator done
Starting syslog services done
Starting D-Bus daemon done
No keyboard map to load
Loading compose table winkeys shiftctrl latin1.add done
Stop Unicode mode done
Setting up (localfs) network interfaces:
lo
lo IP address: 127.0.0.1/8
IP address: 127.0.0.2/8
lo done
eth0
eth0 IP address: 71.187.206.90/24
eth0 done
Setting up service (localfs) network . . . . . . . . . . done
Starting SSH daemon done
Loading CPUFreq modules (CPUFreq not supported)
Starting HAL daemon done
Setting up (remotefs) network interfaces:
Setting up service (remotefs) network . . . . . . . . . . done
Re-Starting syslog services done
Starting auditd The audit system is disabled
done
Starting incron done
Starting mail service (Postfix) done
Starting CRON daemon done
Starting rpcbind done
Starting rsync daemon done
Starting smartd unused
Starting vsftpd done
Starting INET services. (xinetd) done
Master Resource Control: runlevel 3 has been reached
Skipped services in runlevel 3: splash smartd
Welcome to openSUSE 11.3 "Teal" - Kernel 2.6.37-rc3-3-default (console).
nj10-013 login:
[detached]
nj10:~ # time rclxc stop
Shutting down LXC containers...
done
real 0m8.537s
user 0m0.048s
sys 0m0.124s
nj10:~ # rclxc list
Listing LXC containers...
'vps001' is STOPPED
'vps002' is STOPPED
'vps003' is STOPPED
'vps004' is STOPPED
'vps005' is STOPPED
'vps006' is STOPPED
'vps007' is STOPPED
'vps008' is STOPPED
'vps009' is STOPPED
'vps011' is STOPPED
'vps012' is STOPPED
'vps013' is STOPPED
nj10:~ # screen -ls
No Sockets found in /var/run/screens/S-root.
nj10:~ # lxc-ps --lxc auxwww
CONTAINER USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
nj10:~ #
--
bkw
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lxc.init
URL: <http://lists.linuxcontainers.org/pipermail/lxc-users/attachments/20101206/ecc75508/attachment.ksh>