[lxc-users] How to cancel lxc-autostart

Michael H. Warfield mhw at WittsEnd.com
Sat Aug 9 16:00:08 UTC 2014


Third solution proposal near bottom.  Sorry for longish post, but...

On Sat, 2014-08-09 at 10:32 -0400, CDR wrote:
> This is a philosophical divide. I live in the real world, and am
> successfully moving all my business to LXC, or a combination of LXC
> and real virtualization, where you have a few virtual machines with
> hundreds of GBs of RAM and 36 or more cores, and these super-virtual
> machines act solely as container-of-containers. It means that my
> virtual machines have so many autostart containers that it takes 30
> minutes to stop them all in a loop. When for some reason I need to
> start the machines and do not need all the containers starting, the
> only way is to boot in single-user mode. Why? There should be a way to
> stop the storm in its tracks, like
> echo 0 > /proc/lxc/autostart
> so that I could quickly stop the few containers that had already started.
> I see a world coming where every living corporation will be using a
> combination of Virtualization plus LXC.

OK, if we're going to get into it, I also live in the real world, where I
put these things to practical uses and I look for practical answers to
practical questions.  I have several colocated servers across multiple
sites and several isolated servers, with a number of containers on each.  I
control and manage a large span (/16) of IPv4 address space and I peer
live on BGP using my own ASN (25944) for both IPv4 and IPv6, advertising
my own real global address space.  I'm even the author of some of the
BGP code in the Quagga routing suite (the MD5 TCP signature code).  Yes, I
think I qualify as a practical, real-world engineer who has been in
practice and solving problems for well over 30 years at this point.

You posited an unlikely condition: the case where you had to stop
lxc-autostart in order to fsck a file system for a container.  But how
did the file system get mounted dirty?  That should not happen, and it
sounds like it indicates other underlying problems.  If the file system
did not get mounted, that phase should have aborted anyway, unless it was
somehow mis-engineered in the first place.  If it's a container-specific
failure, you can fix that container's problem without shutting down all
the other containers, and if it's a generic problem, the other affected
containers should not have started either.

We (I and one other) gave you practical solutions to the problem you
posited, but apparently they weren't the answer you had envisioned.  We
also foresaw problems with your "solution" which you obviously had not.
You haven't countered either of the proposed solutions with reasons why
they would not work, or offered any more "real world" examples on which
we could base an analysis.

From the description you've provided above ("where you have a few
virtual machines with hundreds of GBs of RAM and 36 or more cores, and
these super-virtual machines act solely as container-of-containers"),
it sounds like you're just designing this thing and much of it is
theoretical.

It also sounds like, in the case you're describing now, you need to
apply some higher-level management and some serious addressing of
container groups.  This is exactly one of the cases I envisioned when I
was working on lxc-autostart and the whole group model, multiple group
membership, and boot order/priority.  To manage things on that level,
you're going to need a higher level of management and not just a
shotgun to fire back at the stream of bullets.  In that case, you're
going to need a well documented, stepped, incremental, and grouped
approach to that multitude of containers-of-containers.  Yes, you're
dealing with exactly the load average and real-time delays inherent in
that level of complexity.  I doubt any solution is capable of fully
addressing that.
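
To give a concrete feel for it, the group machinery that's already in
lxc-autostart looks roughly like this (the container names and values
below are purely illustrative, not from any real configuration):

        # In each container's config:
        #   lxc.start.auto  = 1         # include this container in autostart
        #   lxc.start.delay = 5         # seconds to pause after starting it
        #   lxc.start.order = 10        # relative ordering within its group
        #   lxc.group       = onboot    # group membership (a container may
        #   lxc.group       = webtier   # belong to several groups)
        #
        # Then whole groups can be driven at once:
        #   lxc-autostart -g onboot         # start just the "onboot" group
        #   lxc-autostart -g webtier -s     # cleanly shut the "webtier" group down
        #   lxc-autostart -L                # list what would start, in order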

In the container-of-containers case, you're also going to need some
level (on bootup) of cross-container coordination (allow one container
to boot its containers after another container has completed its
boot?).  I especially don't know how you would expect an lxc-autostart
abort to behave in that case.  You're not going to get it to shut down
all the containers running lxc-autostart.  This is a morass and you're
going to need a firmly ordered approach, as opposed to a shotgun
approach.
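
Just as a sketch of what that host-side sequencing might look like (the
container names here are made up), you could gate each tier on the one
before it with something like lxc-wait:

        #!/bin/sh -
        # Hypothetical example: start the outer "infra" container, wait for
        # it to reach RUNNING, then start the containers that depend on it.
        # Note that RUNNING only means the container launched; knowing that
        # its *inner* boot (and any nested lxc-autostart) has finished would
        # take an additional check inside the guest.
        lxc-start -n infra -d
        lxc-wait -n infra -s RUNNING || exit 1
        sleep 30        # crude settle time for services inside "infra"
        for c in web1 web2 db1; do
                lxc-start -n "$c" -d
        done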

As a practical, real world, engineer, I don't always get the answer I
would like (for one reason or another) but I do generally get my
problems solved (sometimes in ways I hadn't thought of that were
significantly better - lxc-autostart is head and shoulders better than
the lxc-run script I was using internally and never proposed).

I usually get further by asking "how do I solve this scenario" rather
than saying "We like toast, make it make toast" (an actual comment I heard
at an IETF conference where I was talking with one of the WG chairs, who
was complaining about a certain faction within his WG and their personal
proposal that he wanted my opinion on - names of the guilty shall not be
revealed).

Bottom line is that you have yet to convince me or others that this is
even a problem case where better solutions do not exist.

Tell you what...  I'll give you a third solution.  Take the "upstart"
start-up as a paradigm and create a script.  Use lxc-autostart to list
out the containers to run but then run them manually using lxc-start and
check at each step for a "continuation" file.  Something like this:

#!/bin/sh -
#
# My autostart (extracted from config/init/upstart/lxc.conf)
#
# pre-start script
        [ -f /etc/default/lxc ] && . /etc/default/lxc

        # don't load profiles if mount mediation is not supported
        SYSF=/sys/kernel/security/apparmor/features/mount/mask
        if [ -f "$SYSF" ]; then
                if [ -x /lib/init/apparmor-profile-load ]; then
                        /lib/init/apparmor-profile-load usr.bin.lxc-start
                        /lib/init/apparmor-profile-load lxc-containers
                fi
        fi

        [ "x$LXC_AUTO" = "xtrue" ] || exit 0

        if [ -n "$BOOTGROUPS" ]
        then
                BOOTGROUPS="-g $BOOTGROUPS"
        fi

        # My /\/\|=mhw=|\/\/ insert here:
        mkdir -p /var/run/lxc
        echo $$ > /var/run/lxc/autostart

        # Process the "onboot" group first then the NULL group.
        lxc-autostart -L $OPTIONS $BOOTGROUPS | while read line; do
                set -- $line
                # "start lxc-instance" is the stock upstart job; a standalone
                # script would use something like: lxc-start -n "$1" -d
                (start lxc-instance NAME=$1 && sleep $2) || true
                # My /\/\|=mhw=|\/\/ insert here:
                [ -e /var/run/lxc/autostart ] || break
        done
        # My /\/\|=mhw=|\/\/ insert here:
        rm -f /var/run/lxc/autostart
# end script

Now, add the various variable initializations and create your own script
for your controlled containers.  Embellish it with whatever other
controls, checks, initialization, and prioritization you desire.
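
For example, the variable initializations could be as simple as this
(these are just plausible defaults, not values lifted from any particular
distribution; put them above the /etc/default/lxc line so that file can
still override them):

        LXC_AUTO="true"
        BOOTGROUPS="onboot,"    # "onboot" group first, then the NULL group
        OPTIONS=""              # extra lxc-autostart options, if any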

I took the "pre-start" script from config/init/upstart/lxc.conf, added a
few lines of code to it (marked above), and that will do exactly what
you want.  All you have to do is remove /var/run/lxc/autostart while
that script is running and the script will terminate after cleanly
starting the container in progress, without attempting to start another.
No need to modify anything or create some new command for your special
case, yet it addresses your case cleanly and easily.
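
In other words, cancelling an autostart in flight is a one-liner (run as
root on the host):

        # Stop after the container currently starting; nothing else is touched.
        rm -f /var/run/lxc/autostart

        # The file holds the PID of the autostart run, so you can also check
        # whether one is still in progress first:
        [ -e /var/run/lxc/autostart ] && cat /var/run/lxc/autostart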

That actually might argue for using something similar in the sysvinit
case and/or the systemd case; neither of them uses the autostart -L loop
model, but the same idea should work.  That's worth discussion and I'll
bring it up with Dwight, since he and I are largely involved in the
sysvinit and systemd init cruft.  Still, this is a corner case.

Now you have 3 possible solutions for the problems you are describing.
Tell us why they will not address what you are trying to solve.

> Philip

> On Sat, Aug 9, 2014 at 8:27 AM, brian mullan <bmullan.mail at gmail.com> wrote:
> > I've been reading this thread and this is the first and only time I've ever
> > heard anyone request such a "kill all" command for LXC to terminate
> > auto-start.
> >
> > Developer time is always in short supply and IMHO asking one of them to
> > spend their time on such a "corner-case" issue is not putting their efforts
> > to good use.
> >
> > There have been 2 alternatives proposed that seem like they would handle
> > this event, and in my opinion that should be sufficient.
> >
> > LXC 1.x has a lot of important work going on and I'd rather see people
> > focused on the existing roadmap or on addressing critical bugs.
> >
> > Of course it's all Open Source, so anyone who can't live without such a
> > feature could either contribute the patches themselves or offer a bounty to
> > have it done for them.
> >
> > again just my opinion
> >
> > Brian
> >
> >

-- 
Michael H. Warfield (AI4NB) | (770) 978-7061 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!
