[lxc-devel] Container autostart proposal V2

Thu May 30 14:45:03 UTC 2013

On Thu, 2013-05-30 at 09:33 -0400, Stéphane Graber wrote: 
> On 05/30/2013 02:35 AM, Harald Dunkel wrote:
> > Hi folks,
> > 
> > 
> > If it's not too late for a suggestion:
> > 
> > lxc-stop could provide an option -A to shutdown/stop all
> > containers, independent from their autostart flag. Worst
> > case scenario (almost) is that a disk has an IO error and
> > the config files cannot be read anymore.

> That's already covered by my proposal and I believe covered in the use
> cases listed within it.

> "lxc-stop -g any"

> That'll stop all containers that are in the "any" group. "any" is
> documented as being a special group that contains all containers even if
> they have lxc.group set.

> As you're not passing the "-a" flag, then the stop command will affect
> all containers even if lxc.start.auto is 0.

> > For my own part, if I want to stop the LXC server, then I
> > don't care whether the containers were started manually or
> > automatically. They have to stop asap to allow to unmount
> > the partitions.

> Indeed and that's specifically why I added the "any" group to my proposal.

> > And one question:
> > 
> > Do lxc-start -a ... and lxc-stop -a ... start/stop all LXC
> > containers in parallel, if their order and group are the
> > same? I am concerned about accumulating timeouts or delays
> > at shutdown or startup time of the host.

> The usual STOPPED => RUNNING or RUNNING => STOPPED time is < 1s, so no,
> we'll be doing that in serial order, but you won't really notice it
> because of how quick it is.

Which is sadly funny and the reason why I like the idea of a delay.
With systemd in particular (and upstart to a lesser extent) the massive
parallelism involved in the individual container system startup is
noticeable.  Starting a couple dozen containers at once, like you say,
only takes a second between each "start" but then the parallelized
processes pile on and I've seen my load average spike to over 30 at
times during a power up of the lxc system.

> That's assuming lxc.start.delay isn't set. If it's set, then obviously
> startup will take longer because we'll wait for lxc.start.delay before
> starting the next container (parallelizing would make the whole
> priority/delay idea completely pointless obviously).

Not totally pointless but it certainly complicates things, especially if
there are dependencies in there (an area we've chosen not to go into due
to its complexity).  I'm happy to just be able to get some of my
critical infrastructure resources (like nameservers and routing daemons)
up and stable before they get hit with the load average tsunami from
starting up the web server and database server containers.
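
To make the serial-start-with-delay behaviour concrete, here is a minimal
sketch in Python.  It is not the lxc implementation.  The Container
tuple, its "order" field and the start_serially() helper are names made
up for illustration; only the idea of waiting lxc.start.delay before
starting the next container comes from the proposal.

```python
import time
from typing import Callable, Iterable, List, NamedTuple


class Container(NamedTuple):
    name: str
    order: int    # hypothetical priority field; lower values start first
    delay: float  # seconds to wait after starting, in the spirit of lxc.start.delay


def start_serially(
    containers: Iterable[Container],
    start: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Start containers one at a time, honouring each container's delay.

    `start` is whatever actually launches a container; it is injected so
    the sketch does not assume any particular lxc command line.
    """
    started = []
    for c in sorted(containers, key=lambda c: (c.order, c.name)):
        start(c.name)
        started.append(c.name)
        if c.delay > 0:
            # Let this container settle before the next one adds load.
            sleep(c.delay)
    return started
```

With the nameserver and router given a low order and a non-zero delay,
they get a head start before the web and database containers pile on.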

I don't know how we would do it (and I'm not making a proposal here) but
that also raises an idea about a group.delay to be applied after all the
containers of a group have started.  So, I would start everything in the
"01_router" group, apply a delay, start everything in the "02_dns"
group, and apply a delay, then start "03_database" followed by
"04_webservers".

Only problem is that the "group" attribute is merely metadata on the
containers; a group has no persistent configuration of its own stored
anywhere, so there is nowhere to record a per-group delay.  As
attractive as that idea might be, I don't think I would want to
complicate the greater configuration structure further.
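
For what it's worth, the group.delay idea above could be sketched roughly
like this in Python.  Everything here is hypothetical: start_by_group(),
the single group_delay value and the name-to-group mapping are invented
for illustration, and groups are simply started in sorted name order
(which is why the "01_", "02_" prefixes matter).

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def start_by_group(
    containers: Dict[str, str],
    group_delay: float,
    start: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, str]]:
    """Start every container of a group, pause, then move to the next group.

    `containers` maps a container name to its group name.  Returns a log
    of ("start", container) and ("delay", group) events in the order they
    happened.
    """
    groups = defaultdict(list)
    for name, group in containers.items():
        groups[group].append(name)

    log = []
    ordered = sorted(groups)
    for i, group in enumerate(ordered):
        for name in sorted(groups[group]):
            start(name)
            log.append(("start", name))
        # No point in waiting after the last group has been started.
        if i < len(ordered) - 1:
            sleep(group_delay)
            log.append(("delay", group))
    return log
```

Note that the delay has to be passed in from outside, which is exactly
the problem: there is no per-group place to store it.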

Just my 0.02 euro.

> > Many thanx
> > Harri

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!