[Lxc-users] Ephemeral containers flaky?

Dan Kegel dank at kegel.com
Tue Jan 22 05:20:49 UTC 2013


I have an upstart script that starts an ephemeral container running
a oneshot buildbot slave.  The slave runs one build, then exits,
which makes the ephemeral container terminate; upstart then
restarts the ephemeral container.

Fairly often, when the container is coming up after being restarted,
it stays up for only a second or two, then goes down again,
and then finally stays up for good.

The buildbot log shows something like this:

2013-01-22 05:12:55+0000 [-] twistd 12.3.0 (/home/dank/slave-state/ubu1004/bin/python 2.6.5) starting up.
...
2013-01-22 05:13:31+0000 [Broker,client] Connected to i7:9010; slave is ready
2013-01-22 05:13:31+0000 [Broker,client] sending application-level keepalives every 600 seconds
2013-01-22 05:13:32+0000 [Broker,client] SlaveBuilder.remote_print(pyflakes-ubu1004-master): message from master: ping
2013-01-22 05:13:32+0000 [-] Received SIGTERM, shutting down.

which indicates that the guest's init sent the buildbot slave SIGTERM,
for some reason, only one second after the slave finally managed to
connect.  (Sometimes it happens before the slave manages to connect.)
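
To put a number on how long the slave survives after connecting, the
log can be scanned for the two messages involved.  This is only a rough
sketch, assuming every log entry starts with a timestamp in the format
shown above; the function name is made up for illustration and is not
part of buildbot or twisted.

```python
import re
from datetime import datetime

# Leading timestamp of a twistd log entry, e.g. "2013-01-22 05:13:31+0000".
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[+-]\d{4}) ")


def seconds_ready_before_sigterm(lines):
    """For each 'Received SIGTERM' entry, return the seconds elapsed since
    the preceding 'slave is ready' entry, or None if the slave had not
    connected yet in that run."""
    gaps = []
    ready_at = None
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # no timestamp: continuation of a wrapped entry
        when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S%z")
        if "slave is ready" in line:
            ready_at = when
        elif "Received SIGTERM" in line:
            if ready_at is None:
                gaps.append(None)
            else:
                gaps.append((when - ready_at).total_seconds())
            ready_at = None  # the next run starts from scratch
    return gaps
```

Feeding it the lines of the slave's twistd log gives one entry per
SIGTERM, so the short-lived runs stand out from the normal ones.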

If I run lxc-ssh to connect to the container (it's a little
script that sits in a loop like lxc-start-ephemeral does),
I see something similar; the login succeeds but then I get logged out.

Has anybody seen this kind of behavior?  It's as if, a quarter of the
time, the ephemeral container starts up in a bad state and exits within
a few seconds.  And since the container is ephemeral, whatever it logged
is gone once it exits, so I don't get a lasting record of what went wrong.
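
One way to get at least a host-side record would be to wrap whatever
command the upstart job runs in something that notes how long each run
lasted and how it exited.  A minimal sketch, assuming the record file
lives on the host rather than inside the container; the function name
and the record format are hypothetical, and the wrapped command is
whatever the job already executes.

```python
import subprocess
import sys
import time


def run_and_record(cmd, record_path):
    """Run cmd once, then append its start time, duration and exit status
    to record_path.  Returns (elapsed_seconds, status)."""
    started = time.time()
    status = subprocess.call(cmd)  # negative if killed by a signal
    elapsed = time.time() - started
    with open(record_path, "a") as f:
        f.write("start=%d elapsed=%.1fs status=%d cmd=%s\n"
                % (started, elapsed, status, " ".join(cmd)))
    return elapsed, status


if __name__ == "__main__" and len(sys.argv) > 2:
    # usage: python record.py RECORD_FILE COMMAND [ARGS...]
    _, exit_status = run_and_record(sys.argv[2:], sys.argv[1])
    sys.exit(exit_status if exit_status >= 0 else 1)
```

Runs that last only a second or two then show up as lines with a small
elapsed value, which at least says how often the bad starts happen even
if it does not say why.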

(I tried cranking up the debug level on init with initctl log-priority debug,
but that generated gibberish lines in the host's syslog.)



