[Lxc-users] Working LXC templates?

Fri Sep 13 14:53:48 UTC 2013

Hey Tony!

I've kept meaning to revisit this and answer your question...

On Wed, 2013-09-04 at 10:23 -0700, Tony Su wrote: 
> Thx for the thoughts, Michael...
> 

> Would like to know specifically what you believe is incompatible
> running 0.8.0 with systemd.

Ok...  Here are the things we know about systemd...

1) There were radical changes in the behavior of systemd at various
times wrt (with regards to) the use of cgroups, devtmpfs, udev, and file
system mount options.  The behaviors amounted to "magic cookies".  If
you knew the magic cookie you might be able to get it to work, or you
might not.  Different things changed with different versions creating a
morass of incompatibility and unpredictability from systemd depending on
a matrix of versions.  We thought we had things fixed in 0.8.0 and then
they went and busted some more stuff we had to fix in 0.9.0 and the beat
goes on...

2) WRT cgroups, I think that was all resolved in the 0.8.0 release
framework by supporting and adapting to a variety of hierarchical mount
points for cgroups.  That case is closed.

3) WRT devtmpfs, this was the NASTY one.  At one point (on a particular
systemd rev), they changed systemd to mount devtmpfs on /dev, effectively
bypassing our mknod control over the file system and breaking the use of
pty pipes mounted to ttys and consoles.  This caused all sorts of random
acts of terrorism like crashing X in the host system, hijacking the host
console device, and rebooting the host by issuing init 6 in the
container.  This was misery incarnate.  After some (cough) heated
discussions with some systemd proponents, we learned that we needed to
mount a ramfs device on /dev (magic cookie) in the container to prevent
systemd from mounting devtmpfs there and that we then needed to
autopopulate that /dev/ directory.  There was also something to do with
environment variables in there as well (another magic cookie).  This all
gave birth to the "lxc.autodev" option in the configuration files (and
some code changes by Serge and myself to the lxc-start C files) to tell
lxc to perform those operations (mount ramfs on /dev and populate).
That might be in 0.8.0, I'm not sure.  I think it is.  I'm pretty sure
it is.

4) WRT the device mount options.  At a different, later, systemd rev,
they changed one of the mount point options for the host file systems (I
think it was some shared mount propagation, MS_SHARED or MS_SLAVE???),
causing a failure in the pivot_root when attempting to start a container
when you have systemd running in the host (yet another magic cookie to
get the mount operations right).  That one, I'm pretty much certain is
NOT in 0.8.0 and you have to be at least running 0.9.0.

5) Various setup issues (such as lxc.autodev) for systemd based
containers were not being automatically performed in the lxc-fedora
template scripts and required manual intervention after initial
container creation.  That's where my coding has come in, and much of that
is NOT even in 0.9.0, so you have to go to the stage branch of git for
those.

6) Systemd has had a history of changes in functionality and side
effects for a long time.  I was never able to get F15 or F16 to reliably
run as containers due to incompatibility with the versions of systemd
included.  Both involved changes in the way systemd behaved wrt udev.  I
was almost able to get F15 to work but it was a butt ugly hack that had
warts all over it.  It did run, for some value of "run" but I wouldn't
use it or trust it.  F16 was a totally dead-on-arrival non-starter due
to a horribly incompatible udev handler in systemd that was in F16.
Never did figure out how they managed to break things THAT BADLY in that
version of systemd.  Neither of those is supported any longer so, quite
frankly, I couldn't give a damn any more.
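
For items 3 and 4 above, here is a rough pre-flight sketch in Python, not
anything that ships with lxc.  It assumes the container config uses plain
"key = value" lines with the last assignment winning, and that shared mount
propagation on the host shows up as a "shared:N" tag in the optional fields
of /proc/self/mountinfo.  The function names are made up for illustration.

```python
def autodev_enabled(config_text):
    """Return True if the last lxc.autodev assignment in the config is 1.

    Assumption: plain "key = value" lines, "#" starts a comment line,
    and a later assignment overrides an earlier one.
    """
    enabled = False
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "lxc.autodev":
            enabled = (value == "1")
    return enabled


def shared_mount_points(mountinfo_text):
    """Return the mount points tagged shared:N in mountinfo-style text.

    Each mountinfo line has six fixed fields, then zero or more optional
    fields (such as shared:N or master:N), then a lone "-" separator.
    """
    shared = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        try:
            sep = fields.index("-", 6)  # separator comes after the fixed fields
        except ValueError:
            continue  # not a well-formed mountinfo line
        optional = fields[6:sep]
        if any(tag.startswith("shared:") for tag in optional):
            shared.append(fields[4])  # fifth field is the mount point
    return shared
```

To try it, feed autodev_enabled() the text of a container config file and
shared_mount_points() the contents of /proc/self/mountinfo on the host.  A
"/" in the second result only says the host root is mounted shared; whether
that actually trips pivot_root depends on the lxc version, per item 4.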

So, all of this has led Serge to list me on the roster for the
LinuxPlumbers conference as the LXC "systemd expert".  I'll get even
with him later next week for that one...

> I have not noticed any issues running openSUSE 12.2 (systemd v44?) and
> openSUSE 12.3 (systemd v195) on an openSUSE Host.

> I've been able to run systemd commands (typically systemctl and
> journalctl) from within the Containers without an issue most of the
> time, and in the rare cases something didn't work I've
> typically guessed it was just a potential security issue.

> I'm guessing that running systemd both fundamentally and using its
> various associated commands should not be an issue by simply applying
> namespaces (personally have been speculating on the use of simple
> random strings instead of user-readable text strings for security
> reasons).

Device namespaces should solve many of the problems systemd wantonly
introduced with devtmpfs.  But that will then introduce yet another
change we'll have to adapt to.  ITMT, namespaces in general should not
be a problem.
> 
> Based on the various templates I've tried, this issue is likely
> related to how complete and self-contained the container template is,
> the less it bind mounts or otherwise re-uses the Host system
> beyond /proc I consider the template more agnostic. Binding Host
> resources is what causes agnostic problems despite its various
> benefits.

Not really true.  Bind mounting actually provides more insulation and
control.  There are far more issues than just bind mounting.  Bind
mounts must be managed properly and are the ideal solution for the
appropriate shared resources but should not be used for things like
shared memory, dev, run, etc, etc...
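
As a rough illustration of that last point, a small Python sketch that
flags bind mounts onto those locations in a container config.  It assumes
fstab-style "lxc.mount.entry = source target fstype options ..." lines with
the target given relative to the container rootfs; the RISKY_TARGETS list
and the function name are my own picks for the example, not anything lxc
defines.

```python
# Assumed from the text above: dev, run and shared memory should not be bind mounted.
RISKY_TARGETS = ("dev", "run", "dev/shm")


def risky_bind_mounts(config_text):
    """Return the targets of bind-mount entries that land on RISKY_TARGETS."""
    flagged = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key != "lxc.mount.entry":
            continue
        fields = value.split()
        if len(fields) < 4:
            continue  # need at least source, target, fstype, options
        target = fields[1].strip("/")
        options = fields[3].split(",")
        is_bind = "bind" in options or "rbind" in options
        if is_bind and target in RISKY_TARGETS:
            flagged.append(target)
    return flagged
```

An entry that bind mounts the host /dev onto "dev" gets flagged, while a
bind mount of an ordinary shared data directory passes through untouched.
Targets written as absolute paths under the rootfs are not handled here.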
> 
> The reason why a specific repo URL wouldn't be useful is because
> interestingly Fedora's installation only looks up a Host which based
> on the install's world-wide location and type of proposed installation
> returns a recommended URL pointing to some repo mirror. This means
> that there is no point testing against any of the mirror URLs, they
> should all operate the same way. The URL I provided should be the only
> relevant and critical URL the install template needs to use correctly.

Actually, my proposed template I posted earlier uses the
mirrors.kernel.org mirror specifically because not all of the mirrors
are identical.  Some don't carry certain non-package files (LiveOS
images or isos) and not all support rsync.  So, that one particular
mirror is a good reference point for certain, very specific, operations
(most especially rsync access to the LiveOS file systems).
> 
> As for why the authentication (GPG?) check isn't working, I haven't
> dived into exactly what package is being used... But IMO it would make
> sense in this pre-install environment, the Host's function is being
> used and openSUSE does require User verification (unless the "silent"
> switch is invoked which now that I'm thinking about it could be the
> answer to my specific situation but may not address other distros).
> Hmmmm... so, maybe I have something to experiment with...
> 
> 
> Thx,
> 
> Tony
> 
Regards, Mike
-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!