[lxc-devel] [systemd-devel] Unable to run systemd in an LXC / cgroup container.

Tue Nov 6 16:07:10 UTC 2012

On Mon, 2012-10-22 at 16:11 +0200, Lennart Poettering wrote:

> Note that there are reports that LXC has issues with the fact that newer
> systemd enables shared mount propagation for all mounts by default (this
> should actually be beneficial for containers as this ensures that new
> mounts appear in the containers). LXC when run on such a system fails as
> soon as it tries to use pivot_root(), as that is incompatible with
> shared mount propagation. This needs fixing in LXC: it should use MS_MOVE
> or MS_BIND to place the new root dir in / instead. A short term
> work-around is to simply remount the root tree to private before
> invoking LXC.
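
For reference, here is a minimal sketch of that short-term work-around,
written in Python with ctypes. The helper names (propagation_flags,
set_propagation) are made up for illustration and are not part of LXC or
systemd; the flag values are the ones from <sys/mount.h> on Linux. It
should be roughly what `mount --make-rprivate /` does, and it needs root
(CAP_SYS_ADMIN) to actually change anything.

```python
import ctypes
import os

# Flag values as defined in <sys/mount.h> on Linux.
MS_REC = 0x4000        # apply the change to the whole subtree
MS_PRIVATE = 1 << 18   # no propagation in either direction
MS_SLAVE = 1 << 19     # receive events from the master, send none back
MS_SHARED = 1 << 20    # send and receive events within the peer group

PROPAGATION = {"private": MS_PRIVATE, "slave": MS_SLAVE, "shared": MS_SHARED}


def propagation_flags(kind, recursive=True):
    """Return the mount(2) flags that switch propagation to `kind`."""
    flags = PROPAGATION[kind]
    if recursive:
        flags |= MS_REC
    return flags


def set_propagation(path, kind, recursive=True):
    """Change the propagation type of the mount at `path` via mount(2)."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_ulong, ctypes.c_void_p]
    libc.mount.restype = ctypes.c_int
    # Source, fs type and data are ignored for propagation changes.
    rc = libc.mount(None, os.fsencode(path), None,
                    propagation_flags(kind, recursive), None)
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)

# Intended use, as root, before starting the container:
#     set_propagation("/", "private")
```

Passing "slave" instead of "private" would keep host mounts flowing into
the container without letting container mounts flow back out.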

In another thread, Serge had some heartburn over this shared mount
propagation which then rang a bell in my head about past problems we
have seen.

> On Mon, 2012-11-05 at 08:51 -0600, Serge Hallyn wrote: 
> > Quoting Michael H. Warfield (mhw at WittsEnd.com):
> > ...
> > This was from another thread with the systemd guys.
> > 
> > On Mon, 2012-10-22 at 16:11 +0200, Lennart Poettering wrote:
> > > Note that there are reports that LXC has issues with the fact that
> > > newer
> > > systemd enables shared mount propagation for all mounts by default
> > > (this
> > > should actually be beneficial for containers as this ensures that new
> > > mounts appear in the containers). LXC when run on such a system fails
> 
> MS_SLAVE does this as well.  MS_SHARED means container mounts also
> propagate into the host, which is less desirable in most cases.

Here's where we've seen some problems in the past.  It's not just mounts
that are propagated but remounts as well.  The problem arose when some
of us had our containers on a separate partition.  When we shut a
container down, that container tried to remount its file systems ro,
which then propagated back into the host, causing the host's file system
to go ro (this doesn't happen if the containers live on the host's root
fs), and from there across into the other containers.

Are you using MS_SHARED or MS_SLAVE for this?  If you are using
MS_SHARED, do you create a potential security problem where actions in
the container can bleed into the state of the host and into other
containers?  That's highly undesirable.  If a mount in a container
propagates back into the host and is then reflected to another container
sharing that same mount tree (I have shared partitions specific to that
sort of thing), does that create an information disclosure situation if
one container mounts a new file system and the other container sees the
new mount?  I don't know if the mount propagation would reflect back up
the shared tree or not, but I have certainly seen remounts do this.  I
don't see that as desirable.  Maybe I'm misunderstanding how this is
supposed to work, but I intend to test out those scenarios when I have a
chance.  I do know that, when testing that ro problem, I was able to
remount a partition ro in one container and it would switch in the host
and the other container, and I could then remount it rw in the other
container and have it propagate back.  Not good.
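
When testing those scenarios, one way to see which mounts can affect
each other is to read the optional fields in /proc/<pid>/mountinfo: a
"shared:N" tag puts a mount in peer group N, and a "master:N" tag means
it receives events from group N without sending any back. The sketch
below parses those tags. The sample lines are invented (imagine them
collected from the host and two containers), and the function names are
mine, so treat it as a starting point only.

```python
from collections import defaultdict

# Hypothetical mountinfo lines, gathered from a host and two containers.
SAMPLE = """\
21 1 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw
30 21 8:17 / /srv/lxc rw,relatime shared:2 - ext4 /dev/sdb1 rw
45 60 8:17 /c1 / rw,relatime shared:2 - ext4 /dev/sdb1 rw
52 70 8:17 /c2 / rw,relatime master:2 - ext4 /dev/sdb1 rw
58 21 0:40 / /mnt/scratch rw - tmpfs tmpfs rw
"""


def parse_line(line):
    """Extract mount id, mount point and propagation tags from one line."""
    fields = line.split()
    sep = fields.index("-", 6)          # optional fields end at the "-"
    info = {"mount_id": int(fields[0]), "mount_point": fields[4],
            "shared": None, "master": None}
    for tag in fields[6:sep]:
        key, _, value = tag.partition(":")
        if key in ("shared", "master"):
            info[key] = int(value)
    return info


def propagation_type(info):
    """Classify a parsed mount (unbindable mounts show up as private here)."""
    shared = info["shared"] is not None
    slave = info["master"] is not None
    if shared and slave:
        return "shared+slave"
    if shared:
        return "shared"
    if slave:
        return "slave"
    return "private"


def peer_groups(text):
    """Map each peer group id to the ids of the mounts that belong to it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        info = parse_line(line)
        if info["shared"] is not None:
            groups[info["shared"]].append(info["mount_id"])
    return dict(groups)
```

Two mounts listed under the same peer group propagate mount and umount
events to each other, which is the MS_SHARED case I'm worried about; a
mount that only carries a master tag is the MS_SLAVE case.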

Can you offer any clarity on this?

> > > as
> > > soon as it tries to use pivot_root(), as that is incompatible with
> > > shared mount propagation. This needs fixing in LXC: it should use
> > > MS_MOVE
> > > or MS_BIND to place the new root dir in / instead. A short term

> Actually not quite sure how this would work.  It should be possible
> to set up a set of conditions to work around this, but the kernel
> checks at do_pivotroot are pretty harsh - mnt->mnt_parent of both
> the new root and current root have to be not shared.  So perhaps
> we actually first chroot into a dir whose parent is non-shared,
> then pivot_root from there?  :)
> 
> (Simple chroot in place of pivot_root still does not suffice, not
> only because of chroot escapes, but also different results in
> /proc/pid/mountinfo and friends)
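
To make Serge's description concrete, here is a rough sketch that looks
at mountinfo text and reports which of the given mount points sit on a
shared parent mount. It only models the two parent conditions he
mentions, not the full set of kernel checks, and the function name and
sample data are hypothetical.

```python
# Hypothetical mountinfo lines: a container rootfs bind-mounted under a
# shared /srv/lxc mount.
SAMPLE = """\
20 1 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw
31 20 8:17 / /srv/lxc rw,relatime shared:2 - ext4 /dev/sdb1 rw
40 31 8:17 /c1/rootfs /srv/lxc/c1/rootfs rw,relatime - ext4 /dev/sdb1 rw
"""


def find_shared_parents(mountinfo_text, mount_points):
    """Return (mount point, parent mount point) pairs with a shared parent.

    A parent that is not listed in the text (for example the parent of the
    root mount) is treated as unknown and is not reported.
    """
    mounts = {}
    for line in mountinfo_text.splitlines():
        if not line.strip():
            continue
        fields = line.split()
        sep = fields.index("-", 6)      # optional fields end at the "-"
        mounts[int(fields[0])] = {
            "parent": int(fields[1]),
            "point": fields[4],
            "shared": any(t.startswith("shared:") for t in fields[6:sep]),
        }
    by_point = {m["point"]: m for m in mounts.values()}

    blockers = []
    for point in mount_points:
        mount = by_point.get(point)
        if mount is None:
            continue
        parent = mounts.get(mount["parent"])
        if parent is not None and parent["shared"]:
            blockers.append((point, parent["point"]))
    return blockers
```

With the sample above, checking the current root and the new root would
flag the container rootfs because its parent mount, /srv/lxc, is shared.
If that parent is made private first, the list comes back empty.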

Comments on Serge's points?

At this point, we see where this will become problematic in Fedora 18,
but it appears to already be a problem in NixOS, which another user is
running and which has systemd 195 in the host.

We've had problems with chroot in the past due to chroot escapes and
other problems years ago as Serge mentioned.

> Lennart

> -- 
> Lennart Poettering - Red Hat, Inc.

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!