[lxc-devel] [RFC] rootfs pinning

Michael H. Warfield mhw at WittsEnd.com
Tue Sep 24 21:52:57 UTC 2013


On Tue, 2013-09-24 at 21:51 +0100, Christian Seiler wrote: 
> Hi there,
> 
> >> Yep, we discussed this at Plumbers and I think it's really the way to
> >> go, basically remove all of that fs pinning code and just do a
> >> bind-mount of the rootfs on itself in the container's mountns before
> >> starting it.
> >
> >> That way if the container decides to remount / ro at any point, it'll
> >> succeed and will give the user a read-only / but without affecting the
> >> outside world.
> >
> > Ideally, I think that's the way to go and I used to do that manually when
> > setting up my containers but I was thinking there was some breakage
> > between that and the way we were working around the pivot_root problem
> > introduced by systemd (Fedora, Suse, Arch, et al).  If we can verify
> > that works with all the init flavors without breaking, that could be
> > part of the general cleanup of the mount tables in the containers as
> > well, maybe...

> Just a short comment about what I found out when looking at the
> auto-mount stuff I just sent to the list when it comes to
> bind-mounts and remounting ro:

> Take the following example:

> mount --bind /foo /bar
> mount -o remount,ro /bar

> In kernels up to at least 3.2 (but not much later) this would make the
> mount /bar read-only, but keep /foo read-write.

> But: in kernels from 3.8 at the latest (possibly earlier), this would actually
> remount the entire filesystem read-only or give a busy message. There
> was apparently some kind of change here.

No.  There's a change there, all right, and thank you for reminding me
of that, but (afaik) it's NOT in the kernel itself.  It's a mount
option.  It's that bloody MS_SHARED option and, to a lesser extent,
MS_SLAVE option that are behind how those things are propagated.
MS_SHARED will propagate certain things from a child mount to the mount
point and to other children, IIRC, while MS_SLAVE propagates in one
direction and MS_PRIVATE restricts it.  I think the trouble maker is
MS_SHARED and that's what caused all the "pivot_root" calls to face
plant when systemd started mounting everything with MS_SHARED in the
host system.  I was using bind mounts to avoid some of these problems
but then they changed systemd and its default mount options and broke a
number of things I had running.
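
For anyone who wants to check which propagation type a given mount point
actually has, here is a minimal sketch.  It assumes the mountinfo layout
described in proc(5), where the optional fields (shared:N, master:N, ...)
sit between the per-mount options and the "-" separator, and a mount with
none of them is private.  The function name propagation_of is made up for
illustration and is not part of any tool.

```shell
#!/bin/sh
# Print the propagation tags of one mount point, read from a
# mountinfo-format file (default: /proc/self/mountinfo).
# Prints "private" when the entry carries no optional fields.
# Returns non-zero when the mount point is not listed.
propagation_of() {
    awk -v mp="$1" '
        $5 == mp {
            tags = ""
            # Optional fields start at field 7 and end at the "-" separator.
            for (i = 7; i <= NF && $i != "-"; i++)
                tags = tags (tags == "" ? "" : " ") $i
            # Keep the last match: later mounts shadow earlier ones.
            result = (tags == "" ? "private" : tags)
        }
        END { if (result != "") print result; else exit 1 }
    ' "${2:-/proc/self/mountinfo}"
}
```

Running "propagation_of /" on the host shows whether / is shared there;
what it prints depends on the distro defaults.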

> In order to properly remount bind-mounts read-only in newer kernels,
> you have to do the following:

> mount -o remount,bind,ro /bar

Check your mount point options and read the man page for mount and
"shared subtrees options".  Some of the distros have been changing the
defaults.  I don't believe it's a kernel default issue but I could be
wrong.

> This will also work in older kernels (I could only test 2.6.32, not
> earlier), so in that sense it's portable.
> 
> BUT: the typical bind-mount trick one could use to keep the container
> from remounting / ro at shutdown will apparently, as far as I can
> tell, not work anymore in 3.8, possibly earlier, since typical
> shutdown will do the equivalent of remount,ro and not add the bind
> option there.
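
One way to see which of the two remount forms took effect is to compare
the per-mount options with the superblock options in mountinfo.  This is
only a sketch, assuming the proc(5) layout: field 6 holds the per-mount
options and the superblock options are the third field after the "-"
separator.  The function name ro_scope is hypothetical.

```shell
#!/bin/sh
# Report whether a mount point is read-only per mount, per superblock,
# or both, from a mountinfo-format file (default: /proc/self/mountinfo).
# Returns non-zero when the mount point is not listed.
ro_scope() {
    awk -v mp="$1" '
        # Return 1 if the comma-separated option list contains "ro".
        function has_ro(opts,    n, parts, i) {
            n = split(opts, parts, ",")
            for (i = 1; i <= n; i++)
                if (parts[i] == "ro") return 1
            return 0
        }
        $5 == mp {
            # Skip the optional fields to find the "-" separator.
            sep = 7
            while (sep <= NF && $sep != "-") sep++
            m = (has_ro($6) ? "ro" : "rw")
            s = (has_ro($(sep + 3)) ? "ro" : "rw")
            result = "mount=" m " super=" s
        }
        END { if (result != "") print result; else exit 1 }
    ' "${2:-/proc/self/mountinfo}"
}
```

After "mount -o remount,bind,ro /bar" I would expect "mount=ro super=rw"
for /bar; a remount that hit the whole filesystem should show "super=ro".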

> So unfortunately, I think we'll have to stick with pinning... :(

Actually, there, I think I agree with you, unfortunately.  I think we're
stuck with it due to ill behavior in some distros and their defaults, in
particular with regard to systemd-based distros.  We need to do things
in a way that does not break on a distro running the host and in a way
that doesn't allow an arbitrary distro running in a container to
propagate random acts of terrorism to the host or other containers.  But
that's probably a good paradigm for us, anyways.

> -- Christian

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!