[lxc-devel] [RFC] rootfs pinning
Michael H. Warfield
mhw at WittsEnd.com
Tue Sep 24 21:52:57 UTC 2013
On Tue, 2013-09-24 at 21:51 +0100, Christian Seiler wrote:
> Hi there,
>
> >> Yep, we discussed this at Plumbers and I think it's really the way to
> >> go, basically remove all of that fs pinning code and just do a
> >> bind-mount of the rootfs on itself in the container's mountns before
> >> starting it.
> >
> >> That way if the container decides to remount / ro at any point, it'll
> >> succeed and will give the user a read-only / but without affecting the
> >> outside world.
> >
> > Ideally, I think that's the way to go and I used to do that manually when
> > setting up my containers but I was thinking there was some breakage
> > between that and the way we were working around the pivot_root problem
> > introduced by systemd (Fedora, Suse, Arch, et al). If we can verify
> > that works with all the init flavors without breaking, that could be
> > part of the general cleanup of the mount tables in the containers as
> > well, maybe...
> Just a short comment about what I found out, while looking at the
> auto-mount stuff I just sent to the list, about bind-mounts and
> remounting ro. Take the following example:
>
>     mount --bind /foo /bar
>     mount -o remount,ro /bar
>
> In kernels up to at least 3.2 (but not much later) this would make the
> mount /bar read-only, but keep /foo read-write.
>
> But in kernels from 3.8 at the latest (possibly earlier), this would
> actually remount the entire filesystem read-only or give a busy message.
> There was apparently some kind of change here.
No. There's a change there, all right, and thank you for reminding me
of that, but (afaik) it's NOT in the kernel itself. It's a mount
option. It's that bloody MS_SHARED option and, to a lesser extent, the
MS_SLAVE option that are behind how those things are propagated.
MS_SHARED will propagate certain things from a child mount to the mount
point and to other children, IIRC, while MS_SLAVE propagates in one
direction and MS_PRIVATE restricts it. I think the troublemaker is
MS_SHARED and that's what caused all the "pivot_root" calls to
face-plant when systemd started mounting everything with MS_SHARED in the
host system. I was using bind mounts to avoid some of these problems
but then they changed systemd and its default mount options and broke a
number of things I had running.
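
For anyone who wants to test the propagation theory, here is a minimal
sketch, not a verified recipe. It assumes root, a util-linux recent enough
to ship unshare and a findmnt with the PROPAGATION column, and two existing
directories. The function name try_private_bind is made up for this
example. It marks the whole tree MS_PRIVATE inside a throwaway mount
namespace, then does the bind mount and the read-only remount there, so
nothing should leak back to the host.

```shell
#!/bin/sh
# Hypothetical helper (not part of LXC): try a read-only bind remount
# inside a private mount namespace and print the resulting mount flags.
try_private_bind() {
    src=$1
    dst=$2
    if [ "$(id -u)" -ne 0 ]; then
        echo "try_private_bind: must run as root" >&2
        return 1
    fi
    # unshare --mount gives the child shell its own mount namespace.
    unshare --mount sh -c '
        set -e
        # Stop mount events propagating to or from the host namespace.
        mount --make-rprivate /
        mount --bind "$1" "$2"
        # The "bind" keyword limits the ro flag to this mount point.
        mount -o remount,bind,ro "$2"
        findmnt -o TARGET,OPTIONS,PROPAGATION "$2"
    ' sh "$src" "$dst"
}
```

Called as `try_private_bind /foo /bar`, it prints what findmnt reports for
/bar inside the namespace. Comparing that against a run without the
make-rprivate line would show how much of the behavior comes from
propagation on a given host.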
> In order to properly remount bind-mounts read-only in newer kernels,
> you have to do the following:
>
>     mount -o remount,bind,ro /bar
Check your mount point options and read the man page for mount, in
particular the section on shared subtree operations. Some of the distros
have been changing the defaults. I don't believe it's a kernel default
issue but I could be wrong.
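
One way to check what a host is actually doing is to look at the optional
fields in /proc/self/mountinfo, where "shared:N" marks a shared mount and
"master:N" marks a slave. The following is a rough sketch under that
assumption. The function names propagation_of and list_propagation are
invented for this example and are not part of any tool.

```shell
#!/bin/sh
# Classify one /proc/self/mountinfo line as shared, slave, shared,slave,
# unbindable or private, based on its optional fields.
propagation_of() {
    set -f              # no globbing while the line is word-split
    set -- $1
    set +f
    shift 6             # skip ID, parent ID, dev, root, mount point, options
    prop=private        # no optional field at all means a private mount
    while [ $# -gt 0 ] && [ "$1" != "-" ]; do
        case $1 in
            shared:*) prop=shared ;;
            master:*)
                if [ "$prop" = shared ]; then
                    prop=shared,slave
                else
                    prop=slave
                fi
                ;;
            unbindable) prop=unbindable ;;
        esac
        shift
    done
    printf '%s\n' "$prop"
}

# Print "mount point<TAB>propagation" for every mount in the given file
# (defaults to the current process's mount table).
list_propagation() {
    file=${1:-/proc/self/mountinfo}
    while IFS= read -r line; do
        set -f
        set -- $line
        set +f
        printf '%s\t%s\n' "$5" "$(propagation_of "$line")"
    done < "$file"
}
```

Running `list_propagation` on the host and again inside a container shows
which mounts are shared. Where findmnt is new enough,
`findmnt -o TARGET,PROPAGATION` should report the same information with
less effort.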
> This will also work in older kernels (I could only test 2.6.32, not
> earlier), so in that sense it's portable.
>
> BUT: the typical bind-mount trick one could use to keep the container
> from remounting / ro at shutdown will apparently, as far as I can
> tell, not work anymore in 3.8, possibly earlier, since typical
> shutdown will do the equivalent of remount,ro and not add the bind
> option there.
> So unfortunately, I think we'll have to stick with pinning... :(
Actually, there, I think I agree with you, unfortunately. I think we're
stuck with it due to ill behavior in some distros and their defaults, in
particular with regard to systemd-based distros. We need to do things
in a way that does not break on a distro running the host and in a way
that doesn't allow an arbitrary distro running in a container to
propagate random acts of terrorism to the host or other containers. But
that's probably a good paradigm for us, anyways.
> -- Christian
Regards,
Mike
--
Michael H. Warfield (AI4NB) | (770) 985-6132 | mhw at WittsEnd.com
/\/\|=mhw=|\/\/ | (678) 463-0932 | http://www.wittsend.com/mhw/
NIC whois: MHW9 | An optimist believes we live in the best of all
PGP Key: 0x674627FF | possible worlds. A pessimist is sure of it!