[lxc-devel] [RFC] rootfs pinning

Michael H. Warfield mhw at WittsEnd.com
Tue Sep 24 19:10:22 UTC 2013


On Tue, 2013-09-24 at 10:45 -0400, Stéphane Graber wrote: 
> On Tue, Sep 24, 2013 at 09:41:04AM -0500, Serge Hallyn wrote:
> > Quoting Rob Landley (rob at landley.net):
> > > On 09/23/2013 11:19:17 AM, Serge Hallyn wrote:
> > > >Quoting Rob Landley (rob at landley.net):
> > > >> On 09/12/2013 01:27:07 PM, Christian Seiler wrote:
> > > >> > Hi there,
> > > >> >
> > > > >> > just a quick question: currently, rootfs is pinned with a .hold
> > > > >> > file in the parent directory (which btw. does not help against
> > > > >> > file systems that are already mounted on the host but directly in
> > > > >> > the rootfs directory). The problem with the .hold file is that it
> > > > >> > doesn't make the directory necessarily pretty; I tend to mount all
> > > > >> > rootfs to /srv/lxc/$container (config remaining in /var/lib/lxc),
> > > > >> > and then when doing a ls /srv/lxc, I see tons of .hold files. (I'm
> > > > >> > not even sure that they are removed after container termination -
> > > > >> > but even if they are, the default state of a typical system tends
> > > > >> > to be that at least some containers are running...)
> > > >> >
> > > > >> > Couldn't we just open $rootfs/lxc.hold for writing, keep the fd (as
> > > > >> > current pinfd) and then unlink (!) the file directly? According to
> > > > >> > POSIX semantics, the file is then still open and the pinning should
> > > > >> > work (now also for the above case), but there are no files lying
> > > > >> > around anymore. (Note: I didn't test that, it could well be that
> > > > >> > that doesn't work.)
> > > >> >
> > > >> > Thoughts?
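
A minimal sketch of the open-then-unlink idea proposed above, written in Python rather than lxc's C. The function name pin_with_unlinked_file and the default file name are illustrative assumptions, not lxc code. It only shows that the descriptor stays usable after the name is gone; whether such a descriptor actually blocks a read-only remount is the part the proposal itself leaves untested.

```python
import os
import tempfile


def pin_with_unlinked_file(rootfs, name="lxc.hold"):
    """Create rootfs/name, keep a writable fd, and unlink the name right away.

    Hypothetical helper for illustration; the caller owns the returned fd.
    """
    path = os.path.join(rootfs, name)
    # O_EXCL: refuse to reuse a file that somebody else already created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        # The directory entry disappears, the open file description does not.
        os.unlink(path)
    except OSError:
        os.close(fd)
        raise
    return fd


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as rootfs:
        fd = pin_with_unlinked_file(rootfs)
        print(os.listdir(rootfs))      # listing of the directory after unlink
        print(os.fstat(fd).st_nlink)   # link count of the still-open file
        os.close(fd)
```

Closing the descriptor (or the process exiting) releases the pin, so nothing needs cleaning up afterwards.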
> > > >>
> > > >> Why doesn't keeping a file open to the directory itself work? (I'm
> > > >> assuming it doesn't, I'm wondering why.)
> > > >
> > > >Tried it under tmpfs, and open("/mnt", O_RDWR) with tmpfs mounted
> > > >at /mnt does not work, gives EISDIR.  O_RDONLY does work, but that
> > > >doesn't prevent mount -o remount,ro.
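
The open() behaviour reported above can be reproduced without a tmpfs mount. This is a small sketch assuming Linux and using a temporary directory in place of /mnt; try_open_dir is a hypothetical helper name. It says nothing about remount behaviour, only about which open flags a directory accepts.

```python
import errno
import os
import tempfile


def try_open_dir(path, flags):
    """Return "ok" if the directory opens with these flags, else the errno name."""
    try:
        fd = os.open(path, flags)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return "ok"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print("O_RDWR:  ", try_open_dir(d, os.O_RDWR))
        print("O_RDONLY:", try_open_dir(d, os.O_RDONLY))
```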
> > > 
> > > The filesystem hitting an error (including one from the block
> > > device) can make most filesystems remount themselves read only,
> > > forcibly even with active writers. The permissions to do so from
> > > userspace should be roughly analogous to calling shutdown or "kill
> > > -1"? (I'm wondering what lxc's interest is in preventing the
> > > container-local root from doing something container-local
> > > dangerous?)
> > 
> > Some people have a block device mounted at /var/lib/lxc, and
> > keep all their containers and rootfs' there.
> > 
> > If they start a single container and shut it down, most distros
> > during shutdown will mount -o remount,ro /, which will end up
> > remounting /var/lib/lxc ro.  Now other containers can't start up.
> > So it's not actually container-local dangerous.
> > 
> > Now, it's possible that we should just make sure that any
> > directory-backed (or btrfs-backed) containers always bind-mount
> > $rootfs onto itself.  That might work and might be a cleaner
> > solution.
> > 
> > -serge

> Yep, we discussed this at Plumbers and I think it's really the way to
> go, basically remove all of that fs pinning code and just do a
> bind-mount of the rootfs on itself in the container's mountns before
> starting it.

> That way if the container decides to remount / ro at any point, it'll
> succeed and will give the user a read-only / but without affecting the
> outside world.
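
A rough sketch of the bind-mount step described above, assuming Linux and CAP_SYS_ADMIN. The function names are hypothetical and this is not how lxc implements it; the flag values are the usual Linux header constants. The sketch covers only the unshare and the self bind mount. It does not check the claim that a later remount inside the container leaves the host untouched, nor the interaction with pivot_root.

```python
import ctypes
import os

# Values from the Linux headers <sched.h> and <sys/mount.h>.
CLONE_NEWNS = 0x00020000
MS_BIND = 4096
MS_REC = 16384
MS_PRIVATE = 1 << 18


def plan_self_bind(rootfs):
    """Return the mount(2) calls, in order, as (source, target, flags)."""
    rootfs = os.path.abspath(rootfs)
    return [
        # Keep mount events from propagating back to the host namespace.
        (None, "/", MS_REC | MS_PRIVATE),
        # Bind rootfs onto itself so it becomes its own mount point.
        (rootfs, rootfs, MS_BIND),
    ]


def self_bind_in_new_mountns(rootfs):
    """Unshare the mount namespace of this process, then apply the plan."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_ulong, ctypes.c_void_p]
    if libc.unshare(CLONE_NEWNS) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    for source, target, flags in plan_self_bind(rootfs):
        src = os.fsencode(source) if source is not None else None
        if libc.mount(src, os.fsencode(target), None, flags, None) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
```

Splitting the plan from its execution keeps the privileged part small and lets the ordering be inspected without root.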

Ideally, I think that's the way to go, and I used to do that manually when
setting up my containers, but I thought there was some breakage between
that and the way we were working around the pivot_root problem introduced
by systemd (Fedora, SUSE, Arch, et al.).  If we can verify that it works
with all the init flavors without breaking anything, that could be part of
the general cleanup of the mount tables in the containers as well, maybe...

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!