[lxc-devel] [RFC] rootfs pinning

Tue Sep 24 14:45:16 UTC 2013

On Tue, Sep 24, 2013 at 09:41:04AM -0500, Serge Hallyn wrote:
> Quoting Rob Landley (rob at landley.net):
> > On 09/23/2013 11:19:17 AM, Serge Hallyn wrote:
> > >Quoting Rob Landley (rob at landley.net):
> > >> On 09/12/2013 01:27:07 PM, Christian Seiler wrote:
> > >> > Hi there,
> > >> >
> > >> > just a quick question: currently, rootfs is pinned with a .hold
> > >> > file in the parent directory (which btw. does not help against
> > >> > file systems that are already mounted on the host but directly in
> > >> > the rootfs directory). The problem with the .hold file is that it
> > >> > doesn't make the directory necessarily pretty; I tend to mount all
> > >> > rootfs to /srv/lxc/$container (config remaining in /var/lib/lxc),
> > >> > and then when doing a ls /srv/lxc, I see tons of .hold files. (I'm
> > >> > not even sure that they are removed after container termination -
> > >> > but even if they are, the default state of a typical system tends
> > >> > to be that at least some containers are running...)
> > >> >
> > >> > Couldn't we just open $rootfs/lxc.hold for writing, keep the fd (as
> > >> > current pinfd) and then unlink (!) the file directly? According to
> > >> > POSIX semantics, the file is then still open and the pinning should
> > >> > work (now also for the above case), but there are no files lying
> > >> > around anymore. (Note: I didn't test that, it could well be that
> > >> > that doesn't work.)
> > >> >
> > >> > Thoughts?
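
For reference, the open-then-unlink idea quoted above could look roughly
like the sketch below. It is written in Python for brevity rather than in
lxc's C, and pin_rootfs() and its default file name are made up for
illustration; this is not existing lxc code. It only shows that the fd
stays open and writable after the name is gone. Whether that is enough
to block a remount,ro is exactly the part that was not tested.

```python
import os


def pin_rootfs(rootfs, name="lxc.hold"):
    """Create a hold file inside rootfs, unlink it, and return the open fd.

    Hypothetical helper: the name and signature are illustrative only.
    """
    path = os.path.join(rootfs, name)
    # O_EXCL: never open (and then unlink) a file somebody else created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        # The directory entry disappears, the open file description stays.
        os.unlink(path)
    except OSError:
        os.close(fd)
        raise
    return fd
```

The caller would keep the returned fd for the lifetime of the container
and os.close() it on shutdown.
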
> > >>
> > >> Why doesn't keeping a file open to the directory itself work? (I'm
> > >> assuming it doesn't, I'm wondering why.)
> > >
> > >Tried it under tmpfs, and open("/mnt", O_RDWR) with tmpfs mounted
> > >at /mnt does not work, gives EISDIR.  O_RDONLY does work, but that
> > >doesn't prevent mount -o remount,ro.
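
The open() behaviour described above is not specific to tmpfs and can be
reproduced on any directory. A small Python sketch (try_open_dir() is a
made-up helper name) that reports what a given set of open flags does:

```python
import errno
import os


def try_open_dir(path, flags):
    """Return "ok" if os.open(path, flags) succeeds, else the errno name."""
    try:
        fd = os.open(path, flags)
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    os.close(fd)
    return "ok"
```

On Linux, try_open_dir("/mnt", os.O_RDWR) reports EISDIR, while
os.O_RDONLY reports ok, matching the result quoted above.
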
> > 
> > The filesystem hitting an error (including one from the block
> > device) can make most filesystems remount themselves read only,
> > forcibly even with active writers. The permissions to do so from
> > userspace should be roughly analogous to calling shutdown or "kill
> > -1"? (I'm wondering what lxc's interest is in preventing the
> > container-local root from doing something container-local
> > dangerous?)
> 
> Some people have a block device mounted at /var/lib/lxc, and
> keep all their containers and rootfs' there.
> 
> If they start a single container and shut it down, most distros
> during shutdown will mount -o remount,ro /, which will end up
> remounting /var/lib/lxc ro.  Now other containers can't start up.
> So it's not actually container-local dangerous.
> 
> Now, it's possible that we should just make sure that any
> directory-backed (or btrfs-backed) containers always bind-mount
> $rootfs onto itself.  That might work and might be a cleaner
> solution.
> 
> -serge

Yep, we discussed this at Plumbers and I think it's really the way to
go: remove all of that fs pinning code and just bind-mount the rootfs
onto itself in the container's mountns before starting it.
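
A rough sketch of such a self bind-mount, in Python via ctypes rather
than lxc's C, and not actual lxc code: bind_mount_self() is a made-up
name, the flag values are the Linux ones from <sys/mount.h>, and adding
MS_REC (to carry over anything mounted below the rootfs) is an
assumption of this sketch. It also assumes the caller is already in the
container's mount namespace and has CAP_SYS_ADMIN there.

```python
import ctypes
import os

MS_BIND = 4096    # Linux value from <sys/mount.h>
MS_REC = 16384    # recursive: include submounts (assumption of this sketch)

# Load the C library already linked into the interpreter and describe mount(2).
_libc = ctypes.CDLL(None, use_errno=True)
_libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                        ctypes.c_ulong, ctypes.c_void_p]
_libc.mount.restype = ctypes.c_int


def bind_mount_self(rootfs):
    """Bind-mount rootfs onto itself; raise OSError if mount(2) fails."""
    path = os.fsencode(rootfs)
    # Source and target are the same path; fstype and data are unused for MS_BIND.
    if _libc.mount(path, path, None, MS_BIND | MS_REC, None) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), rootfs)
```

Without the needed privileges, or with a path that does not exist, the
call fails and the function raises OSError instead of mounting anything.
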

That way, if the container decides to remount / ro at any point, it'll
succeed and give the user a read-only /, but without affecting the
outside world.

-- 
Stéphane Graber
Ubuntu developer
http://www.ubuntu.com