[lxc-devel] a container can remount ro the host's mount point

Michael H. Warfield mhw at WittsEnd.com
Mon Mar 15 17:28:10 UTC 2010


On Mon, 2010-03-15 at 15:39 +0100, lxc at zitta.fr wrote: 
> On 15/03/2010 15:05, Michael H. Warfield wrote:
> > On Sun, 2010-03-14 at 08:33 +0100, lxc at zitta.fr wrote:
> >   
> >> Hi,
> >>
> >> When I create a full OS container (for example, Debian), I have to
> >> remove the init script that remounts / read-only on halt
> >> (for example, umountfs on lenny).
> >>
> >> If I don't do this, the container remounts read-only the mount point
> >> where the rootfs lives when it stops.
> >>
> >> Why is a container able to do this?
> >> If you store multiple containers on the same mount point, it could be
> >> very problematic.
> >>     
> > Ah HA!  So THAT'S the root cause of THAT problem.  Several of us have
> > noticed that effect.  Yeah, major PITA.  Also explains just why I no
> > longer see it.  Because of a practice I started using in setting up my
> > containers...
> >
> > As it so happens, because all of my containers are OpenVZ compatibility
> > containers, I use a bind mount in the fstab for the root fs.  OpenVZ has
> > this concept of a "private" and a "rootfs" to aid in setting disk quotas
> > in the container and I'm hoping to also eventually use that with union
> > mounts / unionfs to do a linux-vservers style unify.  But...  That also
> > prevents this problem because the container's rootfs is NOT a real fs in
> > the host; it's the bind mount, and that insulates the host's fs and mount
> > points from any actions in the container.
> >
> > Example from one of my containers is like this:
> >
> > Config:
> >
> > == 
> > lxc.rootfs = /srv/lxc/rootfs
> > lxc.mount = /srv/lxc/config/1004.fstab
> >   =
> >
> > fstab:
> >
> > == 
> > /srv/lxc/private/1004 /srv/lxc/rootfs    none bind 0 0
> >
> > /export               /srv/lxc/rootfs/export        none bind 0 0
> > /home/shared          /srv/lxc/rootfs/srv/shared    none bind 0 0
> > == 
> >
> > Would be really NICE if that bind could be something like a fuse with
> > unionfs or, eventually, a union mount once those are mature and stable
> > in the kernel, but we're not there yet.
> >
> > Now, you won't actually see anything in /srv/lxc/rootfs because it's
> > private to the container and it's just a dummy mount point that can be
> > used by all of your containers.  The only thing that varies between my
> > containers then is the location of the fstab (and the network stuff,
> > obviously).  The container can screw up its mounts all it wants; they're
> > ALL isolated and private to the container, including the rootfs.
> >
> >   
> >> Regards,
> >>     
> >   
> >> Guillaume ZITTA
> >>     
> > Regards,
> > Mike
> >   
> Thanks.
> I noticed that practice was used by lxc-create in version 0.6.3

No, not exactly, and it wasn't being done by lxc-create.  lxc-create was
merely creating the directory, it was not doing the bind mount and could
not do the bind mount.  The actual mount was being done by lxc-start at
run time when starting that container.  The code in lxc-create was
removed because the behavior of lxc-start was changed to no longer
require that directory.

> With lxc-0.6.3, lxc-create is a binary and it does this for you, along
> with other things in /var/lib/lxc.
> With lxc-0.6.5, lxc-create is a shell script and it does fewer things
> than the binary one.

Close but not quite.

> Is this an intentional regression?

It was a change (and Daniel may chime in here and correct me at any
moment) coupled with the introduction of using pivot root to actually
improve the isolation of the containers from the host and prevent the
containers from breaking out of their chrooted jails.  That was a
security fix.  He did drop that additional bind mount at that time and
yes that did provide the additional functional isolation in this one
peculiar way that protected the host from random acts of terrorism by
the container on its rootfs.  An unanticipated side effect.
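
For anyone wanting to check whether a host has been bitten by this, the
mount table can be inspected for a read-only flag on the mount point that
holds the container trees.  A minimal sketch, assuming a Linux host with
/proc/mounts; mount_is_ro is a hypothetical helper name, not anything
shipped with lxc:

```shell
#!/bin/sh
# Sketch only: mount_is_ro is a hypothetical helper, not part of lxc.
# Usage: mount_is_ro <mountpoint> [mount-table]
# Exit status: 0 = mounted read-only, 1 = mounted but not read-only,
#              2 = mount point not found in the table.
mount_is_ro() {
    mp=$1
    table=${2:-/proc/mounts}
    awk -v mp="$mp" '
        # Field 2 is the mount point, field 4 the comma-separated options.
        # The last matching line wins, i.e. the most recent mount there.
        $2 == mp { opts = $4; found = 1 }
        END {
            if (!found) exit 2
            n = split(opts, o, ",")
            # Match "ro" exactly so options like errors=remount-ro do not count.
            for (i = 1; i <= n; i++) if (o[i] == "ro") exit 0
            exit 1
        }' "$table"
}
```

Something like "mount_is_ro /srv && echo read-only" on the host after
stopping a container would show whether it happened.  If it did,
"mount -o remount,rw /srv" on the host restores write access (with /srv
standing in for whatever mount point holds your containers).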

> If not, I offer to update the lxc-create script to provide the same
> kind of functionality as the C version.

No.  Do not do that.  It did not work the way you're thinking it did and
that will not work.  It would create a situation where you would have to
rerun lxc-create after reboot or restarting because you will have lost
the bind mounts.  This never was done in lxc-create, only the creation
of the directory.  The mounting is done in lxc-start and must be done in
lxc-start.  Don't do this.  Personally, I like the method of adding the
bind mount explicitly to the fstab and plan to continue that way.  Maybe
we should merely make that be a "best practice".  That also gives us the
flexibility later down the road in adding disk quotas or different types
of file systems, other than a vanilla bind mount.  But all that's up to
Daniel.  I'm sure he's now fully aware of this unintended consequence of
that change in lxc-start and he'll have to decide on the path moving
forward.  In the meantime, the bind mount is a successful and safe workaround
for the problem.  Don't go back to the old way of doing things here.
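
For illustration, the per-container pieces from the example quoted earlier
could be generated with a small shell helper.  This is only a sketch: the
function names gen_config and gen_fstab and the LXC_BASE variable are
hypothetical, and the paths simply mirror my own layout
(/srv/lxc/private/<id>, /srv/lxc/rootfs, /srv/lxc/config/<id>.fstab):

```shell
#!/bin/sh
# Sketch only: gen_config and gen_fstab are hypothetical helpers, not part
# of lxc.  Paths follow the layout from the example in this thread.
LXC_BASE=${LXC_BASE:-/srv/lxc}

# Print the two config lines that point a container at the shared dummy
# rootfs mount point and at its own fstab.
gen_config() {
    ctid=$1
    printf 'lxc.rootfs = %s/rootfs\n' "$LXC_BASE"
    printf 'lxc.mount = %s/config/%s.fstab\n' "$LXC_BASE" "$ctid"
}

# Print the fstab entry that bind mounts the container's private tree
# onto the shared rootfs mount point.
gen_fstab() {
    ctid=$1
    printf '%s/private/%s %s/rootfs none bind 0 0\n' \
        "$LXC_BASE" "$ctid" "$LXC_BASE"
}
```

So "gen_fstab 1004 > /srv/lxc/config/1004.fstab" would write the first
line of the fstab shown above; any extra bind mounts (/export and so on)
still get added by hand.  Nothing is mounted at this point, the fstab is
only consumed when the container is started.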

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!