[lxc-devel] a container can remount ro the host's mount point

Michael H. Warfield mhw at WittsEnd.com
Mon Mar 15 22:42:20 UTC 2010


On Mon, 2010-03-15 at 22:19 +0100, Daniel Lezcano wrote: 
> Michael H. Warfield wrote:
> > On Mon, 2010-03-15 at 15:39 +0100, lxc at zitta.fr wrote: 
> >   
> >> On 15/03/2010 15:05, Michael H. Warfield wrote:
> >>     
> >>> On Sun, 2010-03-14 at 08:33 +0100, lxc at zitta.fr wrote:
> >>>   
> >>>       
> >>>> Hi,
> >>>>
> >>>> When I create a full OS container (for example, Debian), I have to
> >>>> remove the init script that remounts / read-only on halt
> >>>> (for example, umountfs on lenny).
> >>>>
> >>>> If I don't do this, the container, when it stops, remounts read-only
> >>>> the host mount point that holds its rootfs.
> >>>>
> >>>> Why is a container able to do this?
> >>>> If you store multiple containers on the same mount point, it could be
> >>>> very problematic.
> >>>>     
> >>>>         
> >>> Ah HA!  So THAT'S the root cause of THAT problem.  Several of us have
> >>> noticed that effect.  Yeah, major PITA.  Also explains just why I no
> >>> longer see it.  Because of a practice I started using in setting up my
> >>> containers...
> >>>
> >>> As it so happens, because all of my containers are OpenVZ compatibility
> >>> containers, I use a bind mount in the fstab for the root fs.  OpenVZ has
> >>> this concept of a "private" and a "rootfs" to aid in setting disk quotas
> >>> in the container and I'm hoping to also eventually use that with union
> >>> mounts / unionfs to do a linux-vservers style unify.  But...  That also
> >>> prevents this problem because the container's rootfs is NOT a real fs in
> >>> the host; it's a bind mount, and that insulates the host's fs and mount
> >>> points from any actions in the container.
> >>>
> >>> Example from one of my containers is like this:
> >>>
> >>> Config:
> >>>
> >>> == 
> >>> lxc.rootfs = /srv/lxc/rootfs
> >>> lxc.mount = /srv/lxc/config/1004.fstab
> >>> ==
> >>>
> >>> fstab:
> >>>
> >>> == 
> >>> /srv/lxc/private/1004 /srv/lxc/rootfs    none bind 0 0
> >>>
> >>> /export               /srv/lxc/rootfs/export        none bind 0 0
> >>> /home/shared          /srv/lxc/rootfs/srv/shared    none bind 0 0
> >>> == 
> >>>
> >>> Would be really NICE if that bind could be something like a fuse with
> >>> unionfs or, eventually, a union mount once those are mature and stable
> >>> in the kernel, but we're not there yet.
> >>>
> >>> Now, you won't actually see anything in /srv/lxc/rootfs because it's
> >>> private to the container and it's just a dummy mount point that can be
> >>> used by all of your containers.  The only thing that varies between my
> >>> containers then is the location of the fstab (and the network stuff,
> >>> obviously).  The container can screw up its mounts all it wants; they're
> >>> ALL isolated and private to the container, including the rootfs.
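
As a minimal sketch of the layout described above, the per-container config and fstab can be generated from the container ID.  The function name `render_container_files` and its parameters are hypothetical, chosen only for illustration; the default paths are the ones from the example.  It only builds text and does not mount anything.

```python
def render_container_files(ctid, private_root="/srv/lxc/private",
                           rootfs="/srv/lxc/rootfs",
                           config_dir="/srv/lxc/config",
                           extra_binds=()):
    """Return (config_text, fstab_text) for one container.

    Hypothetical helper: the names and defaults are assumptions that
    mirror the example in this thread, not part of lxc itself.
    """
    fstab_path = f"{config_dir}/{ctid}.fstab"
    # Every container shares the same dummy rootfs mount point;
    # only the fstab location differs.
    config = f"lxc.rootfs = {rootfs}\nlxc.mount = {fstab_path}\n"

    # First entry: bind the container's private tree onto the shared rootfs.
    lines = [f"{private_root}/{ctid} {rootfs} none bind 0 0"]
    # Extra entries: host directories bound inside the container's rootfs.
    for src, dst in extra_binds:
        lines.append(f"{src} {rootfs}/{dst.lstrip('/')} none bind 0 0")
    return config, "\n".join(lines) + "\n"


if __name__ == "__main__":
    config, fstab = render_container_files(
        1004,
        extra_binds=[("/export", "export"), ("/home/shared", "srv/shared")],
    )
    print(config)
    print(fstab)
```

Since only the container ID changes between containers, the same call with a different ID should give the matching pair of files for another container.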
> >>>
> >>>> Regards,
> >>>> Guillaume ZITTA
> >>>
> >>> Regards,
> >>> Mike
> >> Thanks.
> >> I noticed that practice was used by lxc-create in version 0.6.3
> >>     
> >
> > No, not exactly, and it wasn't being done by lxc-create.  lxc-create was
> > merely creating the directory, it was not doing the bind mount and could
> > not do the bind mount.  The actual mount was being done by lxc-start at
> > run time when starting that container.  The code in lxc-create was
> > removed because the behavior of lxc-start was changed to no longer
> > require that directory.
> >
> >   
> >> with lxc-0.6.3, lxc-create is a binary and it does this for you and
> >> other things in /var/lib/lxc
> >> with lxc-0.6.5, lxc-create is a shell script and it does fewer things
> >> than the binary one
> >>     
> >
> > Close but not quite.
> >
> >   
> >> Is this an intentional regression?
> >>     
> >
> > It was a change (and Daniel may chime in here and correct me at any
> > moment) coupled with the introduction of using pivot root to actually
> > improve the isolation of the containers from the host and prevent the
> > containers from breaking out of their chrooted jails.  That was a
> > security fix.  He did drop that additional bind mount at that time and
> > yes that did provide the additional functional isolation in this one
> > peculiar way that protected the host from random acts of terrorism by
> > the container on its rootfs. 
> 
> >  An unanticipated side effect.
> >   
> Right, but lxc-start creates a temporary directory in /tmp called 
> lxc-XXXXXX.
> The rootfs is then bind mounted to /tmp/lxc-XXXXXX, and pivot_root is 
> done in this directory.
> 
> How does that differ from what the 0.6.3 version was doing with a 
> previous bind mount in /var/lib/lxc/<name>/rootfs?
> 
> 
> >> If not, I volunteer to update the lxc-create script to provide the same
> >> kind of functionality as the C version.
> >>     
> >
> > No.  Do not do that.  It did not work the way you're thinking it did and
> > that will not work.  It would create a situation where you would have to
> > rerun lxc-create after reboot or restarting because you will have lost
> > the bind mounts.  This never was done in lxc-create, only the creation
> > of the directory.  The mounting is done in lxc-start and must be done in
> > lxc-start.  Don't do this.  Personally, I like the method of adding the
> > bind mount explicitly to the fstab and plan to continue that way.  Maybe
> > we should merely make that be a "best practice".  That also gives us the
> > flexibility later down the road in adding disk quotas or different types
> > of file systems, other than a vanilla bind mount.  But all that's up to
> > Daniel.  
> I think most of the code is already there but not enabled. It should 
> allow specifying a disk image or a device as the rootfs. As these 
> can be mounted in an empty rootfs directory, I thought it was not a big 
> priority and disabled the code in order to do some cleanup around it. If 
> you are interested in this feature, I can enable the code again.

Very interested.  Please do.

I know you've poured a lot of things into the source repository since
0.6.5, and I've been holding off on some of my scripts until that makes
it into a release, since it also impacts things I'm doing, like using
inotifywait for halt and restart.  I want to see that come out so I don't
have to redo a bunch of my stuff.  I'm also really enthused (with some
trepidation) over the enter and exec stuff I'm seeing over in the
containers list.  I know a lot of people have that MUCH higher on their
priority list than some of my things like the dynamic devices and udev.

> > I'm sure he's now fully aware of this unintended consequence of
> > that change in lxc-start and he'll have to decide on the path moving
> > forward.  In the meantime, the bind mount is a successful and safe
> > workaround for the problem.  Don't go back to the old way of doing
> > things here.

> Thanks for the analysis. I don't have a simple fix in mind right now; 
> I will try to solve this problem from a larger perspective.

Well, now that we know what is causing it, we can deal with it and avoid
it with a decent workaround and warn others if it comes up.  You can
take your time and decide on how to fix it right.  The pivot root change
likewise had some nasty surprises in it when it was released, so it pays
to take some time on architectural decisions like that.
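
For anyone who wants to check whether a host has already been bitten, here is a rough sketch that lists mount points currently flagged read-only.  It assumes the /proc/self/mountinfo line format documented in proc(5); the function name `find_readonly_mounts` is made up for illustration and is not part of lxc.

```python
def find_readonly_mounts(mountinfo_text):
    """Return mount points whose per-mount or superblock options contain 'ro'.

    Assumes the /proc/self/mountinfo layout from proc(5):
      id parent major:minor root mount_point mount_opts [optional...] - fstype source super_opts
    """
    readonly = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        try:
            # The "-" separator follows the variable-length optional fields,
            # so it can be at index 6 at the earliest.
            sep = fields.index("-", 6)
        except ValueError:
            continue  # not a well-formed mountinfo line
        if len(fields) < sep + 4:
            continue
        mount_point = fields[4]
        mount_opts = fields[5].split(",")
        super_opts = fields[sep + 3].split(",")
        if "ro" in mount_opts or "ro" in super_opts:
            readonly.append(mount_point)
    return readonly


if __name__ == "__main__":
    with open("/proc/self/mountinfo") as handle:
        for mount_point in find_readonly_mounts(handle.read()):
            print(mount_point)
```

Run on the host after stopping a container, this should show whether the file system holding the container's rootfs has flipped to read-only.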

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!