[lxc-devel] a container can remount ro the host's mount point

Thu Apr 1 04:42:26 UTC 2010

Daniel,

I'm going to top post here because I've just discovered that we've got a
bigger problem here, related to this whole mess.  A much bigger problem
having to do with bind mounts in general.

This is the generalized case here, which results from the observation
that, if a container remounts its root directory ro, then the mount
point for the container in the host is set to ro as well.

In fact, this is true of any additional bind mounts in containers!

Say I have (and I do have) a couple of partitions which are shared
between certain containers, say for common data (somewhat risky, but I
eventually want to / hope to make them ro anyways).  I was investigating
the whole read-only bind mount morass when I encountered this...

So in the host, I have a partition, say /export, and I bind mount that
into the containers as /export in their space.  Maybe I would like to
eventually have this as ro in some of them, maybe not.  IAC, if I do a
remount in any of the containers, the changes are propagated outside of
the container to the host and to all the other containers!  So if I do a
"mount -o remount,ro /export" in container A, the host and all the other
containers now have /export as ro as well.  There's all kinds of concern
there, beyond merely the potential for mayhem by some practical joker in
one container.  What if I had some of these mounted ro (with the
appropriate patch that was mentioned in another thread; I know you can't
do it yet in the released code)?  Can one container accidentally remount
the other containers rw?  Yuck!  What's worse...  If I set that mount ro
in the host, I damn well don't want the container to be able to remount
it rw merely by doing a remount (that may be another can of worms).
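
For anyone wanting to poke at this without involving LXC at all, here is a
minimal sketch of the scenario above using two plain bind mounts of one
filesystem.  Everything in it is an assumption for illustration: the
repro_remount name, the /tmp/bindtest scratch path, the tmpfs standing in
for /export, and the guess that two bind mounts on the host behave like the
two container views.  It only prints the commands unless RUN is set to the
empty string, since the real thing needs root.

```shell
#!/bin/sh
# Hypothetical repro sketch; names and paths are illustrative, not from LXC.
# Prints the commands by default.  Set RUN to the empty string and run as
# root to actually execute them.
RUN=${RUN-echo}

repro_remount() {
    base=${1:-/tmp/bindtest}
    $RUN mkdir -p "$base/src" "$base/a" "$base/b"
    $RUN mount -t tmpfs none "$base/src"      # stand-in for /export
    $RUN mount --bind "$base/src" "$base/a"   # "container A" view
    $RUN mount --bind "$base/src" "$base/b"   # "container B" view
    $RUN mount -o remount,ro "$base/a"        # remount issued through A only
    $RUN touch "$base/b/probe"                # does B still accept writes?
}

# Usage: repro_remount /tmp/bindtest
```

If the final touch fails with a read-only file system error, the remount
issued through one bind mount has reached the other one too.  Unmount b, a
and src (in that order) when finished.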

Just some thoughts, but this seems to be a mess and may even require
some kernel work with those bind mounts to fix.  This was tested on a
2.6.32 kernel.
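
One way to see where the ro flag actually landed is to compare the
per-mount options against the superblock options for the same mount point.
The sketch below assumes the kernel exposes /proc/self/mountinfo (my
understanding is that 2.6.26 and later do) with the documented layout:
mount point in field 5, per-mount options in field 6, and the superblock
options as the third field after the lone "-" separator.  The
show_mount_flags helper is a hypothetical name, not an existing tool.

```shell
#!/bin/sh
# show_mount_flags MOUNTPOINT
# Hypothetical helper: reads mountinfo-format lines on stdin and prints the
# per-mount options and the superblock options for each matching entry.
show_mount_flags() {
    awk -v mp="$1" '
        $5 == mp {
            # Optional fields (field 7 onward) end at a lone "-" separator.
            for (i = 7; i <= NF; i++) {
                if ($i == "-") break
            }
            # After the separator: fs type, mount source, superblock options.
            printf "%s per-mount=%s superblock=%s\n", $5, $6, $(i + 3)
        }'
}

# Usage: show_mount_flags /export < /proc/self/mountinfo
```

Run in the host and in each container after the remount.  If the
superblock options read ro while the per-mount options on the other bind
mounts still read rw, that would suggest the remount was applied to the
shared filesystem rather than to the one bind mount.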

Regards,
Mike

On Mon, 2010-03-15 at 22:19 +0100, Daniel Lezcano wrote: 
> Michael H. Warfield wrote:
> > On Mon, 2010-03-15 at 15:39 +0100, lxc at zitta.fr wrote: 
> >   
> >> On 15/03/2010 15:05, Michael H. Warfield wrote:
> >>     
> >>> On Sun, 2010-03-14 at 08:33 +0100, lxc at zitta.fr wrote:
> >>>   
> >>>       
> >>>> Hi,
> >>>>
> >>>> When I create a full OS container (for example a Debian one), I have to
> >>>> remove the init script that remounts / read-only on halt
> >>>> (for example, umountfs on lenny).
> >>>>
> >>>> If I don't do this, the container, when it stops, remounts read-only
> >>>> the mount point where the rootfs lives.
> >>>>
> >>>> Why is a container able to do this?
> >>>> If you store multiple containers on the same mount point, it could be
> >>>> very problematic.
> >>>>     
> >>>>         
> >>> Ah HA!  So THAT'S the root cause of THAT problem.  Several of us have
> >>> noticed that effect.  Yeah, major PITA.  Also explains just why I no
> >>> longer see it.  Because of a practice I started using in setting up my
> >>> containers...
> >>>
> >>> As it so happens, because all of my containers are OpenVZ compatibility
> >>> containers, I use a bind mount in the fstab for the root fs.  OpenVZ has
> >>> this concept of a "private" and a "rootfs" to aid in setting disk quotas
> >>> in the container and I'm hoping to also eventually use that with union
> >>> mounts / unionfs to do a linux-vservers style unify.  But...  That also
> >>> prevents this problem because the container's rootfs is NOT a real fs in
> >>> the host, it's the bind mount and that insulates the host's fs and mount
> >>> points from any actions in the container.
> >>>
> >>> Example from one of my containers is like this:
> >>>
> >>> Config:
> >>>
> >>> == 
> >>> lxc.rootfs = /srv/lxc/rootfs
> >>> lxc.mount = /srv/lxc/config/1004.fstab
> >>> ==
> >>>
> >>> fstab:
> >>>
> >>> == 
> >>> /srv/lxc/private/1004 /srv/lxc/rootfs    none bind 0 0
> >>>
> >>> /export               /srv/lxc/rootfs/export        none bind 0 0
> >>> /home/shared          /srv/lxc/rootfs/srv/shared    none bind 0 0
> >>> == 
> >>>
> >>> Would be really NICE if that bind could be something like a fuse with
> >>> unionfs or, eventually, a union mount once those are mature and stable
> >>> in the kernel, but we're not there yet.
> >>>
> >>> Now, you won't actually see anything in /srv/lxc/rootfs because it's
> >>> private to the container and it's just a dummy mount point that can be
> >>> used by all of your containers.  The only thing that varies between my
> >>> containers then is the location of the fstab (and the network stuff,
> >>> obviously).  The container can screw up its mounts all it wants; they're
> >>> ALL isolated and private to the container, including the rootfs.
> >>>
> >>>> Regards,
> >>>> Guillaume ZITTA
> >>>
> >>> Regards,
> >>> Mike
> >>>   
> >>>       
> >> Thanks.
> >> I noticed that practice was used by lxc-create in version 0.6.3
> >>     
> >
> > No, not exactly, and it wasn't being done by lxc-create.  lxc-create was
> > merely creating the directory, it was not doing the bind mount and could
> > not do the bind mount.  The actual mount was being done by lxc-start at
> > run time when starting that container.  The code in lxc-create was
> > removed because the behavior of lxc-start was changed to no longer
> > require that directory.
> >
> >   
> >> With lxc-0.6.3, lxc-create is a binary, and it does this for you along
> >> with other things in /var/lib/lxc.
> >> With lxc-0.6.5, lxc-create is a shell script, and it does fewer things
> >> than the binary one.
> >>     
> >
> > Close but not quite.
> >
> >   
> >> Is this an intentional regression?
> >>     
> >
> > It was a change (and Daniel may chime in here and correct me at any
> > moment) coupled with the introduction of using pivot root to actually
> > improve the isolation of the containers from the host and prevent the
> > containers from breaking out of their chrooted jails.  That was a
> > security fix.  He did drop that additional bind mount at that time and
> > yes that did provide the additional functional isolation in this one
> > peculiar way that protected the host from random acts of terrorism by
> > the container on its rootfs. 
> 
> >  An unanticipated side effect.
> >   
> Right, but lxc-start creates a temporary directory in /tmp called 
> lxc-XXXXXXX
> Then the rootfs is bind mounted to /tmp/lxc-XXXXXX and so pivot_root is 
> done in this directory.
> 
> How does that differ from what the 0.6.3 version did with a prior bind
> mount in /var/lib/lxc/<name>/rootfs?
> 
> 
> >> If not, I offer to update the lxc-create script to provide the same
> >> kind of functionality as the C version.
> >>     
> >
> > No.  Do not do that.  It did not work the way you're thinking it did and
> > that will not work.  It would create a situation where you would have to
> > rerun lxc-create after reboot or restarting because you will have lost
> > the bind mounts.  This never was done in lxc-create, only the creation
> > of the directory.  The mounting is done in lxc-start and must be done in
> > lxc-start.  Don't do this.  Personally, I like the method of adding the
> > bind mount explicitly to the fstab and plan to continue that way.  Maybe
> > we should merely make that be a "best practice".  That also gives us the
> > flexibility later down the road in adding disk quotas or different types
> > of file systems, other than a vanilla bind mount.  But all that's up to
> > Daniel.  
> I think most of the code is already there but not enabled. It should 
> allow specifying a disk image or a device as the rootfs. As these 
> can be mounted on an empty rootfs directory, I thought it was not a big 
> priority and disabled the code in order to do some cleanup around it. If 
> you are interested in this feature I can enable the code again.
> 
> > I'm sure he's now fully aware of this unintended consequence of
> > that change in lxc-start and he'll have to decide on the path moving
> > forward.  ITMT...  The bind mount is a successful and safe workaround
> > for the problem.  Don't go back to the old way of doing things here.
> >   
> Thanks for the analysis. I don't have a simple fix coming to mind 
> right now; I will try to solve this problem from a larger perspective.
> 
> 
> 

-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!