[lxc-devel] a container can remount ro the host's mount point

Mon Mar 15 17:52:53 UTC 2010

On 15/03/2010 18:28, Michael H. Warfield wrote:
> On Mon, 2010-03-15 at 15:39 +0100, lxc at zitta.fr wrote: 
>   
>> On 15/03/2010 15:05, Michael H. Warfield wrote:
>>     
>>> On Sun, 2010-03-14 at 08:33 +0100, lxc at zitta.fr wrote:
>>>   
>>>       
>>>> Hi,
>>>>
>>>> When I create a full OS container (for example a Debian one), I have to
>>>> remove the init script that remounts / read-only on halt
>>>> (for example, umountfs on lenny).
>>>>
>>>> If I don't do this, the container, when it stops, remounts read-only
>>>> the host mount point that holds its rootfs.
>>>>
>>>> Why is a container able to do this?
>>>> If you store multiple containers on the same mount point, it could be
>>>> very problematic.
>>>>     
>>>>         
>>> Ah HA!  So THAT'S the root cause of THAT problem.  Several of us have
>>> noticed that effect.  Yeah, major PITA.  Also explains just why I no
>>> longer see it.  Because of a practice I started using in setting up my
>>> containers...
>>>
>>> As it so happens, because all of my containers are OpenVZ compatibility
>>> containers, I use a bind mount in the fstab for the root fs.  OpenVZ has
>>> this concept of a "private" and a "rootfs" to aid in setting disk quotas
>>> in the container and I'm hoping to also eventually use that with union
>>> mounts / unionfs to do a linux-vservers style unify.  But...  That also
>>> prevents this problem because the container's rootfs is NOT a real fs in
>>> the host, it's the bind mount and that insulates the host's fs and mount
>>> points from any actions in the container.
>>>
>>> Example from one of my containers is like this:
>>>
>>> Config:
>>>
>>> == 
>>> lxc.rootfs = /srv/lxc/rootfs
>>> lxc.mount = /srv/lxc/config/1004.fstab
>>> == 
>>>
>>> fstab:
>>>
>>> == 
>>> /srv/lxc/private/1004 /srv/lxc/rootfs    none bind 0 0
>>>
>>> /export               /srv/lxc/rootfs/export        none bind 0 0
>>> /home/shared          /srv/lxc/rootfs/srv/shared    none bind 0 0
>>> == 
>>>
>>> Would be really NICE if that bind could be something like a fuse with
>>> unionfs or, eventually, a union mount once those are mature and stable
>>> in the kernel, but we're not there yet.
>>>
>>> Now, you won't actually see anything in /srv/lxc/rootfs because it's
>>> private to the container and it's just a dummy mount point that can be
>>> used by all of your containers.  The only thing that varies between my
>>> containers then is the location of the fstab (and the network stuff,
>>> obviously).  The container can screw up its mounts all it wants; they're
>>> ALL isolated and private to the container, including the rootfs.
>>>
>>>   
>>>       
>>>> Regards,
>>>>     
>>>>         
>>>   
>>>       
>>>> Guillaume ZITTA
>>>>     
>>>>         
>>> Regards,
>>> Mike
>>>   
>>>       
>> Thanks.
>> I noticed that practice was used by lxc-create in version 0.6.3.
>>     
> No, not exactly, and it wasn't being done by lxc-create.  lxc-create was
> merely creating the directory, it was not doing the bind mount and could
> not do the bind mount.  The actual mount was being done by lxc-start at
> run time when starting that container.  The code in lxc-create was
> removed because the behavior of lxc-start was changed to no longer
> require that directory.
>
>   
>> With lxc-0.6.3, lxc-create is a binary and it does this for you, along
>> with other things in /var/lib/lxc.
>> With lxc-0.6.5, lxc-create is a shell script and it does fewer things
>> than the binary one.
>>     
> Close but not quite.
>
>   
>> Is this an intentional regression?
>>     
> It was a change (and Daniel may chime in here and correct me at any
> moment) coupled with the introduction of using pivot root to actually
> improve the isolation of the containers from the host and prevent the
> containers from breaking out of their chrooted jails.  That was a
> security fix.  He did drop that additional bind mount at that time and
> yes that did provide the additional functional isolation in this one
> peculiar way that protected the host from random acts of terrorism by
> the container on its rootfs.  An unanticipated side effect.
>
>   
>> If not, I volunteer to update the lxc-create script to provide the same
>> kind of functionality as the C version.
>>     
> No.  Do not do that.  It did not work the way you're thinking it did and
> that will not work.  It would create a situation where you would have to
> rerun lxc-create after reboot or restarting because you will have lost
> the bind mounts.  This never was done in lxc-create, only the creation
> of the directory.  The mounting is done in lxc-start and must be done in
> lxc-start.  Don't do this.  Personally, I like the method of adding the
> bind mount explicitly to the fstab and plan to continue that way.  Maybe
> we should merely make that be a "best practice".  That also gives us the
> flexibility later down the road in adding disk quotas or different types
> of file systems, other than a vanilla bind mount.  But all that's up to
> Daniel.  I'm sure he's now fully aware of this unintended consequence of
> that change in lxc-start and he'll have to decide on the path moving
> forward.  ITMT...  The bind mount is a successful and safe workaround
> for the problem.  Don't go back to the old way of doing things here.
>
> Regards,
>   
> Mike

Ok, I'll modify my container creation script to populate the fstab with
a bind mount.
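
A minimal sketch of what that step could look like, assuming the creation
script is written in Python. The function name bind_fstab and its parameters
are hypothetical; the paths and the "none bind 0 0" fields are taken from
Mike's example above. It only builds the fstab text; writing it to the file
named by lxc.mount is left to the caller.

```python
from pathlib import PurePosixPath


def bind_fstab(private_dir, rootfs, extra_binds=()):
    """Return fstab text that bind-mounts private_dir onto rootfs.

    extra_binds is an iterable of (host_path, path_inside_container) pairs.
    All names here are illustrative, not part of lxc itself.
    """
    root = PurePosixPath(rootfs)
    # First entry: the container's private tree bound onto the shared rootfs.
    entries = [(str(private_dir), str(root))]
    for host_path, inside in extra_binds:
        # Extra targets live under the rootfs mount point, as in the example.
        entries.append((str(host_path), str(root / str(inside).lstrip("/"))))
    # Same fields as the quoted fstab: source target none bind 0 0
    return "".join(f"{src} {dst} none bind 0 0\n" for src, dst in entries)


if __name__ == "__main__":
    print(bind_fstab(
        "/srv/lxc/private/1004",
        "/srv/lxc/rootfs",
        [("/export", "/export"), ("/home/shared", "/srv/shared")],
    ), end="")
```

Since the mount itself is done by lxc-start at run time, the script only has
to generate this file once per container and does not need to be rerun after
a reboot.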

Thanks for the explanation.

Regards,
Guillaume