[lxc-devel] lxc-start leaves temporary pivot dir behind

Tue May 11 16:22:13 UTC 2010

Ferenc Wagner wrote:
> Daniel Lezcano <daniel.lezcano at free.fr> writes:
>
>   
>> Ferenc Wagner wrote:
>>
>>     
>>> Daniel Lezcano <daniel.lezcano at free.fr> writes:
>>>       
>>>       
>>>> We can't simply remove it because of the pivot_root which returns
>>>> EBUSY.  I suppose it's coming from: "new_root and put_old must not
>>>> be on the same file system as the current root."
>>>>         
>>> Hmm, this could indeed be a problem if lxc.rootfs is on the current root
>>> file system.  I didn't consider pivoting to the same FS, but looks like
>>> this is the very reason for the current complexity in the architecture.
>>>
>>> Btw. is this really a safe thing to do, to pivot into a subdirectory of
>>> a file system?  Is there really no way out of that?
>>>       
>> It seems pivot_root on the same fs works if an intermediate mount
>> point is inserted between old_root and new_root but at the cost of
>> having a lazy unmount when we unmount the old rootfs filesystems.
>>     
>
> After pivoting?  Could you please illustrate this?
>   

After the pivot_root syscall, we have oldroot and newroot.
oldroot is underneath newroot, so after pivot_root, we can still access 
/oldroot.
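
A minimal sketch of that sequence in C, for illustration only. The helper names pivot_root_sys and pivot_into, and the bind mount of rootfs onto itself as the way to get a mount point, are assumptions made for this example and not necessarily what lxc does. pivot_root(2) requires new_root to be a mount point and put_old to be at or underneath new_root.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no wrapper for pivot_root(2), so go through syscall(2). */
static int pivot_root_sys(const char *new_root, const char *put_old)
{
	return (int)syscall(SYS_pivot_root, new_root, put_old);
}

/*
 * Hypothetical helper: make rootfs the new "/" and park the old root
 * under <rootfs>/<pivot_rel>. Returns 0 on success, -1 with errno set.
 */
static int pivot_into(const char *rootfs, const char *pivot_rel)
{
	char put_old[PATH_MAX];
	int n = snprintf(put_old, sizeof(put_old), "%s/%s", rootfs, pivot_rel);

	if (n < 0 || (size_t)n >= sizeof(put_old)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	/* Assumption: bind mount rootfs onto itself so it is a mount point. */
	if (mount(rootfs, rootfs, NULL, MS_BIND | MS_REC, NULL) < 0)
		return -1;

	/* Enter the new mount, then swap the roots. */
	if (chdir(rootfs) < 0)
		return -1;
	if (pivot_root_sys(".", put_old) < 0)
		return -1;

	/* The old root is now reachable as /<pivot_rel>. */
	return chdir("/");
}
```

After a successful pivot_into("/path/to/rootfs", "oldroot"), the previous root is visible as /oldroot inside the new root. The pivot directory must already exist below rootfs, and the call needs CAP_SYS_ADMIN.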

We want to umount the oldroot dir of course, but before umounting it, we 
have to umount everything that is mounted beneath it.
When everything is unmounted, we finish by umounting /oldroot itself. But 
in some circumstances, this umount fails with EBUSY, so we "detach" the 
mount with the MNT_DETACH option (a lazy unmount).

http://sourceforge.net/mailarchive/message.php?msg_name=4B5B6DA5.6050302%40free.fr

>> I am looking at making it possible to specify a rootfs which is a file
>> system image or a block device. I am not sure this should be done by
>> lxc but looking forward ...
>>     
>
> A device could be easily mounted by the user or by an lxc.mount.entry,
> so I don't think it needs special consideration.
>   

There is the automatic file system detection of the image if the image 
is specified in the mount entry.
I have already coded that, but we can postpone it for the moment and 
focus on the pivot_root.

>>>> But as we will pivot_root right after, we won't reuse the real
>>>> rootfs, so we can safely use the host /tmp.
>>>>         
>>> That will cause problems if rootfs is under /tmp, don't you think?
>>>       
>> Right :)
>>     
>
> Btw. my use case is exactly that: I mostly want to prune the namespace
> of the container, so I bind mount / to /tmp/.../jail and a couple of
> things (but not everything!) below that, and set rootfs=/tmp/.../jail.
>   

Ok, will fix that.

>>> Actually, I'm not sure you can fully solve this.  If rootfs is a
>>> separate file system, this is only much ado about nothing.  If rootfs
>>> isn't a separate filesystem, you can't automatically find a good
>>> place and also clean it up.
>>>       
>> Maybe a single /tmp/lxc directory may be used as the mount points are
>> private to the container. So it would be acceptable to have a single
>> directory for N containers, no ?
>>     
>
> Then why not /usr/lib/lxc/pivotdir or something like that?  Such a
> directory could belong to the lxc package and not clutter up /tmp.  As
> you pointed out, this directory would always be empty in the outer name
> space, so a single one would suffice.  Thus there would be no need
> cleaning it up, either.
>   

Agreed. Shall we consider $(prefix)/var/run/lxc?
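
For illustration, a single shared pivot directory could be created like this. This is only a sketch: ensure_pivot_dir is a hypothetical helper, and the mode 0755 is an assumption. An existing directory is accepted, so N containers can share it.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Create the pivot directory if it is missing. An already existing
 * directory is fine; anything else at that path is an error.
 * Returns 0 on success, -1 with errno set.
 */
static int ensure_pivot_dir(const char *path)
{
	struct stat st;

	if (mkdir(path, 0755) == 0)
		return 0;
	if (errno != EEXIST)
		return -1;

	/* Something is already there: make sure it is a directory. */
	if (stat(path, &st) < 0)
		return -1;
	if (!S_ISDIR(st.st_mode)) {
		errno = ENOTDIR;
		return -1;
	}
	return 0;
}
```

Because the directory is never removed, no cleanup is needed when a container stops.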

>>> So why not require that rootfs is a separate filesystem, and let the
>>> user deal with it by doing the necessary bind mount in the lxc
>>> config?
>>>       
>>   
>> Hmm, that will break the existing user configurations.
>>     
>
> Yes, sadly.
>
>   
>> We can add a WARNING if rootfs is not a separate file system and
>> still let the user do whatever he wants. IMO if it is well
>> documented it is not a problem.
>>     
>
> Sure.  It adds some complexity to the code, but lxc is there to help
> doing common tasks.  Now the question is: if rootfs is a separate file
> system (which includes bind mounts), is the superfluous rbind of the
> original root worth skipping, or should we just do it to avoid needing
> an extra code path?
>   
Good question. IMO, skipping the rbind is ok for this case, but it may be 
worthwhile from a coding point of view to have a single place 
identified for the rootfs (especially for mounting an image). I will 
cook up a patchset to fix the rootfs location and then we can look at 
removing the superfluous rbind.
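
The WARNING for a rootfs that is not a separate file system could be based on a check like the one below. It is a sketch under assumptions: is_mount_point and warn_if_not_separate are hypothetical names, the path must be canonical with no trailing slash, and /proc must be mounted. Looking the path up in the mount table also catches bind mounts, which comparing st_dev values would miss.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if path is listed as a mount point in /proc/self/mounts,
 * 0 if it is not, and -1 if the mount table cannot be read.
 */
static int is_mount_point(const char *path)
{
	FILE *f = setmntent("/proc/self/mounts", "r");
	struct mntent *ent;
	int found = 0;

	if (!f)
		return -1;

	while ((ent = getmntent(f)) != NULL) {
		if (strcmp(ent->mnt_dir, path) == 0) {
			found = 1;
			break;
		}
	}

	endmntent(f);
	return found;
}

/* Warn, but do not fail, so existing configurations keep working. */
static void warn_if_not_separate(const char *rootfs)
{
	if (is_mount_point(rootfs) == 0)
		fprintf(stderr,
			"WARNING: rootfs '%s' is not a separate file system\n",
			rootfs);
}
```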