[lxc-devel] lxc-start leaves temporary pivot dir behind

Mon May 10 18:26:44 UTC 2010

Daniel Lezcano <daniel.lezcano at free.fr> writes:

> Ferenc Wagner wrote:
>
>> Daniel Lezcano <daniel.lezcano at free.fr> writes:
>>
>>> We can't simply remove it because of the pivot_root which returns
>>> EBUSY.  I suppose it's coming from: "new_root and put_old must not
>>> be on the same file system as the current root."
>>
>> Hmm, this could indeed be a problem if lxc.rootfs is on the current root
>> file system.  I didn't consider pivoting to the same FS, but looks like
>> this is the very reason for the current complexity in the architecture.
>>
>> Btw. is this really a safe thing to do, to pivot into a subdirectory of
>> a file system?  Is there really no way out of that?
>
> It seems pivot_root on the same fs works if an intermediate mount
> point is inserted between old_root and new_root but at the cost of
> having a lazy unmount when we unmount the old rootfs filesystems.

After pivoting?  Could you please illustrate this?

> I am looking at making it possible to specify a rootfs which is a file
> system image or a block device. I am not sure this should be done by
> lxc but looking forward ...

A device could be easily mounted by the user or by an lxc.mount.entry,
so I don't think it needs special consideration.
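
For illustration, lxc.mount.entry takes one line in fstab format, so
declaring a device there could look roughly like what the sketch below
prints.  The helper name, the device /dev/sdb1, the target path and the
ext3 type are all assumptions made up for the example, not anything from
this thread.

```python
def mount_entry(source, target, fstype, options="defaults"):
    """Format one lxc.mount.entry line using the six fstab fields."""
    def esc(field):
        # fstab-style fields encode an embedded space as \040
        return field.replace(" ", "\\040")

    fields = [esc(source), esc(target), fstype, options, "0", "0"]
    return "lxc.mount.entry = " + " ".join(fields)


if __name__ == "__main__":
    # Hypothetical device and target, for illustration only.
    print(mount_entry("/dev/sdb1", "/srv/example/rootfs/data", "ext3"))
```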

>>> But as we will pivot_root right after, we won't reuse the real
>>> rootfs, so we can safely use the host /tmp.
>>
>> That will cause problems if rootfs is under /tmp, don't you think?
>
> Right :)

Btw. my use case is exactly that: I mostly want to prune the namespace
of the container, so I bind mount / to /tmp/.../jail and a couple of
things (but not everything!) below that, and set rootfs=/tmp/.../jail.
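
A rough sketch of that setup, assuming plain mount(8) is used: a
non-recursive bind of / leaves out whatever is mounted below it, and the
wanted subtrees are then bound back one by one.  The function name, the
jail path and the subtree list are placeholders; the function only
builds the command lines and executes nothing.

```python
import os


def jail_mount_plan(jail, subtrees):
    """Return the mount(8) argument lists for a pruned jail; nothing is run."""
    # A plain --bind (not --rbind) of / does not carry submounts along.
    plan = [["mount", "--bind", "/", jail]]
    for sub in subtrees:
        # Bind each selected subtree back at the same relative place.
        target = os.path.join(jail, sub.lstrip("/"))
        plan.append(["mount", "--bind", sub, target])
    return plan


if __name__ == "__main__":
    # Placeholder jail path and subtrees, for illustration only.
    for argv in jail_mount_plan("/tmp/example/jail", ["/proc", "/dev"]):
        print(" ".join(argv))
```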

>> Actually, I'm not sure you can fully solve this.  If rootfs is a
>> separate file system, this is only much ado about nothing.  If rootfs
>> isn't a separate filesystem, you can't automatically find a good
>> place and also clean it up.
>
> Maybe a single /tmp/lxc directory may be used as the mount points are
> private to the container. So it would be acceptable to have a single
> directory for N containers, no?

Then why not /usr/lib/lxc/pivotdir or something like that?  Such a
directory could belong to the lxc package and not clutter up /tmp.  As
you pointed out, this directory would always be empty in the outer name
space, so a single one would suffice.  Thus there would be no need to
clean it up, either.

>> So why not require that rootfs is a separate filesystem, and let the
>> user deal with it by doing the necessary bind mount in the lxc
>> config?
>
> Hmm, that will break the actual user configurations.

Yes, sadly.

> We can add a WARNING if rootfs is not a separate file system and
> provide the ability to let the user do whatever he wants.  IMO, if it
> is well documented, it is not a problem.

Sure.  It adds some complexity to the code, but lxc is there to help
with common tasks.  Now the question is: if rootfs is a separate file
system (which includes bind mounts), is the superfluous rbind of the
original root worth skipping, or should we just do it to avoid needing
an extra code path?
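
To make the "separate file system" test concrete, here is a minimal
sketch that assumes it is enough to look the path up in a mount table in
/proc/mounts format.  Comparing st_dev with the parent directory would
not do, since a bind mount from the same file system keeps the same
device number.  The function names and the sample table are made up for
the example; this is not how lxc decides it.

```python
import os
import re


def unescape(field):
    # /proc/mounts encodes special characters as 3-digit octal, e.g. \040
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def is_mount_point(path, mounts_text):
    """True if path is listed as a mount point in /proc/mounts-style text."""
    wanted = os.path.normpath(path)
    for line in mounts_text.splitlines():
        fields = line.split()
        # The second field of each line is the mount point.
        if len(fields) >= 2 and unescape(fields[1]) == wanted:
            return True
    return False


# Made-up sample table; the second line stands for a bind mount of /.
SAMPLE = (
    "/dev/sda1 / ext3 rw 0 0\n"
    "/dev/sda1 /tmp/example/jail ext3 rw 0 0\n"
)

if __name__ == "__main__":
    for candidate in ("/tmp/example/jail", "/tmp/example/other"):
        if not is_mount_point(candidate, SAMPLE):
            print("WARNING: %s is not a separate mount" % candidate)
```

On a real system the text would be read from /proc/mounts, and the path
would have to be absolute with symlinks already resolved.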
-- 
Thanks,
Feri.