[Lxc-users] lxc-start leaves temporary pivot dir behind

Mon May 10 19:43:44 UTC 2010

On 5/10/2010 10:48 AM, Daniel Lezcano wrote:
> Ferenc Wagner wrote:
>    
>> Daniel Lezcano<daniel.lezcano at free.fr>  writes:
>>
>>
>>      
>>> Ferenc Wagner wrote:
>>>
>>>
>>>        
>>>> Ferenc Wagner<wferi at niif.hu>  writes:
>>>>
>>>>
>>>>          
>>>>> Daniel Lezcano<dlezcano at fr.ibm.com>  writes:
>>>>>
>>>>>
>>>>>            
>>>>>> Ferenc Wagner wrote:
>>>>>>
>>>>>>
>>>>>>              
>>>>>>> Daniel Lezcano<daniel.lezcano at free.fr>  writes:
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>>>> Ferenc Wagner wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>                  
>>>>>>>>> While playing with lxc-start, I noticed that /tmp is infested by
>>>>>>>>> empty lxc-r* directories: [...] Ok, this name comes from lxc-rootfs
>>>>>>>>> in conf.c:setup_rootfs.  After setup_rootfs_pivot_root returns, the
>>>>>>>>> original /tmp is not available anymore, so rmdir(tmpname) at the
>>>>>>>>> bottom of setup_rootfs can't achieve much.  Why is this temporary
>>>>>>>>> name needed anyway?  Is pivoting impossible without it?
>>>>>>>>>
>>>>>>>>>                    
>>>>>>>>
>>>>>>>> That was put in place with chroot, before pivot_root, so the distro's
>>>>>>>> scripts can remount their '/' without failing.
>>>>>>>>
>>>>>>>> Now we have pivot_root, I suppose we can change that to something cleaner...
>>>>>>>>
>>>>>>>>                  
>>>>>>>
>>>>>>> Like simply nuking it?  Shall I send a patch?
>>>>>>>
>>>>>>>                
>>>>>>
>>>>>> Sure, if we can kill it, I will be glad to take your patch :)
>>>>>>
>>>>>>              
>>>>>
>>>>> I can't see any reason why lxc-start couldn't do without that temporary
>>>>> recursive bind mount of the original root.  If neither do you, I'll
>>>>> patch it out and see if it still flies.
>>>>>
>>>>>            
>>>> For my purposes the patch below works fine.  I only run applications,
>>>> though, not full systems, so wider testing is definitely needed.
>>>>
>>>> From 98b24c13f809f18ab8969fb4d84defe6f812b25c Mon Sep 17 00:00:00 2001
>>>> From: Ferenc Wagner<wferi at niif.hu>
>>>> Date: Thu, 6 May 2010 14:47:39 +0200
>>>> Subject: [PATCH] no need to use a temporary directory for pivoting
>>>> [...]
>>>>
>>>>          
>>> We can't simply remove it because of the pivot_root which returns EBUSY.
>>> I suppose it's coming from: "new_root and put_old must not be on the
>>> same file system as the current root."
>>>
>>>        
>> Hmm, this could indeed be a problem if lxc.rootfs is on the current root
>> file system.  I didn't consider pivoting to the same FS, but looks like
>> this is the very reason for the current complexity in the architecture.
>>
>> Btw. is this really a safe thing to do, to pivot into a subdirectory of
>> a file system?  Is there really no way out of that?
>>
>>      
> It seems pivot_root on the same fs works if an intermediate mount point
> is inserted between old_root and new_root, but at the cost of needing a
> lazy unmount when we unmount the old rootfs filesystems. I didn't find
> a better solution that allows the rootfs to be a directory with a
> full file system tree.
>
> I am looking at making it possible to specify a rootfs which is a file
> system image or a block device. I am not sure this should be done by lxc,
> but I am looking into it ...
>
>    
>>> But as we will pivot_root right after, we won't reuse the real rootfs,
>>> so we can safely use the host /tmp.
>>>
>>>        
>> That will cause problems if rootfs is under /tmp, don't you think?
>>
>>      
> Right :)
>
>    
>> Actually, I'm not sure you can fully solve this.  If rootfs is a
>> separate file system, this is only much ado about nothing.  If rootfs
>> isn't a separate filesystem, you can't automatically find a good place
>> and also clean it up.
>>      
> Maybe a single /tmp/lxc directory could be used, as the mount points are
> private to the container. So it would be acceptable to have a single
> directory for N containers, no?
>
>    
>> So why not require that rootfs is a separate
>> filesystem, and let the user deal with it by doing the necessary bind
>> mount in the lxc config?
>>
>>      
> Hmm, that will break existing user configurations.
>
> We can add a WARNING if rootfs is not a separate file system and let
> the user do whatever he wants. IMO, if it is well documented, it is
> not a problem.
>    

Just putting in a hopefully unnecessary vote, if you are still deciding 
what's ultimately going to be possible or impossible:
As a user, I can say I really want to continue using a shared filesystem 
where the containers' roots are subdirectories on a single host filesystem.
The ability to use separate filesystems, image files, or real devices 
would be a nice option, but I want to run most instances out of 
subdirectories.
I specifically and deliberately want to allow any container to consume as 
much or as little space as it needs at any time, without warning and at 
unpredictable rates, changing or spiking at unpredictable times.

I can describe all the reasons why I want that and why it's not "wrong" 
in my case but I'm assuming they are unnecessary and uninteresting.

Switching to bind mounts is OK. I don't mind if the details change 
about how to set up the config files and what steps the init scripts 
have to perform to launch a container, as long as it's still true that I 
don't have to provision fixed container sizes.

-- 
bkw