[lxc-devel] Failed to remove cgroup when lxc-start failed

Qiang Huang h.huangqiang at huawei.com
Wed May 29 11:06:18 UTC 2013


On 2013/5/24 20:49, Serge Hallyn wrote:
> Quoting Qiang Huang (h.huangqiang at huawei.com):
>> Hi,
>>
>> I found a tricky problem in LXC. I once made a mistake in the config and set
>>
>> lxc.cgroup.cpuset.cpus = -1
>>
>> Of course the start failed, but afterwards "lxc-ls --active" showed the
>> container as active.
>>
>> The error message is:
>> # lxc-start -n hq111 -f config_hq -l TRACE
>> lxc-start: Invalid argument - write /cgroup/lxc/hq111/cpuset.cpus : Invalid argument
>> lxc-start: Error setting cpuset.cpus to -1 for lxc/hq111
>>
>> lxc-start: failed to setup the cgroups for 'hq111'
>> lxc-start: failed to spawn 'hq111'
>> lxc-start: Device or resource busy - failed to remove cgroup '/cgroup/lxc/hq111'
>>
>>
>> This is not hard to reproduce if you keep trying, though it does not happen
>> every time. Reading through the code, I found that recursive_rmdir() failed
>> because rmdir() sometimes returns -1. Any idea how to fix this?
> 
> Could you tell us exactly which version this is, and exactly how you
> created the container?  When I do it in ubuntu saucy (roughly 0.9.0 lxc),
> the cgroup gets correctly removed.
> 

Hi Serge,

I think I have found the reason: when setup_cgroup() fails, the child process
may still exist when the parent tries to destroy the cgroup. (We have no sync
mechanism to ensure the child exits before the parent when something goes wrong.)
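
A minimal sketch of such a sync step, assuming the parent knows the child's
pid. The helper name reap_child_before_cleanup() is hypothetical and is not an
existing LXC function; it only illustrates the idea of killing the child and
waiting until it has been reaped before the cgroup directory is removed.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of LXC): make sure the child is gone
 * before the parent starts cleaning up the cgroup.
 * Returns 0 once the child has been reaped, -1 on error.
 */
int reap_child_before_cleanup(pid_t pid)
{
	int status;

	/* ESRCH means the child has already exited, which is fine here. */
	if (kill(pid, SIGKILL) < 0 && errno != ESRCH)
		return -1;

	/* Retry waitpid() if it is interrupted by a signal. */
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}

	return 0;
}
```

In an error path the parent would call this first and only then try to remove
the cgroup, so that no task from the failed start is left inside it.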

commit 6031a6e5f939bda07d98768d34dafae677a7dfeb
Author: Dwight Engen <dwight.engen at oracle.com>
Date:   Wed May 15 12:27:34 2013 -0400

    set non device cgroup items before the cgroup is entered

    This allows some special cgroup items such as memory.kmem.limit_in_bytes
    to be successfully set, since they must be set before any task is put
    into the cgroup.

    The devices cgroup is setup later giving the container a chance to mount
    file systems before the device it might want to mount from becomes
    unavailable.

    Signed-off-by: Dwight Engen <dwight.engen at oracle.com>
    Signed-off-by: Serge Hallyn <serge.hallyn at ubuntu.com>

This patch moved setup_cgroup() before lxc_cgroup_enter(), so when setup_cgroup()
fails there is no task in the cgroup yet, and removing the cgroup does not fail.
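
To illustrate why the ordering matters: in the cgroup v1 layout each cgroup
directory has a "tasks" file listing the tasks attached to it, and rmdir() on
the directory fails with EBUSY while tasks are still attached. Below is a rough
sketch of a check for that condition. The helper name
cgroup_tasks_file_is_empty() is hypothetical, and the caller is assumed to pass
the full path to the tasks file (for the case above that would presumably be
/cgroup/lxc/hq111/tasks).

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of LXC): check whether a cgroup's
 * "tasks" file lists any task.
 * Returns 1 if no task is listed, 0 if at least one task is listed,
 * and -1 if the file cannot be opened.
 */
int cgroup_tasks_file_is_empty(const char *tasks_path)
{
	FILE *f = fopen(tasks_path, "r");
	int pid;
	int found;

	if (!f)
		return -1;

	/* One successfully parsed id is enough to know the cgroup is busy. */
	found = (fscanf(f, "%d", &pid) == 1);
	fclose(f);

	return found ? 0 : 1;
}
```

A check like this is only a diagnostic: a task can still enter or leave the
cgroup between the check and the rmdir(), so it does not replace waiting for
the child to exit.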

So my problem no longer exists with the latest code, but there are still
potential problems if we don't ensure the child exits before the parent.
Michael's problem might also be caused by this.







