[lxc-devel] Failed to remove cgroup when lxc-start failed

Serge Hallyn serge.hallyn at ubuntu.com
Wed May 29 13:47:00 UTC 2013


Quoting Qiang Huang (h.huangqiang at huawei.com):
> On 2013/5/24 20:49, Serge Hallyn wrote:
> > Quoting Qiang Huang (h.huangqiang at huawei.com):
> >> Hi,
> >>
> >> I found a tricky problem in LXC. I once made a mistake in the config and set
> >>
> >> lxc.cgroup.cpuset.cpus = -1
> >>
> >> Of course the start failed, but then "lxc-ls --active" showed the container
> >> as active.
> >>
> >> The error message is:
> >> # lxc-start -n hq111 -f config_hq -l TRACE
> >> lxc-start: Invalid argument - write /cgroup/lxc/hq111/cpuset.cpus : Invalid argument
> >> lxc-start: Error setting cpuset.cpus to -1 for lxc/hq111
> >>
> >> lxc-start: failed to setup the cgroups for 'hq111'
> >> lxc-start: failed to spawn 'hq111'
> >> lxc-start: Device or resource busy - failed to remove cgroup '/cgroup/lxc/hq111'
> >>
> >>
> >> This is not hard to reproduce if you keep trying, although it does not happen
> >> every time. I read through the code and found that recursive_rmdir() failed
> >> because rmdir() sometimes returns -1. Any idea how to fix this?
> > 
> > Could you tell us exactly which version this is, and exactly how you
> > created the container?  When I do it in ubuntu saucy (roughly 0.9.0 lxc),
> > the cgroup gets correctly removed.
> > 
> 
> Hi Serge,
> 
> I think I have found the reason: when setup_cgroup() fails, the child process
> may still exist when the parent tries to destroy the cgroup. (We have no sync
> mechanism to ensure the child exits before the parent when something goes wrong.)
> 
> commit 6031a6e5f939bda07d98768d34dafae677a7dfeb
> Author: Dwight Engen <dwight.engen at oracle.com>
> Date:   Wed May 15 12:27:34 2013 -0400
> 
>     set non device cgroup items before the cgroup is entered
> 
>     This allows some special cgroup items such as memory.kmem.limit_in_bytes
>     to be successfully set, since they must be set before any task is put
>     into the cgroup.
> 
>     The devices cgroup is setup later giving the container a chance to mount
>     file systems before the device it might want to mount from becomes
>     unavailable.
> 
>     Signed-off-by: Dwight Engen <dwight.engen at oracle.com>
>     Signed-off-by: Serge Hallyn <serge.hallyn at ubuntu.com>
> 
> This patch moved setup_cgroup() before lxc_cgroup_enter(), so when setup_cgroup()
> fails there is no task in the cgroup yet and removing the cgroup does not fail.
> 
> So my problem no longer exists on the latest code, but there are still
> potential problems if we don't ensure the child exits before the parent.
> Michael's problem might also be caused by this.

Right, so other failures later on *could* still cause this.
Shall we do something like

	{
		/* Wait on any unterminated children */
		int status, ret;
		while ((ret = waitpid(-1, &status, 0)) > 0)
			;	/* empty body: loop until no children remain */
	}

in lxc_abort() after the kill(handler->pid)?
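
For illustration only, here is a minimal standalone sketch of that idea: kill
the child, then reap every remaining child before any cleanup (such as removing
the cgroup directory) would run. The names reap_all_children() and
abort_and_reap_demo() are hypothetical and are not LXC functions; the forked
child merely stands in for the container's init process. Unlike the one-line
loop above, this version also retries when waitpid() is interrupted by a
signal (EINTR), which is an assumption about what the abort path would want.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an LXC function): block until every child of
 * the calling process has been reaped. Returns the number of children
 * reaped, or -1 on an unexpected waitpid() result.
 */
static int reap_all_children(void)
{
	int status, reaped = 0;

	for (;;) {
		pid_t ret = waitpid(-1, &status, 0);

		if (ret > 0) {
			reaped++;
			continue;
		}
		if (ret < 0 && errno == EINTR)
			continue;	/* interrupted by a signal, retry */
		if (ret < 0 && errno == ECHILD)
			return reaped;	/* no children left */
		return -1;
	}
}

/*
 * Hypothetical stand-in for the abort path: fork a child that would
 * otherwise linger, kill it, then reap it before cleanup would proceed.
 */
static int abort_and_reap_demo(void)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		pause();	/* child: sleep until a signal arrives */
		_exit(0);
	}

	kill(pid, SIGKILL);
	return reap_all_children();
}
```

Once reap_all_children() returns, none of the caller's direct children remain,
so none of them can still be listed as a task in the cgroup when the directory
is removed. Processes further down the tree (grandchildren) are not covered by
this sketch.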



