[lxc-devel] Containers do not start with lxc-1.0.0.beta2 on RHEL-6.5
Serge Hallyn
serge.hallyn at ubuntu.com
Fri Jan 17 22:21:49 UTC 2014
Quoting Robert Vogelgesang (vogel at users.sourceforge.net):
> Hello all,
>
> Since yesterday I've been testing lxc-1.0.0.beta2 on RHEL-6.5, but I
> have failed to get any container to start.
>
> I've set up a RHEL-6.5 test server with the "cgconfig" service enabled
> in default configuration. When I try to start a container (with root
> privileges), I get:
>
> # lxc-start -n test -d -o lxc-start.log -l DEBUG
> lxc-start: command get_cgroup failed to receive response
>
> The container did not start, and lxc-start.log has the following ERRORs:
> (leading whitespace trimmed)
>
> lxc-start 1389968577.048 ERROR lxc_cgroup - Could not set clone_children to 1 for cpuset hierarchy in parent cgroup.
> lxc-start 1389968577.048 ERROR lxc_cgroup - Device or resource busy - cgroup_rmdir: failed to delete /cgroup/blkio/
> lxc-start 1389968577.048 ERROR lxc_cgroup - Device or resource busy - cgroup_rmdir: failed to delete /cgroup/net_cls/
> lxc-start 1389968577.048 ERROR lxc_cgroup - Device or resource busy - cgroup_rmdir: failed to delete /cgroup/freezer/
> lxc-start 1389968577.048 ERROR lxc_cgroup - Device or resource busy - cgroup_rmdir: failed to delete /cgroup/devices/
> lxc-start 1389968577.048 ERROR lxc_cgroup - Device or resource busy - cgroup_rmdir: failed to delete /cgroup/memory/
> lxc-start 1389968577.048 ERROR lxc_cgroup - Device or resource busy - cgroup_rmdir: failed to delete /cgroup/cpuacct/
> lxc-start 1389968577.049 ERROR lxc_cgroup - Device or resource busy - cgroup_rmdir: failed to delete /cgroup/cpu/
> lxc-start 1389968577.049 ERROR lxc_cgroup - Device or resource busy - cgroup_rmdir: failed to delete /cgroup/cpuset/
> lxc-start 1389968577.049 ERROR lxc_start - failed to create cgroups for 'test'
> lxc-start 1389968577.078 ERROR lxc_start - failed to spawn 'test'
> lxc-start 1389968577.079 ERROR lxc_commands - command get_cgroup failed to receive response
>
>
> The first error comes from cgroup.c:lxc_cgroup_create(). When comparing
> this with cgroup.c:set_clone_children() from lxc-0.9.0 I saw that 0.9.0
> ignored errors when setting clone_children, and so I patched
> cgroup.c:lxc_cgroup_create() to do the same. But the container still did
> not start. The errors in lxc-start.log were now:
>
> lxc-start 1389980209.248 ERROR lxc_cgroup - No space left on device - Could not add pid 21347 to cgroup /lxc/test: internal error
> lxc-start 1389980209.270 ERROR lxc_start - failed to spawn 'test'
> lxc-start 1389980209.271 ERROR lxc_commands - command get_cgroup failed to receive response
>
> Using strace(8), I found this:
> open("/cgroup/cpuset/lxc/test/tasks", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 11
> write(11, "21347", 5) = -1 ENOSPC (No space left on device)
> close(11) = 0
>
> Switching back to lxc-0.9.0, this same container starts just fine.
>
> 0.9.0 has its own set of problems under RHEL-6.5 (lxc-ps and lxc-netstat
> don't work), but containers do at least start and can be shut down again.
> I had hoped that 1.0.0 would resolve the issues of 0.9.0...
>
> So, my first question would be: Is RHEL-6.5 (and CentOS 6.5, and others)
> a "supported" platform for lxc-1.0.0?
>
> And if so: What should I do to debug this further? Are there already
> some patches I could test?
Hi,
I'm not sure why things happen in that order on your system, but I
can explain the errors at the low level.
When you're not allowed to set clone_children, it is most likely because
the cgroup already has child cgroups. You cannot change clone_children
in that case.
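To check whether that is what is happening, one can count the subdirectories of the cpuset root: in a cgroup v1 hierarchy, control files such as tasks are regular files and every child cgroup is a directory. A minimal C sketch, assuming the cpuset hierarchy is mounted at /cgroup/cpuset as in the strace output above; count_child_cgroups is a hypothetical helper, not part of lxc.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Count the child cgroups directly below a cgroup v1 directory.
 * Control files (tasks, cpuset.cpus, ...) are regular files, while
 * every child cgroup shows up as a subdirectory.
 * Returns the number of subdirectories, or -1 if the directory
 * cannot be opened. */
int count_child_cgroups(const char *cgroup_dir)
{
    DIR *dir = opendir(cgroup_dir);
    struct dirent *ent;
    struct stat st;
    char path[4096];
    int count = 0;

    if (dir == NULL)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        if (snprintf(path, sizeof(path), "%s/%s", cgroup_dir,
                     ent->d_name) >= (int)sizeof(path))
            continue; /* path too long; skip this entry */
        if (stat(path, &st) == 0 && S_ISDIR(st.st_mode))
            count++;
    }
    closedir(dir);
    return count;
}
```

A result greater than 0 for count_child_cgroups("/cgroup/cpuset") would be consistent with the explanation above of why clone_children could not be changed.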
When you get -ENOSPC it is because clone_children was not set, so the
cpuset.mems and cpuset.cpus files were not initialized in the
container's cgroups, and the kernel refuses to attach a task to a
cpuset that has no CPUs or memory nodes. (I've always hated this behavior.)
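Without clone_children, a new cpuset cgroup has to be seeded by hand before any pid is written to its tasks file, by copying cpuset.cpus and cpuset.mems down from the parent. A minimal C sketch under that assumption; copy_cgroup_value and init_cpuset_from_parent are hypothetical names, and this is not necessarily how lxc itself handles the problem.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Copy the contents of one cgroup control file (e.g. the parent's
 * cpuset.cpus) into another (e.g. the child's cpuset.cpus).
 * Returns 0 on success, -1 on error. */
int copy_cgroup_value(const char *src_path, const char *dst_path)
{
    char buf[4096];
    ssize_t len;
    int fd;

    fd = open(src_path, O_RDONLY);
    if (fd < 0)
        return -1;
    len = read(fd, buf, sizeof(buf));
    close(fd);
    /* An uninitialized cpuset reads back as just "\n"; copying that
     * would leave the child empty too, so treat it as an error. */
    if (len <= 0 || buf[0] == '\n')
        return -1;

    /* No O_CREAT: in cgroupfs the control file already exists. */
    fd = open(dst_path, O_WRONLY);
    if (fd < 0)
        return -1;
    if (write(fd, buf, (size_t)len) != len) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Seed a child cpuset cgroup from its parent, as clone_children would. */
int init_cpuset_from_parent(const char *parent_dir, const char *child_dir)
{
    static const char *const files[] = { "cpuset.cpus", "cpuset.mems" };
    char src[4096], dst[4096];
    size_t i;

    for (i = 0; i < sizeof(files) / sizeof(files[0]); i++) {
        if (snprintf(src, sizeof(src), "%s/%s", parent_dir, files[i]) >= (int)sizeof(src) ||
            snprintf(dst, sizeof(dst), "%s/%s", child_dir, files[i]) >= (int)sizeof(dst))
            return -1;
        if (copy_cgroup_value(src, dst) < 0)
            return -1;
    }
    return 0;
}
```

Because an uninitialized parent only has an empty value to pass on, the copy has to proceed top-down from an initialized ancestor: for the container in the report, init_cpuset_from_parent("/cgroup/cpuset", "/cgroup/cpuset/lxc") first, then init_cpuset_from_parent("/cgroup/cpuset/lxc", "/cgroup/cpuset/lxc/test").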
The *surest* way to avoid problems is to set up an early init job for
yourself which sets clone_children to 1 in the root cpuset cgroup,
before any child cgroups have been created there.
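What such a job has to do boils down to a single write. A minimal C sketch, assuming the cpuset hierarchy is mounted at /cgroup/cpuset (as in the strace output above) and that the kernel exposes the flag as cgroup.clone_children; set_clone_children is a hypothetical helper, and on a real system it must run as root.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Set clone_children to 1 in a cgroup v1 directory, e.g. the cpuset
 * root. Child cgroups created afterwards inherit the flag, and the
 * cpuset controller then copies cpuset.cpus and cpuset.mems from the
 * parent. Returns 0 on success, -1 on error. */
int set_clone_children(const char *cgroup_dir)
{
    char path[4096];
    int fd;

    if (snprintf(path, sizeof(path), "%s/cgroup.clone_children",
                 cgroup_dir) >= (int)sizeof(path))
        return -1;
    /* No O_CREAT: the file must already exist (cgroupfs provides it). */
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    if (write(fd, "1\n", 2) != 2) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Called as set_clone_children("/cgroup/cpuset") from the init job (the shell equivalent being echo 1 > /cgroup/cpuset/cgroup.clone_children), this would let new children such as /cgroup/cpuset/lxc/test start out with usable cpuset.cpus and cpuset.mems.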
-serge