[lxc-devel] [PATCH 1/5] cgroup: minor bugfixes so start and attach work again
Christian Seiler
christian at iwakd.de
Wed Aug 21 11:49:41 UTC 2013
Hi Serge,
>> Having /lxc makes it much easier to see what's part of a container vs
>> what's part of a user session or whatever else uses cgroups these
>> days.
>
> Note that nothing stops you from simply entering cgroup /lxc by hand
> before executing a container. Right now the code simply puts you into
> a child of whichever directory you are in. The advantage is slightly
> simpler code.
May I perhaps take a short step back to see why the change in the
current logic was added in the first place?
https://github.com/lxc/lxc/commit/b98f7d6ed1b89b6452af4a2b5e27d445e4b3a138
Basically, you want to be able to use nested containers but also
isolate the cgroup filesystem because it's not virtualized in the
kernel.
The current mountcgroups hook [1] (which, by the way, still assumes /lxc
as a prefix) bind-mounts
host:/sys/fs/cgroup/$controller/lxc/$name
to
container:/sys/fs/cgroup/$controller
[1] https://github.com/lxc/lxc/blob/staging/hooks/mountcgroups
But /proc/.../cgroup still contains the path "/lxc/$name/$name", which
is not reachable from within the container. This breaks LXC inside such
a container, which is why you implemented the checks that walk up the
parent cgroup directories until they find your own process.
Thinking about this further, I believe the initial bind-mount logic is
already borked: if nested LXC breaks in this way, so will a lot of
other software that uses cgroups and relies on standard behaviour.
I think the correct way for the mountcgroups hook is to do the
following:
Suppose the container has the cgroup /lxc/foo/foo and we just have the
'cpu' controller available.
Initially, /sys/fs/cgroup will be a tmpfs and /sys/fs/cgroup/cpu will
contain the cpu controller.
LXC recursively creates /sys/fs/cgroup/cpu/lxc/foo. It then runs the
mountcgroups hook.
The mountcgroups hook should now mount a new tmpfs in
$containerroot/sys/fs/cgroup. It should then create the directories
for the controllers but *also* subdirectories for the cgroup of the
container, i.e.
mount -t tmpfs none $containerroot/sys/fs/cgroup
mkdir -p $containerroot/sys/fs/cgroup/cpu/lxc/foo
mount -n --bind /sys/fs/cgroup/cpu/lxc/foo \
$containerroot/sys/fs/cgroup/cpu/lxc/foo
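Generalized over several controllers, those steps could look roughly
like the shell sketch below. It is not the shipped hook: the function
name setup_container_cgroups, its argument order and the DRY_RUN switch
(which prints the commands instead of executing them) are hypothetical,
and it assumes each controller is mounted on the host at
/sys/fs/cgroup/<controller>.

```shell
# Hypothetical helper sketching the proposed mountcgroups layout.
# Arguments: container root, the container's cgroup path (e.g. /lxc/foo),
# then one or more controller directory names under /sys/fs/cgroup.
# With DRY_RUN=1 the commands are printed instead of being run.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_container_cgroups() {
    rootfs=$1
    cgpath=$2
    shift 2

    # Fresh tmpfs so the container only sees what we place there.
    run mount -t tmpfs none "$rootfs/sys/fs/cgroup"

    for ctrl in "$@"; do
        # Recreate the full host-side path inside the container ...
        run mkdir -p "$rootfs/sys/fs/cgroup/$ctrl$cgpath"
        # ... and bind-mount only the container's own cgroup onto it, so
        # /proc/self/cgroup paths resolve the same way as on the host.
        run mount -n --bind "/sys/fs/cgroup/$ctrl$cgpath" \
            "$rootfs/sys/fs/cgroup/$ctrl$cgpath"
    done
}
```

Because the directory structure mirrors the host's, the path reported
in /proc/self/cgroup can be appended to the controller's mount point
unchanged.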
That way, the following will work:
cgpath=$(grep $controller /proc/self/cgroup | cut -d: -f3)
ls /sys/fs/cgroup/$controller$cgpath
(Note that grepping for cpu will also find cpuset and cpuacct, so
don't take this example too literally. ;))
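A slightly more careful version of that lookup splits the controller
list (the second colon-separated field of /proc/self/cgroup) on commas,
so that "cpu" only matches an exact entry and not cpuset or cpuacct on
their own. This is a sketch; the helper name cgroup_path_for and its
optional file argument (useful for testing) are hypothetical.

```shell
# Print this process's cgroup path for one controller.
# /proc/self/cgroup lines look like  ID:controller-list:path,  e.g.
#   4:cpu,cpuacct:/lxc/foo
cgroup_path_for() {
    ctrl=$1
    file=${2:-/proc/self/cgroup}
    awk -F: -v want="$ctrl" '
        {
            # Compare each comma-separated controller name exactly.
            n = split($2, ctrls, ",")
            for (i = 1; i <= n; i++)
                if (ctrls[i] == want) {
                    # Strip "ID:controllers:" so paths containing ":"
                    # survive intact, then print the remaining path.
                    sub(/^[^:]*:[^:]*:/, "")
                    print
                    exit
                }
        }' "$file"
}
```

Usage would be cgpath=$(cgroup_path_for cpu) followed by
ls "/sys/fs/cgroup/cpu$cgpath", still assuming the controller's
directory is named after the controller.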
This ensures that other software doing the same lookup will not break
either.
On the other hand, LXC itself (not the mount hook) doesn't really look
for /sys/fs/cgroup; it goes through all the mount points where a cgroup
filesystem has been mounted. Inside the container, this would now find
/sys/fs/cgroup/cpu/lxc/foo rather than the expected /sys/fs/cgroup/cpu.
But even here we don't have to guess where the cgroup lies, because
/proc/self/mountinfo gives us the information needed to work it out:
the 4th field of each entry contains the path within the filesystem
that forms the root of that mount.
So outside the container you'd have a /proc/self/mountinfo entry like
25 22 0:20 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,cpuacct,cpu,clone_children
And inside the container it'd look like
48 20 0:20 /lxc/foo /sys/fs/cgroup/cpu/lxc/foo rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,cpuacct,cpu,clone_children
So if you look for the cgroup /lxc/foo/bar from inside the container,
the mount entry tells you to strip the /lxc/foo prefix (4th field) and
append the remainder to the mount point /sys/fs/cgroup/cpu/lxc/foo
(5th field), which gives /sys/fs/cgroup/cpu/lxc/foo/bar.
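The mountinfo-based translation described above could be sketched in
shell with awk as follows; LXC itself would of course do this in C. It
assumes the standard mountinfo layout (zero or more optional fields
terminated by a lone "-", then filesystem type, source and super
options). The function name cgroup_dir_for is hypothetical, and octal
escapes such as \040 for spaces are not decoded.

```shell
# Map a cgroup path (as in /proc/self/cgroup) to a directory under the
# cgroup mount carrying the given controller, using the root (4th field)
# and mount point (5th field) of /proc/self/mountinfo.
cgroup_dir_for() {
    ctrl=$1
    cgpath=$2
    file=${3:-/proc/self/mountinfo}
    awk -v want="$ctrl" -v cg="$cgpath" '
        {
            # Skip optional fields (from field 7) up to the "-" separator.
            sep = 7
            while (sep <= NF && $sep != "-") sep++
            if ($(sep + 1) != "cgroup") next

            # The controller must appear exactly in the super options.
            n = split($(sep + 3), opts, ",")
            found = 0
            for (i = 1; i <= n; i++) if (opts[i] == want) found = 1
            if (!found) next

            root = $4; mnt = $5
            if (root == "/") rel = cg
            else if (cg == root) rel = ""
            else if (index(cg, root "/") == 1) rel = substr(cg, length(root) + 1)
            else next   # this mount does not expose the requested cgroup
            print mnt rel
            exit
        }' "$file"
}
```

With the in-container entry shown above, cgroup_dir_for cpu /lxc/foo/bar
prints /sys/fs/cgroup/cpu/lxc/foo/bar; with the host entry,
cgroup_dir_for cpu /lxc/foo prints /sys/fs/cgroup/cpu,cpuacct/lxc/foo.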
If you agree that both things would be worthwhile, I'd be willing to
write patches both for the mountcgroups hook and also for the LXC code
itself.
-- Christian
PS: Also, please see my other email as to whether we perhaps should
follow in the footsteps of other people and use /machine/foo.lxc
instead of /lxc/foo. (But that's orthogonal to this.)