[lxc-devel] [PATCH 1/5] cgroup: minor bugfixes so start and attach work again

Christian Seiler christian at iwakd.de
Wed Aug 21 11:49:41 UTC 2013


Hi Serge,

>> Having /lxc makes it much easier to see what's part of a container vs
>> what's part of a user session or whatever else uses cgroups these days.
>
> Note that nothing stops you from simply entering cgroup /lxc by hand
> before executing a container.  Right now the code simply puts you into
> a child of whichever directory you are in.  Advantage is slightly
> simpler code.

May I perhaps take a short step back to see why the change in the
current logic was added in the first place?
https://github.com/lxc/lxc/commit/b98f7d6ed1b89b6452af4a2b5e27d445e4b3a138

Basically, you want to be able to use nested containers but also
isolate the cgroup filesystem because it's not virtualized in the
kernel.

The current mountcgroups hook [1] (which btw. still assumes /lxc as a
prefix) bind-mounts
   host:/sys/fs/cgroup/$controller/lxc/$name
to
   container:/sys/fs/cgroup/$controller

[1] https://github.com/lxc/lxc/blob/staging/hooks/mountcgroups

But /proc/.../cgroup still contains the "/lxc/$name/$name" that is not
reachable from within the container. This breaks LXC from within such
a container, so that's why you implemented the checks to look for
parent cgroup directories until you find your own process.

If I think about that further, I think the initial bind-mount logic is
already borked: if nested LXC breaks in such a way, so will a lot of
other software that uses cgroups and relies on standard behaviour.

I think the correct way for the mountcgroups hook is to do the
following:

Suppose the container has the cgroup /lxc/foo/foo and we just have the
'cpu' controller available.

Initially, /sys/fs/cgroup will be a tmpfs and /sys/fs/cgroup/cpu will
contain the cpu controller.

LXC recursively creates /sys/fs/cgroup/cpu/lxc/foo. It then runs the
mountcgroups hook.

The mountcgroups hook should now mount a new tmpfs in
$containerroot/sys/fs/cgroup. It should then create the directories
for the controllers but *also* subdirectories for the cgroup of the
containers, i.e.

mount -t tmpfs none $containerroot/sys/fs/cgroup
mkdir -p $containerroot/sys/fs/cgroup/cpu/lxc/foo
mount -n --bind /sys/fs/cgroup/cpu/lxc/foo \
        $containerroot/sys/fs/cgroup/cpu/lxc/foo

That way, the following will work:

cgpath=$(grep $controller /proc/self/cgroup | cut -d: -f3)
ls /sys/fs/cgroup/$controller$cgpath

(Note that grepping for cpu will also find cpuset and cpuacct, so
don't take this example too literally. ;))
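
For an exact match on the controller name, something like the following
could be used instead of grep. It is only a sketch: cgroup_path_for is a
made-up helper name, and it assumes the usual id:controllers:path layout
of /proc/self/cgroup.

```shell
# Print the cgroup path of one controller, comparing the controller name
# exactly against each entry of the comma-separated second field.
# Usage: cgroup_path_for CONTROLLER [CGROUP_FILE]
cgroup_path_for() {
    awk -F: -v want="$1" '
        {
            n = split($2, ctrls, ",")
            for (i = 1; i <= n; i++) {
                if (ctrls[i] == want) {
                    # Rejoin the rest in case the path contains a colon.
                    path = $3
                    for (j = 4; j <= NF; j++) path = path ":" $j
                    print path
                    exit
                }
            }
        }
    ' "${2:-/proc/self/cgroup}"
}
```

With that, cgpath=$(cgroup_path_for cpu) no longer picks up the cpuset
or cpuacct lines by accident.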

With the bind mount at the full path, other software that relies on
this lookup will not break either.

On the other hand, LXC itself (not the mount hook) doesn't really look
for /sys/fs/cgroup, it goes through all the mountpoints where a cgroup
filesystem has been mounted. This would now find
/sys/fs/cgroup/cpu/lxc/foo inside the container and not the expected
/sys/fs/cgroup/cpu. But even here, we don't have to guess where the
cgroup lies, because /proc/self/mountinfo actually gives us the
information necessary to discern that - the 4th field of the file
contains the path within the filesystem at that mountpoint.

So outside the container you'd have a /proc/self/mountinfo entry like

25 22 0:20 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,cpuacct,cpu,clone_children

And inside the container it'd look like

48 20 0:20 /lxc/foo /sys/fs/cgroup/cpu/lxc/foo rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,cpuacct,cpu,clone_children

So if you look for the cgroup /lxc/foo/bar from inside the container,
then you can see from the mount entry that you just have to strip the
leading /lxc/foo (4th field) from the cgroup path before appending the
remainder to the mount point /sys/fs/cgroup/cpu/lxc/foo (5th field).
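
In shell terms the lookup could be sketched as follows. This is not
what the LXC C code does today: cgroup_fs_path is a hypothetical helper,
and it assumes mount points without escaped characters (mountinfo
writes spaces as \040) and controllers listed in the super options.

```shell
# Map a cgroup path to the directory where it is visible, using mountinfo.
# Usage: cgroup_fs_path CONTROLLER CGROUP [MOUNTINFO_FILE]
cgroup_fs_path() {
    awk -v want="$1" -v cg="$2" '
        {
            # Treat the top-level cgroup "/" as the empty path.
            if (cg == "/") cg = ""

            # Find the "-" separator; the optional fields before it vary.
            sep = 0
            for (i = 7; i <= NF; i++) if ($i == "-") { sep = i; break }
            if (!sep || $(sep + 1) != "cgroup") next

            # The super options after the separator name the controllers.
            n = split($(sep + 3), opts, ",")
            found = 0
            for (i = 1; i <= n; i++) if (opts[i] == want) found = 1
            if (!found) next

            root = $4   # 4th field: path within the cgroup filesystem
            mnt = $5    # 5th field: mount point
            if (root == "/") root = ""

            # The cgroup has to be the mount root or lie below it.
            if (cg != root && index(cg, root "/") != 1) next

            # Strip the root prefix and append the rest to the mount point.
            print mnt substr(cg, length(root) + 1)
            exit
        }
    ' "${3:-/proc/self/mountinfo}"
}
```

Inside the container, "cgroup_fs_path cpu /lxc/foo/bar" would use the
second entry above and give /sys/fs/cgroup/cpu/lxc/foo/bar; on the host
the same call yields /sys/fs/cgroup/cpu,cpuacct/lxc/foo/bar.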

If you agree that both things would be worthwhile, I'd be willing to
write patches both for the mountcgroups hook and also for the LXC code
itself.

-- Christian

PS: Also, please see my other email as to whether we perhaps should
follow in the footsteps of other people and use /machine/foo.lxc
instead of /lxc/foo. (But that's orthogonal to this.)




