[lxc-users] Containers on linux-4.8-rc1 sometimes(?) requiring "cgmanager -m name=systemd" (bisected, but is it a bug?)

Eric W. Biederman ebiederm at xmission.com
Tue Sep 13 15:33:51 UTC 2016


Adam Richter <adamrichter4 at gmail.com> writes:

> On Linux 4.8-rc1 through 4.8-rc6 (latest rc), lxc fails to start
> Ubuntu 16.04 and CentOS 7 containers [1], unless I first run
> "cgmanager -m name=systemd &" on the host, which, unlike the
> containers, was not running systemd or cgmanager.

Yes, that appears correct.  Given the current flat namespace of
hierarchies you fundamentally must coordinate with the host if you want
to use a new hierarchy.  So running cgmanager on the host seems like
the minimum way to do that.

If we truly need something more (which does not appear to be the case
here) the names of hierarchies need to be moved into a namespace.
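
To illustrate the coordination point: one way to see whether the host
already has a given named hierarchy is to look at /proc/self/cgroup,
where each line reads hierarchy-ID:controller-list:cgroup-path and a
named hierarchy shows up as "name=<name>" in the controller list.  The
following is only a minimal sketch under that assumption; the function
names are made up for illustration and are not part of lxc or cgmanager.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if one line of /proc/<pid>/cgroup describes the named
 * hierarchy `name`, else 0.  Each line has the form
 *   hierarchy-ID:controller-list:cgroup-path
 * and a named hierarchy appears in the controller list as "name=<name>".
 * (Hypothetical helper, for illustration only.)
 */
int line_has_named_hierarchy(const char *line, const char *name)
{
    const char *list = strchr(line, ':');
    if (list == NULL)
        return 0;
    list++;                                  /* start of controller list */
    const char *end = strchr(list, ':');     /* end of controller list */
    if (end == NULL)
        return 0;

    size_t name_len = strlen(name);
    const char *p = list;
    while (p < end) {
        /* Split the controller list on commas and compare each token. */
        const char *comma = memchr(p, ',', (size_t)(end - p));
        const char *tok_end = comma ? comma : end;
        size_t tok_len = (size_t)(tok_end - p);
        if (tok_len == 5 + name_len &&
            strncmp(p, "name=", 5) == 0 &&
            strncmp(p + 5, name, name_len) == 0)
            return 1;
        p = tok_end + 1;
    }
    return 0;
}

/*
 * Scan a cgroup membership file (normally /proc/self/cgroup).
 * Returns 1 if the named hierarchy is listed, 0 if not, and -1 if the
 * file cannot be opened.
 */
int file_has_named_hierarchy(const char *path, const char *name)
{
    char line[4096];
    int found = 0;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = line_has_named_hierarchy(line, name);
    fclose(f);
    return found;
}
```

Calling file_has_named_hierarchy("/proc/self/cgroup", "systemd") on the
host before starting a container would indicate whether something on
the host has already created the name=systemd hierarchy.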

> Git bisect revealed that this behavior began with a commit entitled
> "cgroupns: Only allow creation of hierarchies in the initial cgroup
> namespace" [2], which appears to be an attempt to protect against a
> possible denial of service attack.  Reverting the commit also restores
> successful startup without the need to run that cgmanager process.  [Eric and
> Tejun, I have bcc'ed you so you can be aware of this discussion
> thread, as you apparently respectively wrote and approved the commit.]

As far as I can tell you were getting lucky and not having problems
before.

> Running that cgmanager invocation is pretty simple, and seems to me to
> be well worth closing a denial of service vulnerability, much as I
> dislike adding something systemd-specific to a non-systemd environment
> and adding a new dependency (lxc requires cgmanager on the host to
> run, I guess, any container that runs systemd).  However, I am posting
> this message because I don't fully understand the problem, and, most
> importantly, I am wondering if I have stumbled on an unintended
> consequence of this commit that might indicate other
> potential breakage.

I am surprised that your case worked but I don't think it amounts to an
unintended consequence.

> If this new lxc behavior is completely acceptable, then I apologize
> for consuming people's time with it and hope that this message will
> help others experiencing the same problem find an answer for it when
> they search the web.

I will let the lxc-developers judge.

I don't think you hit a case that was expected to work.  Furthermore
either your containers were overprivileged or they would not have been
able to create subdirectories in the cgroup hierarchy.  So I expect this
change transformed a subtle breakage (aka one you had not noticed yet)
into an explicit breakage.

I am not subscribed to lxc-users so I don't know if anyone else has
replied to your post.  Cc's would have been better than Bcc's for
getting feedback in a situation like this.

Eric


> Adam Richter
>
>
> [1] Here is an example of failing to start one of these containers.
> $ sudo lxc-start --name ubuntu16.04_amd64 --foreground
> Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
> [!!!!!!] Failed to mount API filesystems, freezing.
> Freezing execution.
>
>
> [2] Here is the commit diff that triggers the new misbehavior.
> commit 726a4994b05ff5b6f83d64b5b43c3251217366ce
> Author: Eric W. Biederman <ebiederm at xmission.com>
> Date:   Fri Jul 15 06:36:44 2016 -0500
>
>     cgroupns: Only allow creation of hierarchies in the initial cgroup namespace
>
>     Unprivileged users can't use hierarchies if they create them as they do not
>     have privileges to the root directory.
>
>     Which means the only thing a hierarchy created by an unprivileged user
>     is good for is expanding the number of cgroup links in every css_set,
>     which is a DOS attack.
>
>     We could allow hierarchies to be created in namespaces in the initial
>     user namespace.  Unfortunately there is only a single namespace for
>     the names of hierarchies, so that is likely to create more confusion
>     than not.
>
>     So do the simple thing and restrict hierarchy creation to the initial
>     cgroup namespace.
>
>     Cc: stable at vger.kernel.org
>     Fixes: a79a908fd2b0 ("cgroup: introduce cgroup namespaces")
>     Signed-off-by: "Eric W. Biederman" <ebiederm at xmission.com>
>     Signed-off-by: Tejun Heo <tj at kernel.org>
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index e75efa8..e0be49f 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -2215,12 +2215,8 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
>                 goto out_unlock;
>         }
>
> -       /*
> -        * We know this subsystem has not yet been bound.  Users in a non-init
> -        * user namespace may only mount hierarchies with no bound subsystems,
> -        * i.e. 'none,name=user1'
> -        */
> -       if (!opts.none && !capable(CAP_SYS_ADMIN)) {
> +       /* Hierarchies may only be created in the initial cgroup namespace. */
> +       if (ns != &init_cgroup_ns) {
>                 ret = -EPERM;
>                 goto out_unlock;
>         }
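
For readers comparing the two conditions in the diff above, here is a
small userspace model of just the replaced test.  It is a sketch, not
kernel code: struct mount_request and both function names are
hypothetical, and the other checks in cgroup_mount() are left out.  It
assumes the container asks for a hierarchy with no bound controllers
(in the style of the 'none,name=user1' example from the removed
comment) from inside its own cgroup namespace.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state the replaced check consults. */
struct mount_request {
    bool none;              /* opts.none: no controllers bound */
    bool cap_sys_admin;     /* result of capable(CAP_SYS_ADMIN) */
    bool in_init_cgroupns;  /* ns == &init_cgroup_ns */
};

/* Condition removed by the commit: only controller-bound mounts need the capability. */
int old_check(const struct mount_request *r)
{
    return (!r->none && !r->cap_sys_admin) ? -EPERM : 0;
}

/* Condition added by the commit: new hierarchies only from the initial cgroup namespace. */
int new_check(const struct mount_request *r)
{
    return r->in_init_cgroupns ? 0 : -EPERM;
}
```

Under these assumptions, a request with none set and in_init_cgroupns
clear passes old_check() but gets -EPERM from new_check(), which is
consistent with the "Operation not permitted" message in [1].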

