[lxc-users] Containers on linux-4.8-rc1 sometimes(?) requiring "cgmanager -m name=systemd" (bisected, but is it a bug?)
Serge E. Hallyn
serge at hallyn.com
Tue Sep 13 16:06:51 UTC 2016
Quoting Eric W. Biederman (ebiederm at xmission.com):
> Adam Richter <adamrichter4 at gmail.com> writes:
>
> > On Linux 4.8-rc1 through 4.8-rc6 (latest rc), lxc fails to start
> > Ubuntu 16.04 and CentOS 7 containers [1], unless I first run
> > "cgmanager -m name=systemd &" on the host, which, unlike the
> > containers, was not running systemd or cgmanager.
>
> Yes, that appears correct. Given the current flat namespace of
> hierarchies you fundamentally must coordinate with the host if you want
> to use a new hierarchy. So running cgmanager on the host seems like
> the minimum way to do that.
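One way to see whether the host already has the named hierarchy in question is to look at the controller field in /proc/self/cgroup, where named cgroup v1 hierarchies show up as "name=...". The following is only a minimal sketch under that assumption; the function names named_hierarchies and host_has_hierarchy are made up for illustration and are not part of lxc or cgmanager.

```python
def named_hierarchies(proc_cgroup_text):
    """Collect the names of named cgroup v1 hierarchies from text in
    /proc/<pid>/cgroup format (hierarchy-ID:controller-list:path)."""
    names = set()
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        for controller in parts[1].split(","):
            if controller.startswith("name="):
                names.add(controller[len("name="):])
    return names


def host_has_hierarchy(name, path="/proc/self/cgroup"):
    """Hypothetical helper: True if the calling process is attached to
    a named hierarchy called `name` (for example "systemd")."""
    with open(path) as f:
        return name in named_hierarchies(f.read())
```

On a host without systemd or cgmanager, host_has_hierarchy("systemd") would be expected to return False until something on the host creates that hierarchy.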
>
> If we truly need something more (which does not appear to be the case
> here) the names of hierarchies need to be moved into a namespace.
>
> > Git bisect revealed that this behavior began with a commit entitled
> > "cgroupns: Only allow creation of hierarchies in the initial cgroup
> > namespace" [2], which appears to be an attempt to protect against a
> > possible denial of service attack. Reverting the commit also restores
> > successful container startup without the need to run that cgmanager process. [Eric and
> > Tejun, I have bcc'ed you so you can be aware of this discussion
> > thread, as you apparently respectively wrote and approved the commit.]
>
> As far as I can tell you were getting lucky and not having problems
> before.
>
> > Running that cgmanager invocation is pretty simple, and seems to me to
> > be well worth closing a denial of service vulnerability, much as I
> > dislike adding something systemd-specific to a non-systemd environment
> > and adding a new dependency (lxc requires cgmanager on the host to
> > run, I guess, any container that runs systemd). However, I am posting
> > this message because I don't fully understand the problem, and, most
> > importantly, I am wondering if I have stumbled on an unintended
> > consequence of this commit that might indicate other
> > potential breakage.
>
> I am surprised that your case worked but I don't think it amounts to an
> unintended consequence.
>
> > If this new lxc behavior is completely acceptable, then I apologize
> > for consuming people's time with it and hope that this message will
> > allow others experiencing the same problem to find an answer for it when
> > they search the web.
>
> I will let the lxc-developers judge.
>
> I don't think you hit a case that was expected to work. Furthermore

fwiw indeed this was never expected to work.

> either your containers were overprivileged or they would not have been
> able to create subdirectories in the cgroup hierarchy. So I expect this
> change transformed a subtle breakage (aka one you had not noticed yet)
> into an explicit breakage.
>
> I am not subscribed to lxc-users so I don't know if anyone else has
> replied to your post. Cc's would have been better than Bcc's for
> getting feedback in a situation like this.
>
> Eric
>
>
> > Adam Richter
> >
> >
> > [1] Here is an example of failing to start one of these containers.
> > $ sudo lxc-start --name ubuntu16.04_amd64 --foreground
> > Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
> > [!!!!!!] Failed to mount API filesystems, freezing.
> > Freezing execution.
> >
> >
> > [2] Here is the commit diff that triggers the new misbehavior.
> > commit 726a4994b05ff5b6f83d64b5b43c3251217366ce
> > Author: Eric W. Biederman <ebiederm at xmission.com>
> > Date: Fri Jul 15 06:36:44 2016 -0500
> >
> > cgroupns: Only allow creation of hierarchies in the initial cgroup namespace
> >
> > Unprivileged users can't use hierarchies if they create them as they do not
> > have privileges to the root directory.
> >
> > Which means the only thing a hierarchy created by an unprivileged user
> > is good for is expanding the number of cgroup links in every css_set,
> > which is a DOS attack.
> >
> > We could allow hierarchies to be created in namespaces in the initial
> > user namespace. Unfortunately there is only a single namespace for
> > the names of hierarchies, so that is likely to create more confusion
> > than not.
> >
> > So do the simple thing and restrict hierarchy creation to the initial
> > cgroup namespace.
> >
> > Cc: stable at vger.kernel.org
> > Fixes: a79a908fd2b0 ("cgroup: introduce cgroup namespaces")
> > Signed-off-by: "Eric W. Biederman" <ebiederm at xmission.com>
> > Signed-off-by: Tejun Heo <tj at kernel.org>
> >
> > diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> > index e75efa8..e0be49f 100644
> > --- a/kernel/cgroup.c
> > +++ b/kernel/cgroup.c
> > @@ -2215,12 +2215,8 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
> >                  goto out_unlock;
> >          }
> >
> > -        /*
> > -         * We know this subsystem has not yet been bound. Users in a non-init
> > -         * user namespace may only mount hierarchies with no bound subsystems,
> > -         * i.e. 'none,name=user1'
> > -         */
> > -        if (!opts.none && !capable(CAP_SYS_ADMIN)) {
> > +        /* Hierarchies may only be created in the initial cgroup namespace. */
> > +        if (ns != &init_cgroup_ns) {
> >                  ret = -EPERM;
> >                  goto out_unlock;
> >          }
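
To make the effect of the two conditions in the diff concrete, here is a small toy model in Python. It is a simplified sketch, not kernel code: the function and parameter names are invented for illustration, and the hierarchy_exists shortcut is an assumption (inferred from the cgmanager workaround in this thread) that the check is only reached when a new hierarchy would have to be created.

```python
import errno


def check_before_commit(opts_none, has_cap_sys_admin):
    """Old rule: only mounts that bind subsystems require CAP_SYS_ADMIN."""
    if not opts_none and not has_cap_sys_admin:
        return -errno.EPERM
    return 0


def check_after_commit(in_init_cgroup_ns):
    """New rule: creation is refused outside the initial cgroup namespace."""
    if not in_init_cgroup_ns:
        return -errno.EPERM
    return 0


def mount_result(hierarchy_exists, in_init_cgroup_ns, patched,
                 opts_none=True, has_cap_sys_admin=True):
    """Toy outcome of mounting a named hierarchy such as 'none,name=systemd'."""
    if hierarchy_exists:
        # Assumption: an existing hierarchy is reused, so the
        # creation check modelled here is never reached.
        return 0
    if patched:
        return check_after_commit(in_init_cgroup_ns)
    return check_before_commit(opts_none, has_cap_sys_admin)
```

In this model a container in its own cgroup namespace gets 0 from the old rule and -EPERM from the new one when the hierarchy is missing, and 0 again once the host has created the hierarchy, which matches the behaviour reported above.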