[lxc-users] Containers on linux-4.8-rc1 sometimes(?) requiring "cgmanager -m name=systemd" (bisected, but is it a bug?)
Eric W. Biederman
ebiederm at xmission.com
Tue Sep 13 16:17:09 UTC 2016
"Serge E. Hallyn" <serge at hallyn.com> writes:
> Quoting Eric W. Biederman (ebiederm at xmission.com):
>> Adam Richter <adamrichter4 at gmail.com> writes:
>>
>> > On Linux 4.8-rc1 through 4.8-rc6 (latest rc), lxc fails to start
>> > Ubuntu 16.04 and CentOS 7 containers [1], unless I first run
>> > "cgmanager -m name=systemd &" on the host, which, unlike the
>> > containers, was not running systemd or cgmanager.
>>
>> Yes, that appears correct. Given the current flat namespace of
>> hierarchies you fundamentally must coordinate with the host if you want
>> to use a new hierarchy. So running cgmanager on the host seems like
>> the minimum way to do that.
>>
>> If we truly need something more (which does not appear to be the case
>> here) the names of hierarchies need to be moved into a namespace.
>>
>> > Git bisect revealed that this behavior began with a commit entitled
>> > "cgroupns: Only allow creation of hierarchies in the initial cgroup
>> > namespace" [2], which appears to be an attempt to protect against a
>> > possible denial of service attack. Reverting the commit restores
>> > successful container startup without the need to run that cgmanager process. [Eric and
>> > Tejun, I have bcc'ed you so you can be aware of this discussion
>> > thread, as you apparently respectively wrote and approved the commit.]
>>
>> As far as I can tell you were getting lucky and not having problems
>> before.
>>
>> > Running that cgmanager invocation is pretty simple, and seems to me to
>> > be well worth closing a denial of service vulnerability, much as I
>> > dislike adding something systemd-specific to a non-systemd environment
>> > and adding a new dependency (lxc requires cgmanager on the host to
>> > run, I guess, any container that runs systemd). However, I am posting
>> > this message because I don't fully understand the problem, and, most
>> > importantly, I am wondering if I have stumbled on an unintended
>> > consequence of this commit that might indicate other
>> > potential breakage.
>>
>> I am surprised that your case worked but I don't think it amounts to an
>> unintended consequence.
>>
>> > If this new lxc behavior is completely acceptable, then I apologize
>> > for consuming people's time with it and hope that this message will
>> > help others experiencing the same problem find an answer for it when
>> > they search the web.
>>
>> I will let the lxc-developers judge.
>>
>> I don't think you hit a case that was expected to work. Furthermore
>
> fwiw indeed this was never expected to work.
>
Since just creating the hierarchy before starting the container fixes this,
I agree this does appear to be just a documentation issue.
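
For anyone finding this thread by searching, here is a minimal sketch of
that host-side step. It is an illustration under assumptions, not something
taken from lxc or cgmanager: the helper names has_named_hierarchy and
ensure_named_hierarchy are hypothetical, the mount point follows the path in
the error message quoted further down, and the script only prints the
commands (a dry run) instead of running them.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of lxc or cgmanager.

# Succeeds if a cgroup v1 hierarchy with option name=<name> is listed in the
# given mounts table (same format as /proc/mounts, which is the default).
has_named_hierarchy() {
    awk -v want="name=$1" '
        $3 == "cgroup" {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == want) found = 1
        }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}

# Dry run: prints the commands root would run on the host, in the initial
# cgroup namespace, to create the named hierarchy if it is missing.
ensure_named_hierarchy() {
    if has_named_hierarchy "$1" "${2:-/proc/mounts}"; then
        echo "name=$1 hierarchy already mounted"
    else
        echo "mkdir -p /sys/fs/cgroup/$1"
        echo "mount -t cgroup -o none,name=$1 cgroup /sys/fs/cgroup/$1"
    fi
}
```

Calling "ensure_named_hierarchy systemd" on the host before lxc-start shows
whether the name=systemd hierarchy still needs to be created there.
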
>> either your containers were overprivileged or they would not have been
>> able to create subdirectories in the cgroup hierarchy. So I expect this
>> change transformed a subtle breakage (aka one you had not noticed yet)
>> into an explicit breakage.
>>
>> I am not subscribed to lxc-users so I don't know if anyone else has
>> replied to your post. Cc's would have been better than Bcc's for
>> getting feedback in a situation like this.
>>
>> Eric
>>
>>
>> > Adam Richter
>> >
>> >
>> > [1] Here is an example of failing to start one of these containers.
>> > $ sudo lxc-start --name ubuntu16.04_amd64 --foreground
>> > Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
>> > [!!!!!!] Failed to mount API filesystems, freezing.
>> > Freezing execution.
>> >
>> >
>> > [2] Here is the commit diff that triggers the new misbehavior.
>> > commit 726a4994b05ff5b6f83d64b5b43c3251217366ce
>> > Author: Eric W. Biederman <ebiederm at xmission.com>
>> > Date: Fri Jul 15 06:36:44 2016 -0500
>> >
>> > cgroupns: Only allow creation of hierarchies in the initial cgroup namespace
>> >
>> > Unprivileged users can't use hierarchies if they create them as they do not
>> > have privileges to the root directory.
>> >
>> > Which means the only thing a hierarchy created by an unprivileged user
>> > is good for is expanding the number of cgroup links in every css_set,
>> > which is a DOS attack.
>> >
>> > We could allow hierarchies to be created in namespaces in the initial
>> > user namespace. Unfortunately there is only a single namespace for
>> > the names of hierarchies, so that is likely to create more confusion
>> > than not.
>> >
>> > So do the simple thing and restrict hierarchy creation to the initial
>> > cgroup namespace.
>> >
>> > Cc: stable at vger.kernel.org
>> > Fixes: a79a908fd2b0 ("cgroup: introduce cgroup namespaces")
>> > Signed-off-by: "Eric W. Biederman" <ebiederm at xmission.com>
>> > Signed-off-by: Tejun Heo <tj at kernel.org>
>> >
>> > diff --git a/kernel/cgroup.c b/kernel/cgroup.c
>> > index e75efa8..e0be49f 100644
>> > --- a/kernel/cgroup.c
>> > +++ b/kernel/cgroup.c
>> > @@ -2215,12 +2215,8 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
>> >  		goto out_unlock;
>> >  	}
>> >
>> > -	/*
>> > -	 * We know this subsystem has not yet been bound. Users in a non-init
>> > -	 * user namespace may only mount hierarchies with no bound subsystems,
>> > -	 * i.e. 'none,name=user1'
>> > -	 */
>> > -	if (!opts.none && !capable(CAP_SYS_ADMIN)) {
>> > +	/* Hierarchies may only be created in the initial cgroup namespace. */
>> > +	if (ns != &init_cgroup_ns) {
>> >  		ret = -EPERM;
>> >  		goto out_unlock;
>> >  	}
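
The new check in the diff compares the mounting task's cgroup namespace
against init_cgroup_ns. A rough userspace analogue, offered only as a
sketch: compare a process's /proc/<pid>/ns/cgroup link with that of PID 1.
The helper name same_cgroup_ns is hypothetical. Reading another process's
namespace links generally needs root, and the comparison is only meaningful
when run on the host, where PID 1 is normally in the initial cgroup
namespace; inside a container, /proc/1 is the container's own init.

```shell
#!/bin/sh
# Sketch only: same_cgroup_ns is a hypothetical helper name.

# Succeeds (status 0) if the two namespace links, e.g. /proc/<pid>/ns/cgroup,
# point at the same cgroup namespace. Returns 1 if they differ and 2 if
# either link cannot be read.
same_cgroup_ns() {
    ns_a=$(readlink "$1") || return 2
    ns_b=$(readlink "$2") || return 2
    [ "$ns_a" = "$ns_b" ]
}
```

On the host, "same_cgroup_ns /proc/<pid>/ns/cgroup /proc/1/ns/cgroup"
returning 1 for a container's init process would suggest that a cgroup mount
from that process which has to create a new hierarchy hits the -EPERM path
above.
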