[lxc-devel] [PATCH] cgfs: don't mount /sys/fs/cgroup readonly
Stéphane Graber
stgraber at ubuntu.com
Fri May 2 21:20:56 UTC 2014
On Fri, May 02, 2014 at 11:04:15PM +0200, Christian Seiler wrote:
> Hi Serge,
>
> > /sys/fs/cgroup is just a size-limited tmpfs, and making it ro does
> > nothing to affect our ability alter mount settings of its subdirs.
> > OTOH making it ro can upset mountall in the container which tries
> > to remount it rw, which may be refused.
> >
> > So just don't do it.
> >
> > Signed-off-by: Serge Hallyn <serge.hallyn at ubuntu.com>
> > Cc: Christian Seiler <christian at iwakd.de>
>
> NAK!
Reverted.
>
> > ---
> > src/lxc/cgfs.c | 16 ----------------
> > 1 file changed, 16 deletions(-)
> >
> > diff --git a/src/lxc/cgfs.c b/src/lxc/cgfs.c
> > index db2a973..ba7df89 100644
> > --- a/src/lxc/cgfs.c
> > +++ b/src/lxc/cgfs.c
> > @@ -1413,14 +1413,6 @@ static bool cgroupfs_mount_cgroup(void *hdata, const char *root, int type)
> > SYSERROR("error bind-mounting %s to %s", mp->mount_point, abs_path);
> > goto out_error;
> > }
> > - /* main cgroup path should be read-only */
> > - if (type == LXC_AUTO_CGROUP_FULL_RO || type == LXC_AUTO_CGROUP_FULL_MIXED) {
> > - r = mount(NULL, abs_path, NULL, MS_REMOUNT|MS_BIND|MS_RDONLY, NULL);
> > - if (r < 0) {
> > - SYSERROR("error re-mounting %s readonly", abs_path);
> > - goto out_error;
> > - }
> > - }
>
> This is not the tmpfs! Also, this breaks logic if you want the whole
> cgroup tree to be read-only. Without this cgroup-full:ro and
> cgroup-full:mixed is identical to cgroup-full:rw!
>
> The code here is for the mount points /sys/fs/cgroup/$subsystem.
> abs_path contains the path to the mount point inside the container of
> that. This can be triggered only by cgroup-full:{ro,mixed}. Let's
> examine the cases:
>
> cgroup-full:ro: In that case, the entire cgroup hierarchy should be
> mounted read-only. No mount will be done for the container's own cgroup,
> so NOT remounted read-only here will leave the hierarchy read-write and
> therefore making the ro flag meaningless.
>
> cgroup-full:mixed: In that case, the cgroup hierarchy should be mounted
> read-only with the exception of the own cgroup (see documentation). For
> this case, it is also needed that /sys/fs/cgroup/$subsystem inside the
> container be remounted read-only. In the subsequent code, the
> container's own cgroup will be bind-remounted onto itself read-write if
> mixed is specified, which is meaningless if the parent mount is already
> read-write.
>
> (Note that the idea here is that this only works as a restriction for
> containers without CAP_SYS_ADMIN; otherwise, the container can change
> the mount options anyway.)
>
> So if you remove this part where the local mount point is made readonly,
> then it's equivalent to removing the :ro and :mixed flags of cgroup-full.
>
> If this doesn't work for Ubuntu (mountall is upstart, therefore Ubuntu,
> right?), then it's probably better just to update the manpage and say
> that Ubuntu only works with cgroup-full:rw. Also, if you think that the
> default for cgroup-full (which is currently cgroup-full:mixed) should
> rather be cgroup-full:rw, then we could also change that (see
> confile.c). But please don't remove this piece of code.
>
> If you just want the tmpfs to be read-write, then the following does that:
>
> > @@ -1487,14 +1479,6 @@ static bool cgroupfs_mount_cgroup(void *hdata, const char *root, int type)
> > parts = NULL;
> > }
> >
> > - /* try to remount the tmpfs readonly, since the container shouldn't
> > - * change anything (this will also make sure that trying to create
> > - * new cgroups outside the allowed area fails with an error instead
> > - * of simply causing this to create directories in the tmpfs itself)
> > - */
> > - if (type != LXC_AUTO_CGROUP_RW && type != LXC_AUTO_CGROUP_FULL_RW)
> > - mount(NULL, path, NULL, MS_REMOUNT|MS_RDONLY, NULL);
> > -
> > free(path);
> >
> > return true;
>
> Although here I also see a problem for a different use case:
>
> Let's say you have a container which is mounted cgroup:mixed (not
> cgroup-full). Let's assume for the sake of argument that we only have
> the 'cpu' hierarchy. Then the following mountpoints will exist (for the
> case cgroup:mixed):
>
> /sys/fs/cgroup [tmpfs, ro]
> /sys/fs/cgroup/cpu/lxc/c1 [bind-mount of host, rw]
>
> If the container now has a script that says:
> echo $$ > /sys/fs/cgroup/cpu/tasks
> to put itself into the root cgroup, then in the current version of LXC,
> this will fail with a "read-only filesystem" error, i.e. there is an
> immediate feedback for the script that something went wrong and it
> couldn't put itself into the root cgroup.
>
> If you now make the tmpfs (even if you just do the tmpfs part, not the
> other part of the code) read-write, then the above script will just
> create a file "tasks" in the tmpfs and put the script's current PID in
> it; the script will never know that this did not have the desired effect
> of moving itself into the root cgroup. So the script now thinks it did
> something which it actually didn't do. In order to avoid that, I
> explicitly remounted the tmpfs readonly for that case.
>
>
>
>
> Perhaps you could tell me what you did and what the problem was, before
> modifying this part of the code? Then we could figure out a solution for
> that together. I put quite a bit of thought into it to make sure the
> logic behind it is sane, and I don't see the reason for your changes.
>
> Regards,
> Christian
>
> _______________________________________________
> lxc-devel mailing list
> lxc-devel at lists.linuxcontainers.org
> http://lists.linuxcontainers.org/listinfo/lxc-devel
--
Stéphane Graber
Ubuntu developer
http://www.ubuntu.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: Digital signature
URL: <http://lists.linuxcontainers.org/pipermail/lxc-devel/attachments/20140502/b1bea72b/attachment.sig>
More information about the lxc-devel
mailing list