[lxc-devel] [PATCH] cgfs: don't mount /sys/fs/cgroup readonly

Stéphane Graber stgraber at ubuntu.com
Fri May 2 21:20:56 UTC 2014


On Fri, May 02, 2014 at 11:04:15PM +0200, Christian Seiler wrote:
> Hi Serge,
> 
> > /sys/fs/cgroup is just a size-limited tmpfs, and making it ro does
> > nothing to affect our ability alter mount settings of its subdirs.
> > OTOH making it ro can upset mountall in the container which tries
> > to remount it rw, which may be refused.
> > 
> > So just don't do it.
> > 
> > Signed-off-by: Serge Hallyn <serge.hallyn at ubuntu.com>
> > Cc: Christian Seiler <christian at iwakd.de>
> 
> NAK!

Reverted.

> 
> > ---
> >  src/lxc/cgfs.c | 16 ----------------
> >  1 file changed, 16 deletions(-)
> > 
> > diff --git a/src/lxc/cgfs.c b/src/lxc/cgfs.c
> > index db2a973..ba7df89 100644
> > --- a/src/lxc/cgfs.c
> > +++ b/src/lxc/cgfs.c
> > @@ -1413,14 +1413,6 @@ static bool cgroupfs_mount_cgroup(void *hdata, const char *root, int type)
> >  				SYSERROR("error bind-mounting %s to %s", mp->mount_point, abs_path);
> >  				goto out_error;
> >  			}
> > -			/* main cgroup path should be read-only */
> > -			if (type == LXC_AUTO_CGROUP_FULL_RO || type == LXC_AUTO_CGROUP_FULL_MIXED) {
> > -				r = mount(NULL, abs_path, NULL, MS_REMOUNT|MS_BIND|MS_RDONLY, NULL);
> > -				if (r < 0) {
> > -					SYSERROR("error re-mounting %s readonly", abs_path);
> > -					goto out_error;
> > -				}
> > -			}
> 
> This is not the tmpfs! Also, this breaks logic if you want the whole
> cgroup tree to be read-only. Without this cgroup-full:ro and
> cgroup-full:mixed is identical to cgroup-full:rw!
> 
> The code here is for the mount points /sys/fs/cgroup/$subsystem.
> abs_path contains the path to the mount point inside the container of
> that. This can be triggered only by cgroup-full:{ro,mixed}. Let's
> examine the cases:
> 
> cgroup-full:ro: In that case, the entire cgroup hierarchy should be
> mounted read-only. No mount will be done for the container's own cgroup,
> so NOT remounted read-only here will leave the hierarchy read-write and
> therefore making the ro flag meaningless.
> 
> cgroup-full:mixed: In that case, the cgroup hierarchy should be mounted
> read-only with the exception of the own cgroup (see documentation). For
> this case, it is also needed that /sys/fs/cgroup/$subsystem inside the
> container be remounted read-only. In the subsequent code, the
> container's own cgroup will be bind-remounted onto itself read-write if
> mixed is specified, which is meaningless if the parent mount is already
> read-write.
> 
> (Note that the idea here is that this only works as a restriction for
> containers without CAP_SYS_ADMIN; otherwise, the container can change
> the mount options anyway.)
> 
> So if you remove this part where the local mount point is made readonly,
> then it's equivalent to removing the :ro and :mixed flags of cgroup-full.
> 
> If this doesn't work for Ubuntu (mountall is upstart, therefore Ubuntu,
> right?), then it's probably better just to update the manpage and say
> that Ubuntu only works with cgroup-full:rw. Also, if you think that the
> default for cgroup-full (which is currently cgroup-full:mixed) should
> rather be cgroup-full:rw, then we could also change that (see
> confile.c). But please don't remove this piece of code.
> 
> If you just want the tmpfs to be read-write, then the following does that:
> 
> > @@ -1487,14 +1479,6 @@ static bool cgroupfs_mount_cgroup(void *hdata, const char *root, int type)
> >  		parts = NULL;
> >  	}
> >  
> > -	/* try to remount the tmpfs readonly, since the container shouldn't
> > -	 * change anything (this will also make sure that trying to create
> > -	 * new cgroups outside the allowed area fails with an error instead
> > -	 * of simply causing this to create directories in the tmpfs itself)
> > -	 */
> > -	if (type != LXC_AUTO_CGROUP_RW && type != LXC_AUTO_CGROUP_FULL_RW)
> > -		mount(NULL, path, NULL, MS_REMOUNT|MS_RDONLY, NULL);
> > -
> >  	free(path);
> >  
> >  	return true;
> 
> Although here I also see a problem for a different use case:
> 
> Let's say you have a container which is mounted cgroup:mixed (not
> cgroup-full). Let's assume for the sake of argument that we only have
> the 'cpu' hierarchy. Then the following mountpoints will exist (for the
> case cgroup:mixed):
> 
> /sys/fs/cgroup                  [tmpfs, ro]
> /sys/fs/cgroup/cpu/lxc/c1       [bind-mount of host, rw]
> 
> If the container now has a script that says:
> echo $$ > /sys/fs/cgroup/cpu/tasks
> to put itself into the root cgroup, then in the current version of LXC,
> this will fail with a "read-only filesystem" error, i.e. there is an
> immediate feedback for the script that something went wrong and it
> couldn't put itself into the root cgroup.
> 
> If you now make the tmpfs (even if you just do the tmpfs part, not the
> other part of the code) read-write, then the above script will just
> create a file "tasks" in the tmpfs and put the script's current PID in
> it; the script will never know that this did not have the desired effect
> of moving itself into the root cgroup. So the script now thinks it did
> something which it actually didn't do. In order to avoid that, I
> explicitly remounted the tmpfs readonly for that case.
> 
> 
> 
> 
> Perhaps you could tell me what you did and what the problem was, before
> modifying this part of the code? Then we could figure out a solution for
> that together. I put quite a bit of thought into it to make sure the
> logic behind it is sane, and I don't see the reason for your changes.
> 
> Regards,
> Christian
> 
> _______________________________________________
> lxc-devel mailing list
> lxc-devel at lists.linuxcontainers.org
> http://lists.linuxcontainers.org/listinfo/lxc-devel

-- 
Stéphane Graber
Ubuntu developer
http://www.ubuntu.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: Digital signature
URL: <http://lists.linuxcontainers.org/pipermail/lxc-devel/attachments/20140502/b1bea72b/attachment.sig>


More information about the lxc-devel mailing list