[lxc-devel] [PATCH] c/r: remember to chown the cgroup path (correctly)

Serge Hallyn serge.hallyn at ubuntu.com
Thu Jan 14 19:43:20 UTC 2016


Quoting Tycho Andersen (tycho.andersen at canonical.com):
> On Thu, Jan 14, 2016 at 09:28:07AM +0000, Serge Hallyn wrote:
> > Quoting Tycho Andersen (tycho.andersen at canonical.com):
> > > On Wed, Jan 13, 2016 at 09:47:50PM +0000, Serge Hallyn wrote:
> > > > Quoting Tycho Andersen (tycho.andersen at canonical.com):
> > > > > 1. remember to chown the cgroup path when migrating a container
> > > > > 2. when restoring the cgroup path, compute the host euid corresponding to
> > > > >    container root instead of using geteuid(); geteuid() works for start, but
> > > > >    not for migration, since we're still real root at that point.
> > > > > 
> > > > > Signed-off-by: Tycho Andersen <tycho.andersen at canonical.com>
> > > > > ---
> > > > >  src/lxc/cgmanager.c | 6 +++++-
> > > > >  src/lxc/criu.c      | 5 +++++
> > > > >  2 files changed, 10 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/src/lxc/cgmanager.c b/src/lxc/cgmanager.c
> > > > > index 357182a..54e6912 100644
> > > > > --- a/src/lxc/cgmanager.c
> > > > > +++ b/src/lxc/cgmanager.c
> > > > > @@ -488,7 +488,11 @@ static bool chown_cgroup(const char *cgroup_path, struct lxc_conf *conf)
> > > > >  		return true;
> > > > >  
> > > > >  	data.cgroup_path = cgroup_path;
> > > > > -	data.origuid = geteuid();
> > > > > +	data.origuid = mapped_hostid(0, conf, ID_TYPE_UID);
> > 
> > now, when starting a container, this happens in the
> > parent task in original uid.  So the geteuid() returns 1000,
> > mapped_hostid(0, conf, ID_TYPE_UID) something like 100000.
> 
> Are you sure? Wouldn't it chown everything to 1000 instead of 100000
> in that case and be all screwed up?

No I'm not sure :)  I was lazily asking you to add an fprintf and verify :)
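
For what it's worth, here is a minimal, self-contained sketch (not LXC's actual
code; the struct and helper name below are made up for illustration) of what
mapped_hostid(0, conf, ID_TYPE_UID) boils down to: walk the container's id map
and translate a container uid to the host uid backing it.

#include <stdio.h>

/* toy model of one "u <nsid> <hostid> <range>" id map entry */
struct idmap { unsigned nsid, hostid, range; };

/* hypothetical helper: the host uid that container uid `id` maps to, or -1 */
static long host_uid_for(unsigned id, const struct idmap *maps, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (id >= maps[i].nsid && id - maps[i].nsid < maps[i].range)
			return (long)maps[i].hostid + (id - maps[i].nsid);
	return -1;
}

int main(void)
{
	/* the usual unprivileged map: u 0 100000 65536 */
	struct idmap maps[] = { { 0, 100000, 65536 } };

	/* container root maps to host uid 100000; geteuid() would instead
	 * return 1000 on an unprivileged start (parent task, original uid)
	 * and 0 on a migration restore (still real root), which is the
	 * difference being discussed above. */
	printf("container uid 0 -> host uid %ld\n", host_uid_for(0, maps, 1));
	return 0;
}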

> > This is probably ok - but did you run all the lxc tests against
> > this to make sure?  You can still run 'lxc-cgroup' etc as an
> > unpriv user?
> 
> Well, it does change the behavior in ways I don't quite understand
> yet, so that's probably worth talking about. Before this series (the
> 644 one and this patch), I get:
> 
> /sys/fs/cgroup/cpuset/lxc/unpriv $ ls -alh
> total 0
> drwxrwxr-x 2 root 100000 0 Jan 14 06:25 ./
> drwxr-xr-x 3 root root   0 Jan 14 06:25 ../
> -rw-r--r-- 1 root root   0 Jan 14 06:25 cgroup.clone_children
> -rwxrwxr-x 1 root 100000 0 Jan 14 06:26 cgroup.procs*
> -rw-r--r-- 1 root root   0 Jan 14 06:25 cpuset.cpu_exclusive
> -rw-r--r-- 1 root root   0 Jan 14 06:25 cpuset.cpus
> -r--r--r-- 1 root root   0 Jan 14 06:25 cpuset.effective_cpus
> -r--r--r-- 1 root root   0 Jan 14 06:25 cpuset.effective_mems
> -rw-r--r-- 1 root root   0 Jan 14 06:25 cpuset.mem_exclusive
> -rw-r--r-- 1 root root   0 Jan 14 06:25 cpuset.mem_hardwall
> -rw-r--r-- 1 root root   0 Jan 14 06:25 cpuset.memory_migrate
> -r--r--r-- 1 root root   0 Jan 14 06:25 cpuset.memory_pressure
> -rw-r--r-- 1 root root   0 Jan 14 06:25 cpuset.memory_spread_page
> -rw-r--r-- 1 root root   0 Jan 14 06:25 cpuset.memory_spread_slab
> -rw-r--r-- 1 root root   0 Jan 14 06:25 cpuset.mems
> -rw-r--r-- 1 root root   0 Jan 14 06:25 cpuset.sched_load_balance
> -rw-r--r-- 1 root root   0 Jan 14 06:25 cpuset.sched_relax_domain_level
> -rw-r--r-- 1 root root   0 Jan 14 06:25 notify_on_release
> -rwxrwxr-x 1 root 100000 0 Jan 14 06:25 tasks*
> 
> and after:
> 
> /sys/fs/cgroup/cpuset/lxc/unpriv $ ls -alh
> total 0
> drwxrwxr-x 2 100000 100000 0 Jan 14 06:27 ./
> drwxr-xr-x 3 root   root   0 Jan 14 06:27 ../
> -rw-r--r-- 1 root   root   0 Jan 14 06:27 cgroup.clone_children
> -rw-rw-r-- 1 100000 100000 0 Jan 14 06:27 cgroup.procs
> -rw-r--r-- 1 root   root   0 Jan 14 06:27 cpuset.cpu_exclusive
> -rw-r--r-- 1 root   root   0 Jan 14 06:27 cpuset.cpus
> -r--r--r-- 1 root   root   0 Jan 14 06:27 cpuset.effective_cpus
> -r--r--r-- 1 root   root   0 Jan 14 06:27 cpuset.effective_mems
> -rw-r--r-- 1 root   root   0 Jan 14 06:27 cpuset.mem_exclusive
> -rw-r--r-- 1 root   root   0 Jan 14 06:27 cpuset.mem_hardwall
> -rw-r--r-- 1 root   root   0 Jan 14 06:27 cpuset.memory_migrate
> -r--r--r-- 1 root   root   0 Jan 14 06:27 cpuset.memory_pressure
> -rw-r--r-- 1 root   root   0 Jan 14 06:27 cpuset.memory_spread_page
> -rw-r--r-- 1 root   root   0 Jan 14 06:27 cpuset.memory_spread_slab
> -rw-r--r-- 1 root   root   0 Jan 14 06:27 cpuset.mems
> -rw-r--r-- 1 root   root   0 Jan 14 06:27 cpuset.sched_load_balance
> -rw-r--r-- 1 root   root   0 Jan 14 06:27 cpuset.sched_relax_domain_level
> -rw-r--r-- 1 root   root   0 Jan 14 06:27 notify_on_release
> -rw-rw-r-- 1 100000 100000 0 Jan 14 06:27 tasks
> 
> However, it's not really clear /why/ this patch changes the uid on tasks and
> cgroup.procs as well (probably some mask bug? it was ID_TYPE_UID before, but
> the gid changed too? not sure).

Cgmanager does that automatically.  When you chown the cgroup freezer:/lxc/c1,
cgmanager chowns the directory itself (so you can create new cgroups) and the
procs and tasks files (so you can move tasks in).  It does not change the
other files, so you can't escape your limits.
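
To make that concrete, here is a rough standalone sketch of which files change
owner.  It uses plain chown(2); the real path goes through cgmanager's RPC
interface, and the helper below is made up for illustration, not lxc's actual
chown_cgroup():

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* hand the directory, cgroup.procs and tasks to the given owner so the
 * container can create sub-cgroups and move tasks in; leave every other
 * file (cpuset.cpus, the memory limits, ...) owned by host root so the
 * container cannot relax its own limits */
static int chown_cgroup_dir(const char *dir, uid_t uid, gid_t gid)
{
	static const char *suffixes[] = { "", "/cgroup.procs", "/tasks" };
	char path[4096];
	size_t i;

	for (i = 0; i < sizeof(suffixes) / sizeof(suffixes[0]); i++) {
		snprintf(path, sizeof(path), "%s%s", dir, suffixes[i]);
		if (chown(path, uid, gid) < 0) {
			perror(path);
			return -1;
		}
	}
	return 0;
}

int main(int argc, char **argv)
{
	if (argc != 4) {
		fprintf(stderr, "usage: %s <cgroup-dir> <uid> <gid>\n", argv[0]);
		return 1;
	}
	return chown_cgroup_dir(argv[1], (uid_t)atoi(argv[2]),
				(gid_t)atoi(argv[3])) == 0 ? 0 : 1;
}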

And right, that is why the parent dir must be -1:100000 (uid left as host root,
only the gid mapped), so you can't chown the files.  This gets sabotaged when,
as uid 1000 on the host, you map hostuid 1000 into the container.  But parent
cgroups (like systemd user scopes) will at least still confine you.
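
For illustration, with the usual kind of map below, container root is host uid
100000 and the host-root-owned cgroup files stay out of its reach; the second
variant additionally maps the caller's own host uid 1000 into the container
(the exact split is assumed here, not taken from this thread), which is the
situation described above as sabotaging that protection:

# usual unprivileged map: container root is host uid/gid 100000
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536

# variant that also hands the caller's host uid 1000 to the container
lxc.id_map = u 0 100000 1000
lxc.id_map = u 1000 1000 1
lxc.id_map = u 1001 101001 64535
lxc.id_map = g 0 100000 65536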

