[lxc-devel] [PATCH] Fix unprivileged containers started by root

Dwight Engen dwight.engen at oracle.com
Wed Feb 26 22:47:10 UTC 2014


On Wed, 26 Feb 2014 16:34:17 -0500
Stéphane Graber <stgraber at ubuntu.com> wrote:

> On Wed, Feb 26, 2014 at 03:55:56PM -0500, Dwight Engen wrote:
> > On Wed, 26 Feb 2014 14:55:57 -0500
> > Stéphane Graber <stgraber at ubuntu.com> wrote:
> > 
> > > Hi Dwight,
> > > 
> > > Serge suggested that you may have been relying on the previous
> > > behavior, will that change be a problem for you?
> > > 
> > > Do you think this is reasonable?
> > > 
> > > Stéphane
> > 
> > Hey Stéphane, yeah I was running unprivileged containers and using
> > the lxc writing the mapping functionality. Up till this change we
> > could do manual management of uid ranges and not *have* to have
> > new[ug]idmap. At least I hadn't hit the other two cases where it
> > was needed (running a template in a mapped userns and
> > chown_mapped_root()). I do get that it would be nice to just rely
> > on the presence of new[ug]id map, especially to avoid range overlap.
> > 
> > RHEL7/OL7/Fedora rawhide still only ship shadow 4.1.x so that what
> > is a bit of a problem right now. Not that its hard to build a 4.2
> > shadow rpm, I've got one locally and I think Mike was building one
> > too. Do we know where other distros are as far as updating to the
> > newest shadow? I guess it just comes down to if we want to say that
> > your distro must have the new shadow if you want to use userns
> > containers.
> 
> Ubuntu will have it starting with 14.04. I also have some backports of
> it working as far down as 12.04.
> 
> Debian has been updating their packaging branch and I hope for a
> shadow upload there in the near future.
> 
> 
> I guess the question is whether we expect distros that ship a > 3.12
> to still ship shadow 4.1. To me it seems like forcing the distro's
> hands a bit and requiring the new shadow will make things much easier
> for us as well as prevent some mistakes on our users' part.

I agree, and those mistakes could have security consequences so we
should try hard to prevent them from happening. We should consider that
we are improving security by preventing configuration mistakes, but are
taking away a "more security" option for folks who have kernel > 3.12
but shadow < 4.2. Maybe that will be a very small percentage of users,
making it a good trade off. Whatever you decide, I'm okay with it.

Do you think we should have a configure check for shadow > 4.1 and if
its not present don't compile in the userns bits? I guess it can just
fail to start at run-time as long as the error message when system()
fails is clear about whats missing?

> Otherwise I'd totally see a scenario where someone on a system with
> shadow 4.2 will create a bunch of users not realizing that the newly
> granted ranges happen to overlap with existing containers that were
> created by root. That user is then able to strace/kill/read the
> filesystem of any existing such container...

If the admin starts the container ids fairly high like our docs suggest
at 100000, that "bunch of users" is quite a bit, but nevertheless I see
your point :)
 
> > 
> > > On Wed, Feb 26, 2014 at 02:53:04PM -0500, Stéphane Graber wrote:
> > > > This change makes it possible to create unprivileged containers
> > > > as root. They will be stored in the usual system wide location,
> > > > use the usual system wide cache but will be running using a
> > > > uid/gid map.
> > > > 
> > > > To make things consistent as well as avoid potential range
> > > > conflicts, I'm removing the code that was directly writing the
> > > > uid/gid ranges and instead force all unprivileged containers
> > > > through newuidmap/newgidmap. This means you need to grant
> > > > uid/gid ranges to root just as you would for a normal user.
> > > > 
> > > > bdev and network restrictions however don't apply to those
> > > > containers, so you don't need to add lxc-usernic entries for
> > > > root.
> > > > 
> > > > This essentially requires anyone wanting to run unprivileged
> > > > containers to have newuidmap and newgidmap on their system,
> > > > those two are provided by a very recent version of shadow.
> > > > 
> > > > Signed-off-by: Stéphane Graber <stgraber at ubuntu.com>
> > > > ---
> > > >  src/lxc/conf.c         | 56
> > > > ++++++++++----------------------------------------
> > > > src/lxc/lxccontainer.c |  6 +++--- 2 files changed, 14
> > > > insertions(+), 48 deletions(-)
> > > > 
> > > > diff --git a/src/lxc/conf.c b/src/lxc/conf.c
> > > > index fc39897..2a9b672 100644
> > > > --- a/src/lxc/conf.c
> > > > +++ b/src/lxc/conf.c
> > > > @@ -3132,32 +3132,6 @@ int lxc_assign_network(struct lxc_list
> > > > *network, pid_t pid) return 0;
> > > >  }
> > > >  
> > > > -static int write_id_mapping(enum idtype idtype, pid_t pid,
> > > > const char *buf,
> > > > -			    size_t buf_size)
> > > > -{
> > > > -	char path[PATH_MAX];
> > > > -	int ret, closeret;
> > > > -	FILE *f;
> > > > -
> > > > -	ret = snprintf(path, PATH_MAX, "/proc/%d/%cid_map",
> > > > pid, idtype == ID_TYPE_UID ? 'u' : 'g');
> > > > -	if (ret < 0 || ret >= PATH_MAX) {
> > > > -		fprintf(stderr, "%s: path name too long\n",
> > > > __func__);
> > > > -		return -E2BIG;
> > > > -	}
> > > > -	f = fopen(path, "w");
> > > > -	if (!f) {
> > > > -		perror("open");
> > > > -		return -EINVAL;
> > > > -	}
> > > > -	ret = fwrite(buf, buf_size, 1, f);
> > > > -	if (ret < 0)
> > > > -		SYSERROR("writing id mapping");
> > > > -	closeret = fclose(f);
> > > > -	if (closeret)
> > > > -		SYSERROR("writing id mapping");
> > > > -	return ret < 0 ? ret : closeret;
> > > > -}
> > > > -
> > > >  int lxc_map_ids(struct lxc_list *idmap, pid_t pid)
> > > >  {
> > > >  	struct lxc_list *iterator;
> > > > @@ -3165,7 +3139,6 @@ int lxc_map_ids(struct lxc_list *idmap,
> > > > pid_t pid) int ret = 0;
> > > >  	enum idtype type;
> > > >  	char *buf = NULL, *pos;
> > > > -	int am_root = (getuid() == 0);
> > > >  
> > > >  	for(type = ID_TYPE_UID; type <= ID_TYPE_GID; type++) {
> > > >  		int left, fill;
> > > > @@ -3176,10 +3149,9 @@ int lxc_map_ids(struct lxc_list *idmap,
> > > > pid_t pid) return -ENOMEM;
> > > >  		}
> > > >  		pos = buf;
> > > > -		if (!am_root)
> > > > -			pos += sprintf(buf, "new%cidmap %d",
> > > > -				type == ID_TYPE_UID ? 'u' :
> > > > 'g',
> > > > -				pid);
> > > > +		pos += sprintf(buf, "new%cidmap %d",
> > > > +			type == ID_TYPE_UID ? 'u' : 'g',
> > > > +			pid);
> > > >  
> > > >  		lxc_list_for_each(iterator, idmap) {
> > > >  			/* The kernel only takes <= 4k for
> > > > writes to /proc/<nr>/[ug]id_map */ @@ -3189,10 +3161,8 @@ int
> > > > lxc_map_ids(struct lxc_list *idmap, pid_t pid) 
> > > >  			had_entry = 1;
> > > >  			left = 4096 - (pos - buf);
> > > > -			fill = snprintf(pos, left, "%s%lu %lu
> > > > %lu%s",
> > > > -					am_root ? "" : " ",
> > > > -					map->nsid, map->hostid,
> > > > map->range,
> > > > -					am_root ? "\n" : "");
> > > > +			fill = snprintf(pos, left, " %lu %lu
> > > > %lu ",
> > > > +					map->nsid, map->hostid,
> > > > map->range); if (fill <= 0 || fill >= left)
> > > >  				SYSERROR("snprintf failed, too
> > > > many mappings"); pos += fill;
> > > > @@ -3200,16 +3170,12 @@ int lxc_map_ids(struct lxc_list *idmap,
> > > > pid_t pid) if (!had_entry)
> > > >  			continue;
> > > >  
> > > > -		if (am_root) {
> > > > -			ret = write_id_mapping(type, pid, buf,
> > > > pos-buf);
> > > > -		} else {
> > > > -			left = 4096 - (pos - buf);
> > > > -			fill = snprintf(pos, left, "\n");
> > > > -			if (fill <= 0 || fill >= left)
> > > > -				SYSERROR("snprintf failed, too
> > > > many mappings");
> > > > -			pos += fill;
> > > > -			ret = system(buf);
> > > > -		}
> > > > +		left = 4096 - (pos - buf);
> > > > +		fill = snprintf(pos, left, "\n");
> > > > +		if (fill <= 0 || fill >= left)
> > > > +			SYSERROR("snprintf failed, too many
> > > > mappings");
> > > > +		pos += fill;
> > > > +		ret = system(buf);
> > > >  
> > > >  		if (ret)
> > > >  			break;
> > > > diff --git a/src/lxc/lxccontainer.c b/src/lxc/lxccontainer.c
> > > > index 0427791..d9aa973 100644
> > > > --- a/src/lxc/lxccontainer.c
> > > > +++ b/src/lxc/lxccontainer.c
> > > > @@ -806,7 +806,7 @@ static struct bdev *do_bdev_create(struct
> > > > lxc_container *c, const char *type, /* if we are not root, chown
> > > > the rootfs dir to root in the
> > > >  	 * target uidmap */
> > > >  
> > > > -	if (geteuid() != 0) {
> > > > +	if (geteuid() != 0 || (c->lxc_conf
> > > > && !lxc_list_empty(&c->lxc_conf->id_map))) { if
> > > > (chown_mapped_root(bdev->dest, c->lxc_conf) < 0) { ERROR("Error
> > > > chowning %s to container root", bdev->dest); bdev_put(bdev);
> > > > @@ -992,7 +992,7 @@ static bool create_run_template(struct
> > > > lxc_container *c, char *tpath, bool quiet
> > > >  		 * and we append "--mapped-uid x", where x is
> > > > the mapped uid
> > > >  		 * for our geteuid()
> > > >  		 */
> > > > -		if (geteuid() != 0
> > > > && !lxc_list_empty(&conf->id_map)) {
> > > > +		if (!lxc_list_empty(&conf->id_map)) {
> > > >  			int n2args = 1;
> > > >  			char txtuid[20];
> > > >  			char txtgid[20];
> > > > @@ -1450,7 +1450,7 @@ static inline bool enter_to_ns(struct
> > > > lxc_container *c) { init_pid = c->init_pid(c);
> > > >  
> > > >  	/* Switch to new userns */
> > > > -	if (geteuid() && access("/proc/self/ns/user", F_OK) ==
> > > > 0) {
> > > > +	if ((geteuid() != 0 || (c->lxc_conf
> > > > && !lxc_list_empty(&c->lxc_conf->id_map))) &&
> > > > access("/proc/self/ns/user", F_OK) == 0) { ret =
> > > > snprintf(new_userns_path, MAXPATHLEN, "/proc/%d/ns/user",
> > > > init_pid); if (ret < 0 || ret >= MAXPATHLEN) goto out; -- 
> > > > 1.9.0
> > > > 
> > > 
> > 
> 



More information about the lxc-devel mailing list