[lxc-devel] [PATCH 1/1] Implement userid mappings (enable user namespaces)
Stéphane Graber
stgraber at ubuntu.com
Tue Jan 15 17:12:01 UTC 2013
On 01/14/2013 07:03 PM, Serge Hallyn wrote:
> The 3.8 kernel now supports uid mappings, so I believe it's appropriate
> to proceed with this patchset.
> The container config supports new entries of the form:
> lxc.id_map = U 100000 0 10000
> lxc.id_map = G 100000 0 10000
> meaning map 'virtual' uids (in the container) 0-9999 to uids
> 100000-109999 on the host, and same for gids. So long as there are
> mappings specified in the container config, then CLONE_NEWUSER will
> be used when the container is cloned. This means that container
> setup is no longer done with root privilege on the host, only root
> privilege in the container. Therefore cgroup setup is moved from the
> init task to the monitor task.
>
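To make the four fields of an lxc.id_map value concrete, here is a minimal standalone sketch of parsing one entry and working out which ids it covers. It is not the patch's config_idmap(); the struct and function names are made up for illustration, and it assumes the fourth field is a count of consecutive ids, so "U 100000 0 10000" covers container ids 0 through 9999.

```c
#include <stdio.h>

/* Hypothetical holder for one parsed lxc.id_map entry (not the patch's struct). */
struct idmap_entry {
	char type;           /* 'U' for uids, 'G' for gids */
	unsigned long host;  /* first id as seen on the host */
	unsigned long ns;    /* first id as seen in the container */
	unsigned long range; /* number of consecutive ids mapped */
};

/* Parse a value such as "U 100000 0 10000". Returns 0 on success, -1 otherwise. */
static int parse_idmap(const char *value, struct idmap_entry *e)
{
	if (sscanf(value, " %c %lu %lu %lu",
		   &e->type, &e->host, &e->ns, &e->range) != 4)
		return -1;
	if (e->type != 'U' && e->type != 'G')
		return -1;
	if (e->range == 0)
		return -1;
	return 0;
}

/* Last container id covered; the range is a count, hence the "- 1". */
static unsigned long last_ns_id(const struct idmap_entry *e)
{
	return e->ns + e->range - 1;
}

/* Last host id covered by the same entry. */
static unsigned long last_host_id(const struct idmap_entry *e)
{
	return e->host + e->range - 1;
}
```

Rejecting a zero range is a choice made for this sketch only; the patch itself does not validate the numeric fields.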
> To use this patchset, you currently need to either use the raring
> kernel at ppa:serge-hallyn/usern-natty, or build your own kernel
> from git://kernel.ubuntu.com/serge/quantal-userns.git.
> (Alternatively you can use Eric's tree at the latest userns-always-map-*
> branch at
> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git
> but you will likely want to at least enable tmpfs mounts in user namespaces)
>
> You also need to chown the files in the container rootfs into the
> mapped range. There is a utility at
> https://code.launchpad.net/~serge-hallyn/+junk/nsexec to do this.
> uidmapshift does the chowning, while the container-userns-convert
> script nicely wraps that program. So simply
>
> sudo lxc-create -t ubuntu -n r1
> sudo container-userns-convert r1 200000
>
> will create a container which is shifted so uid 0 in the container
> is uid 200000 on the host.
I guess we'll want that tool merged upstream to make things easier for
those who want to play with the user namespace.
> TODO: when doing setuid(0), need to only do that if 0 is one of the
> ids we map to. Similarly, when dropping capabilities, need to skip
> that only if 0 is one of the ids we map to. However, the question
> of what to do for 'weird' containers in private user namespaces is
> one I'm punting for later.
>
> Signed-off-by: Serge Hallyn <serge.hallyn at ubuntu.com>
Acked-by: Stéphane Graber <stgraber at ubuntu.com>
Some fixes done, listed inline below.
> ---
> doc/lxc.conf.sgml.in | 40 +++++++++++++++
> src/lxc/conf.c | 134 +++++++++++++++++++++++++++++++++++++++++++++++++--
> src/lxc/conf.h | 26 ++++++++++
> src/lxc/confile.c | 60 +++++++++++++++++++++++
> src/lxc/start.c | 35 ++++++++++++++
> 5 files changed, 292 insertions(+), 3 deletions(-)
>
> diff --git a/doc/lxc.conf.sgml.in b/doc/lxc.conf.sgml.in
> index 1298143..ae91221 100644
> --- a/doc/lxc.conf.sgml.in
> +++ b/doc/lxc.conf.sgml.in
> @@ -690,6 +690,46 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> </refsect2>
>
> <refsect2>
> + <title>UID mappings</title>
> + <para>
> + A container can be started in a private user namespace with
> + user and group id mappings. For instance, you can map userid
> + 0 in the container to userid 200000 on the host. The root
> + user in the container will be privileged in the container,
> + but unprivileged on the host. Normally a system container
> + will want a range of ids, so you would map, for instance,
> + user and group ids 0 through 20,000 in the container to the
> + ids 200,000 through 220,000.
> + </para>
> + <variablelist>
> + <varlistentry>
> + <term>
> + <option>lxc.id_map</option>
> + </term>
> + <listitem>
> + <para>
> + Four values must be provided. First a character, either
> + 'U', or 'G', to specify whether user or group ids are
> + being mapped. Next is the first userid as seen on the
> + host. Next is the userid to be mapped in the container.
> + Finally, a range indicating the number of consecutive
> + ids to map. For instance
> + </para>
> +<programlisting>
> + lxc.id_map = U 200000 0 20000
> + lxc.id_map = G 200000 0 20000
> +</programlisting>
> + <para>
> + will map both user and group ids in the
> + range 0-19999 in the container to the ids
> + 200000-219999 on the host.
> + </para>
> + </listitem>
> + </varlistentry>
> + </variablelist>
> + </refsect2>
> +
> + <refsect2>
> <title>Startup hooks</title>
> <para>
> Startup hooks are programs or scripts which can be executed
> diff --git a/src/lxc/conf.c b/src/lxc/conf.c
> index b516d7d..10d713b 100644
> --- a/src/lxc/conf.c
> +++ b/src/lxc/conf.c
> @@ -2053,6 +2053,7 @@ struct lxc_conf *lxc_conf_init(void)
> lxc_list_init(&new->network);
> lxc_list_init(&new->mount_list);
> lxc_list_init(&new->caps);
> + lxc_list_init(&new->id_map);
> for (i=0; i<NUM_LXC_HOOKS; i++)
> lxc_list_init(&new->hooks[i]);
> #if HAVE_APPARMOR
> @@ -2427,6 +2428,44 @@ int lxc_assign_network(struct lxc_list *network, pid_t pid)
> return 0;
> }
>
> +int add_id_mapping(enum idtype idtype, pid_t pid, uid_t host_start, uid_t ns_start, int range)
> +{
> + char path[PATH_MAX];
> + int ret;
> + FILE *f;
> +
> + ret = snprintf(path, PATH_MAX, "/proc/%d/%cid_map", pid, idtype == ID_TYPE_UID ? 'u' : 'g');
> + if (ret < 0 || ret >= PATH_MAX) {
> + fprintf(stderr, "%s: path name too long", __func__);
> + return -E2BIG;
> + }
> + f = fopen(path, "w");
> + if (!f) {
> + perror("open");
> + return -EINVAL;
> + }
> + ret = fprintf(f, "%d %d %d", ns_start, host_start, range);
> + if (ret < 0)
> + perror("write");
> + fclose(f);
> + return ret < 0 ? ret : 0;
> +}
That was space indented with a tabstop of 8, changed to tab-indent
instead for consistency.
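For readers unfamiliar with the proc interface used by add_id_mapping(): the kernel expects each line of /proc/<pid>/uid_map (or gid_map) in the order "id inside the namespace, id outside, length", which is why the patch writes ns_start before host_start. A minimal sketch of just the string formatting, with hypothetical helper names and pid taken as a plain int for brevity:

```c
#include <stdio.h>

/*
 * Format one mapping line for /proc/<pid>/uid_map or gid_map.
 * Field order: id inside the namespace, id outside it, length.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int format_map_line(char *buf, size_t len, unsigned int ns_start,
			   unsigned int host_start, unsigned int range)
{
	int ret = snprintf(buf, len, "%u %u %u", ns_start, host_start, range);

	if (ret < 0 || (size_t)ret >= len)
		return -1;
	return 0;
}

/* Build the proc path; 'which' is 'u' for uid_map or 'g' for gid_map. */
static int format_map_path(char *buf, size_t len, int pid, char which)
{
	int ret = snprintf(buf, len, "/proc/%d/%cid_map", pid, which);

	if (ret < 0 || (size_t)ret >= len)
		return -1;
	return 0;
}
```

So the config entry "U 200000 0 20000" would end up in the map file as "0 200000 20000".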
> +int lxc_map_ids(struct lxc_list *idmap, pid_t pid)
> +{
> + struct lxc_list *iterator;
> + struct id_map *map;
> + int ret = 0;
> +
> + lxc_list_for_each(iterator, idmap) {
> + map = iterator->elem;
> + ret = add_id_mapping(map->idtype, pid, map->hostid, map->nsid, map->range);
> + if (ret)
> + break;
> + }
> + return ret;
> +}
> +
> int lxc_find_gateway_addresses(struct lxc_handler *handler)
> {
> struct lxc_list *network = &handler->conf->network;
> @@ -2535,6 +2574,93 @@ void lxc_delete_tty(struct lxc_tty_info *tty_info)
> tty_info->nbtty = 0;
> }
>
> +/*
> + * given an id as seen inside the namespace, return the host id it
> + * is mapped to. if it is not mapped, return the id unchanged.
> + */
> +static int shiftid(struct lxc_conf *c, int uid, enum idtype w)
> +{
> + struct lxc_list *iterator;
> + struct id_map *map;
> + int low, high;
> +
> + lxc_list_for_each(iterator, &c->id_map) {
> + map = iterator->elem;
> + if (map->idtype != w)
> + continue;
> +
> + low = map->nsid;
> + high = map->nsid + map->range;
> + if (uid < low || uid >= high)
> + continue;
> +
> + return uid - low + map->hostid;
> + }
> +
> + return uid;
> +}
> +
> +/*
> + * Take a pathname for a file created on the host, and map the uid and gid
> + * into the container if needed. (Used for ttys)
> + */
> +static int uid_shift_file(char *path, struct lxc_conf *c)
> +{
> + struct stat statbuf;
> + int newuid, newgid;
> +
> + if (stat(path, &statbuf)) {
> + SYSERROR("stat(%s)", path);
> + return -1;
> + }
> +
> + newuid = shiftid(c, statbuf.st_uid, ID_TYPE_UID);
> + newgid = shiftid(c, statbuf.st_gid, ID_TYPE_GID);
> + if (newuid != statbuf.st_uid || newgid != statbuf.st_gid) {
> + DEBUG("chowning %s from %d:%d to %d:%d\n", path, statbuf.st_uid, statbuf.st_gid, newuid, newgid);
> + if (chown(path, newuid, newgid)) {
> + SYSERROR("chown(%s)", path);
> + return -1;
> + }
> + }
> + return 0;
> +}
> +
> +int uid_shift_ttys(int pid, struct lxc_conf *conf)
> +{
> + int i, ret;
> + struct lxc_tty_info *tty_info = &conf->tty_info;
> + char path[MAXPATHLEN];
> + char *ttydir = conf->ttydir;
> +
> + if (!conf->rootfs.path)
> + return 0;
> + /* first the console */
> + ret = snprintf(path, sizeof(path), "/proc/%d/root/dev/%s/console", pid, ttydir ? ttydir : "");
> + if (ret < 0 || ret >= sizeof(path)) {
> + ERROR("console path too long\n");
> + return -1;
> + }
> + if (uid_shift_file(path, conf)) {
> + DEBUG("Failed to chown the console %s.\n", path);
> + return -1;
> + }
> + for (i=0; i< tty_info->nbtty; i++) {
> + ret = snprintf(path, sizeof(path), "/proc/%d/root/dev/%s/tty%d",
> + pid, ttydir ? ttydir : "", i + 1);
> + if (ret < 0 || ret >= sizeof(path)) {
> + ERROR("pathname too long for ttys");
> + return -1;
> + }
> + if (uid_shift_file(path, conf)) {
> + DEBUG("Failed to chown pty %s.\n", path);
> + return -1;
> + }
> + }
> +
> + return 0;
> +}
> +
> int lxc_setup(const char *name, struct lxc_conf *lxc_conf)
> {
> #if HAVE_APPARMOR /* || HAVE_SMACK || HAVE_SELINUX */
> @@ -2637,9 +2763,11 @@ int lxc_setup(const char *name, struct lxc_conf *lxc_conf)
> return -1;
> }
>
> - if (setup_caps(&lxc_conf->caps)) {
> - ERROR("failed to drop capabilities");
> - return -1;
> + if (lxc_list_empty(&lxc_conf->id_map)) {
> + if (setup_caps(&lxc_conf->caps)) {
> + ERROR("failed to drop capabilities");
> + return -1;
> + }
> }
>
> NOTICE("'%s' is setup.", name);
> diff --git a/src/lxc/conf.h b/src/lxc/conf.h
> index e226859..4c48b46 100644
> --- a/src/lxc/conf.h
> +++ b/src/lxc/conf.h
> @@ -142,6 +142,26 @@ struct lxc_cgroup {
> char *value;
> };
>
> +enum idtype {
> + ID_TYPE_UID,
> + ID_TYPE_GID
> +};
> +
> +/*
> + * id_map is an id map entry. Form in confile is:
> + * lxc.id_map = U 9800 0 100
> + * lxc.id_map = U 9900 1000 100
> + * lxc.id_map = G 9800 0 100
> + * lxc.id_map = G 9900 1000 100
> + * meaning the container can use uids and gids 0-99 and 1000-1099,
> + * with uid 0 mapping to uid 9800 on the host, and gid 1000 to
> + * gid 9900 on the host.
> + */
> +struct id_map {
> + enum idtype idtype;
> + int hostid, nsid, range;
> +};
> +
> /*
> * Defines a structure containing a pty information for
> * virtualizing a tty
> @@ -232,6 +252,7 @@ struct lxc_conf {
> int personality;
> struct utsname *utsname;
> struct lxc_list cgroup;
> + struct lxc_list id_map;
> struct lxc_list network;
> struct saved_nic *saved_nics;
> int num_savednics;
> @@ -275,6 +296,7 @@ extern int pin_rootfs(const char *rootfs);
> extern int lxc_create_network(struct lxc_handler *handler);
> extern void lxc_delete_network(struct lxc_handler *handler);
> extern int lxc_assign_network(struct lxc_list *networks, pid_t pid);
> +extern int lxc_map_ids(struct lxc_list *idmap, pid_t pid);
> extern int lxc_find_gateway_addresses(struct lxc_handler *handler);
>
> extern int lxc_create_tty(const char *name, struct lxc_conf *conf);
> @@ -287,6 +309,10 @@ extern int lxc_clear_cgroups(struct lxc_conf *c, const char *key);
> extern int lxc_clear_mount_entries(struct lxc_conf *c);
> extern int lxc_clear_hooks(struct lxc_conf *c, const char *key);
>
> +extern int setup_cgroup(const char *name, struct lxc_list *cgroups);
> +
> +extern int uid_shift_ttys(int pid, struct lxc_conf *conf);
> +
> /*
> * Configure the container from inside
> */
> diff --git a/src/lxc/confile.c b/src/lxc/confile.c
> index 034136e..850894e 100644
> --- a/src/lxc/confile.c
> +++ b/src/lxc/confile.c
> @@ -58,6 +58,7 @@ static int config_ttydir(const char *, const char *, struct lxc_conf *);
> static int config_aa_profile(const char *, const char *, struct lxc_conf *);
> #endif
> static int config_cgroup(const char *, const char *, struct lxc_conf *);
> +static int config_idmap(const char *, const char *, struct lxc_conf *);
> static int config_loglevel(const char *, const char *, struct lxc_conf *);
> static int config_logfile(const char *, const char *, struct lxc_conf *);
> static int config_mount(const char *, const char *, struct lxc_conf *);
> @@ -97,6 +98,7 @@ static struct lxc_config_t config[] = {
> { "lxc.aa_profile", config_aa_profile },
> #endif
> { "lxc.cgroup", config_cgroup },
> + { "lxc.id_map", config_idmap },
> { "lxc.loglevel", config_loglevel },
> { "lxc.logfile", config_logfile },
> { "lxc.mount", config_mount },
> @@ -1021,6 +1023,64 @@ out:
> return -1;
> }
>
> +static int config_idmap(const char *key, const char *value, struct lxc_conf *lxc_conf)
> +{
> + char *token = "lxc.id_map";
> + char *subkey;
> + struct lxc_list *idmaplist = NULL;
> + struct id_map *idmap = NULL;
> + int hostid, nsid, range;
> + char type;
> + int ret;
> +
> + subkey = strstr(key, token);
> +
> + if (!subkey)
> + return -1;
> +
> + if (!strlen(subkey))
> + return -1;
> +
> + idmaplist = malloc(sizeof(*idmaplist));
> + if (!idmaplist)
> + goto out;
> +
> + idmap = malloc(sizeof(*idmap));
> + if (!idmap)
> + goto out;
> + memset(idmap, 0, sizeof(*idmap));
> +
> + idmaplist->elem = idmap;
> +
> + ret = sscanf(value, "%c %d %d %d", &type, &hostid, &nsid, &range);
> + if (ret != 4)
> + goto out;
> + INFO("read uid map: type %c hostid %d nsid %d range %d", type, hostid, nsid, range);
> + if (type == 'U')
> + idmap->idtype = ID_TYPE_UID;
> + else if (type == 'G')
> + idmap->idtype = ID_TYPE_GID;
> + else
Trailing whitespace ^
> + goto out;
> + idmap->hostid = hostid;
> + idmap->nsid = nsid;
> + idmap->range = range;
> +
> + lxc_list_add_tail(&lxc_conf->id_map, idmaplist);
> +
> + return 0;
> +
> +out:
> + if (idmaplist)
> + free(idmaplist);
> +
> + if (idmap) {
> + free(idmap);
> + }
> +
> + return -1;
> +}
> +
> static int config_path_item(const char *key, const char *value,
> struct lxc_conf *lxc_conf, char **conf_item)
> {
> diff --git a/src/lxc/start.c b/src/lxc/start.c
> index ccec9ef..be738c8 100644
> --- a/src/lxc/start.c
> +++ b/src/lxc/start.c
> @@ -581,6 +581,22 @@ static int do_start(void *data)
> if (lxc_sync_barrier_parent(handler, LXC_SYNC_CONFIGURE))
> return -1;
>
> + /*
> + * if we are in a new user namespace, become root there to have
> + * privilege over our namespace
> + */
> + if (!lxc_list_empty(&handler->conf->id_map)) {
> + NOTICE("switching to gid/uid 0 in new user namespace");
> + if (setgid(0)) {
> + SYSERROR("setgid");
> + goto out_warn_father;
> + }
> + if (setuid(0)) {
> + SYSERROR("setuid");
> + goto out_warn_father;
> + }
> + }
> +
> #if HAVE_SYS_CAPABILITY_H
> if (handler->conf->need_utmp_watch) {
> if (prctl(PR_CAPBSET_DROP, CAP_SYS_BOOT, 0, 0, 0)) {
> @@ -681,6 +697,10 @@ int lxc_spawn(struct lxc_handler *handler)
> return -1;
>
> handler->clone_flags = CLONE_NEWUTS|CLONE_NEWPID|CLONE_NEWIPC|CLONE_NEWNS;
> + if (!lxc_list_empty(&handler->conf->id_map)) {
> + INFO("Cloning a new user namespace");
> + handler->clone_flags |= CLONE_NEWUSER;
> + }
> if (!lxc_list_empty(&handler->conf->network)) {
>
> handler->clone_flags |= CLONE_NEWNET;
> @@ -747,6 +767,16 @@ int lxc_spawn(struct lxc_handler *handler)
> }
> }
>
> + /* map the container uids - the container became an invalid
> + * userid the moment it was cloned with CLONE_NEWUSER - this
> + * call doesn't change anything immediately, but allows the
> + * container to setuid(0) (0 being mapped to something else on
> + * the host) later to become a valid uid again */
> + if (lxc_map_ids(&handler->conf->id_map, handler->pid)) {
> + ERROR("failed to set up id mapping");
> + goto out_delete_net;
> + }
> +
> /* Tell the child to continue its initialization. we'll get
> * LXC_SYNC_CGROUP when it is ready for us to setup cgroups
> */
> @@ -772,6 +802,11 @@ int lxc_spawn(struct lxc_handler *handler)
> if (detect_shared_rootfs())
> umount2(handler->conf->rootfs.mount, MNT_DETACH);
>
> + /* If child is in a fresh user namespace, chown his ptys for
> + * him */
> + if (uid_shift_ttys(handler->pid, handler->conf))
> + DEBUG("Failed to chown ptys.\n");
> +
> if (handler->ops->post_start(handler, handler->data))
> goto out_abort;
>
>
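The range lookup that shiftid() performs in the patch can be tried in isolation. The sketch below is an assumption-laden rewrite, not the patch's code: it uses a plain array instead of lxc_list, unsigned ids, and hypothetical names (id_kind, map_entry, ns_to_host). Given an id as the container should see it, it returns the host id that the matching entry maps it to, or the id unchanged when no entry covers it.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the patch's enum idtype and struct id_map. */
enum id_kind { KIND_UID, KIND_GID };

struct map_entry {
	enum id_kind kind;
	unsigned int hostid; /* first id on the host */
	unsigned int nsid;   /* first id in the namespace */
	unsigned int range;  /* number of consecutive ids */
};

/*
 * Translate a namespace id to the host id it is mapped to.
 * Ids that fall outside every entry of the requested kind are
 * returned unchanged, mirroring the fallback in the patch.
 */
static unsigned int ns_to_host(const struct map_entry *maps, size_t n,
			       enum id_kind kind, unsigned int id)
{
	for (size_t i = 0; i < n; i++) {
		const struct map_entry *m = &maps[i];

		if (m->kind != kind)
			continue;
		/* Covered ids are nsid .. nsid + range - 1. */
		if (id < m->nsid || id - m->nsid >= m->range)
			continue;
		return m->hostid + (id - m->nsid);
	}
	return id;
}
```

With the entries from the conf.h comment (U 9800 0 100 and U 9900 1000 100), uid 0 would translate to 9800 and uid 1005 to 9905, while uid 100 falls outside both entries and is left alone.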
--
Stéphane Graber
Ubuntu developer
http://www.ubuntu.com