[lxc-devel] [RFC 0/8] Unprivileged container creation and use

Serge Hallyn serge at mail.hallyn.com
Fri Jul 19 14:26:47 UTC 2013


With this patchset, I am able to create and start an ubuntu-cloud
container completely as an unprivileged user, on an ubuntu saucy
host with the kernel from ppa:ubuntu-lxc/kernel and the nsexec
package from ppa:serge-hallyn/userns-natty.

The one thing still completely unimplemented is networking.  I am
creating containers with lxc.network.type=empty to work around this.
Once the rest of this settles down, I'll address that.

lxc-destroy has not yet been updated, so right now the easiest way
to delete these containers is as root.  lxc-console and lxc-stop do
work as expected.

====================
Prerequisites:
====================

1. A privileged user or init script needs to create
	/run/lock/lxc/$HOME
and set perms so $USER can create locks there.
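
A minimal sketch of step 1, meant to be run by root or from an init
script.  The function name make_lock_dir and the LOCK_ROOT override are
made up for illustration and are not part of lxc.  Mode 0700 plus
ownership by the user is only one assumed way to "set perms".

```shell
#!/bin/sh
# Hypothetical helper: create the per-user lock directory from step 1.
# LOCK_ROOT defaults to /run/lock/lxc; the override exists only for testing.
make_lock_dir() (
	user=$1
	home=$2
	dir="${LOCK_ROOT:-/run/lock/lxc}/${home#/}"
	mkdir -p "$dir" || exit 1
	chmod 700 "$dir" || exit 1
	# Handing the directory to the user only works when run with privilege.
	if [ "$(id -u)" -eq 0 ]; then
		chown "$user" "$dir" || exit 1
	fi
	printf '%s\n' "$dir"
)
```

As root, "make_lock_dir serge /home/serge" would then create
/run/lock/lxc/home/serge and print that path.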

2. Before starting the container you'll need to be in a cgroup you
can manipulate.  I do this with:

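#!/bin/sh
# Create a per-user cgroup in every mounted hierarchy and give it to $name.
name=$(whoami)
for d in /sys/fs/cgroup/*; do
	sudo mkdir -p "$d/$name"
	sudo chown -R "$name" "$d/$name"
done
# A cpuset cgroup cannot hold tasks until cpus and mems are populated.
echo 0 | sudo tee -a "/sys/fs/cgroup/cpuset/$name/cpuset.cpus"
echo 0 | sudo tee -a "/sys/fs/cgroup/cpuset/$name/cpuset.mems"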

followed by:

cgroup_enter() {
	# Move the current shell into the per-user cgroup of every hierarchy.
	name=$(whoami)
	for d in /sys/fs/cgroup/*; do
		echo $$ > "$d/$name/cgroup.procs"
	done
}
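
To confirm that the move worked, one option is to inspect
/proc/self/cgroup, whose lines have the form
hierarchy-ID:controllers:path.  The sketch below is an illustration
only.  The function name in_user_cgroups is hypothetical, and it
assumes every hierarchy listed there should end in /$name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every line of a /proc/<pid>/cgroup
# style file (hierarchy-ID:controllers:path) has a path ending in /$name.
in_user_cgroups() (
	name=$1
	file=${2:-/proc/self/cgroup}
	# An empty or missing file tells us nothing, so treat it as failure.
	[ -s "$file" ] || exit 1
	while IFS=: read -r id controllers path; do
		case $path in
			*/"$name") ;;
			*) exit 1 ;;
		esac
	done < "$file"
	exit 0
)
```

After calling cgroup_enter, 'in_user_cgroups "$(whoami)"' should return
success if the shell really sits in the per-user cgroup everywhere.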

3. You need to give your user some subuids and subgids.  If you're
creating a new saucy system to use this on, then you already have some -
check /etc/subuid and /etc/subgid.  If not, then add some using
"usermod -w 100000-299999 -v 100000-299999 $user"
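
A quick way to see what a user has been allotted is to read the file
directly.  This sketch assumes the shadow file name /etc/subuid and its
name:first_id:count line format.  The function name subid_ranges is
hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "first-last" for each subordinate id range
# allotted to a user.  Entries have the form name:first_id:count.
subid_ranges() (
	user=$1
	file=${2:-/etc/subuid}
	awk -F: -v u="$user" '$1 == u { print $2 "-" ($2 + $3 - 1) }' "$file"
)
```

For an entry "serge:100000:200000" this prints 100000-299999, the range
requested by the usermod call above.  Pass /etc/subgid as the second
argument to check gids.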

4. I copied the ubuntu-cloud tarball into my home directory since the
template won't be able to download it without changes:

	sudo lxc-create -t ubuntu-cloud -n cloud1 -- -r precise
	cp /var/cache/lxc/cloud-precise/*.gz ~/precise.tar.gz

5. I created a configuration file to set the id mapping and set the
empty network.

	cat > ~/default.conf << EOF
lxc.network.type = empty
lxc.id_map = u 0 100000 10000
lxc.id_map = g 0 100000 10000
EOF
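
Each lxc.id_map line in that file claims a block of host ids, and the
block has to sit inside the subuids or subgids from step 3.  The sketch
below prints the host range a config claims so it can be compared by
eye.  The function name id_map_host_range is hypothetical, and the
field order (type, first id in the container, first id on the host,
count) is read off the example above.

```shell
#!/bin/sh
# Hypothetical helper: for each lxc.id_map line of the given type (u or g),
# print the host id range it claims as "first-last".
# Assumed line format: lxc.id_map = <type> <ns_first> <host_first> <count>
id_map_host_range() (
	conf=$1
	type=$2
	awk -v t="$type" '$1 == "lxc.id_map" && $2 == "=" && $3 == t {
		print $5 "-" ($5 + $6 - 1)
	}' "$conf"
)
```

With the default.conf above, "id_map_host_range ~/default.conf u" prints
100000-109999.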

6. Finally, so that the ubuntu-cloud container will fully start up
without networking (just a 2 minute delay waiting in vain for eth0
to come up), copy the lxc-ubuntu-cloud template to your home directory
and sed -i 's/nocloud-net/nocloud/' on it, then call that template
(using the full path) to create the container.
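
Step 6 can be sketched as a small function.  The name patch_template is
hypothetical, and the source path in the usage note is an assumption
about where the distribution installs templates.  sed -i here is the
GNU sed form.

```shell
#!/bin/sh
# Hypothetical helper: copy a template and replace nocloud-net with
# nocloud in the copy, as described in step 6.
patch_template() (
	src=$1
	dst=$2
	cp "$src" "$dst" || exit 1
	sed -i 's/nocloud-net/nocloud/' "$dst"
)
```

For example, assuming the template lives under /usr/share/lxc/templates:
"patch_template /usr/share/lxc/templates/lxc-ubuntu-cloud ~/lxc-ubuntu-cloud".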

====================
Commands:
====================

The actual commands I use are:

lxc-create -t /home/serge/lxc-ubuntu-cloud -P /home/serge/lxcbase -n x3 -f default.conf -- -T precise.tar.gz
lxc-start -P /home/serge/lxcbase -n x3

And voila, it waits 2 minutes for eth0, then gives me a prompt at
which I can log in.

====================
Explanations:
====================

When you create a new user namespace, it initially is unmapped.  Your task
has uid and gid -1.  You can then map userids from the parent namespace onto
userids in the new namespace by ranges.  For instance if you are userid 1000,
then you can map uid 1000 in the parent to uid 0 in the namespace.  From the
kernel's point of view, you can only map uids which you have privilege over -
either by being that uid, or having CAP_SYS_ADMIN in the parent.
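
The arithmetic behind a range mapping is simple.  Each line of
/proc/PID/uid_map holds three numbers: the first id inside the
namespace, the first id in the parent, and a count.  The sketch below
translates one id through one such range.  The function name
ns_to_parent_id is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: translate an id inside the namespace to the parent
# id, given one mapping range "ns_first parent_first count".
ns_to_parent_id() (
	id=$1 ns_first=$2 parent_first=$3 count=$4
	if [ "$id" -ge "$ns_first" ] && [ "$id" -lt $((ns_first + count)) ]; then
		echo $((parent_first + id - ns_first))
	else
		exit 1   # the id is not covered by this range
	fi
)
```

So "ns_to_parent_id 0 0 1000 1" prints 1000, matching the example of
mapping uid 1000 in the parent to uid 0 in the namespace, while an id
outside the range makes the function fail.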

This is where subuids come in.  /etc/subuid and /etc/subgid list the ranges
of uids and gids which users are allowed to map.  newuidmap and newgidmap are
setuid-root programs which respect those files and so allow unprivileged users
to map their allotted subuids and subgids.

Lxc uses these programs (indirectly through the 'usernsexec' program) to allow
unprivileged users to map their allotted subuids to containers.

Of note is that regular DAC and MAC remain unchanged.  Therefore although I
as user serge/uid 1000 may have 100000-199999 as my subuids, I do not own
files owned by those subuids!  To work around this, map the uids together into
a namespace.  For instance, if you are uid 1000 and want to create a file owned
by uid 100000, you can

	touch foo
	usernsexec -m b:0:100000:1 -m b:1000:1000 -- /bin/chown 0 foo

This maps 100000 on the host to root in the container, and 1000 to 1000
in the container.  So 100000 has privilege over the namespace.  It is
therefore allowed to chown a file owned by uid 1000 (which is really 1000)
to uid 0 (which is really 100000).  You end up with foo owned by uid
100000.  Since "chown 0" changes only the owner, use "chown 0:0" if the
group should become 100000 as well.  You can play the same sort of games
to clean up containers.




