[lxc-devel] conceptual questions about user namespaces

Fri Apr 12 13:12:31 UTC 2013

Quoting Guido Jäkel (G.Jaekel at DNB.DE):
> First, I want to say that I haven't tested this feature myself so far. But from reading the list, I have questions.
> 
> For me, the main use cases of the user namespace feature seem to be:
> 
> a) to "shift" the container's root user - a security-driven term ("jailbreaking")
> b) to "shift" the container's "other users" - a privacy-driven term ("data separation")

The part you don't list is important.  Privileges (capabilities) are
uid-agnostic.  Originally the only check was 'am I root?'.  Then it
became 'am I capable(CAP_SYS_ADMIN)?'.  Now it is 'do I have capability
CAP_SYS_ADMIN toward user namespace X?', represented as
ns_capable(X, CAP_SYS_ADMIN).  capable(x) simply expands to
ns_capable(&init_user_ns, x).
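
To make the ns_capable() idea concrete, here is a toy Python model.
It is only a sketch: UserNS, Cred and the helper functions are
hypothetical stand-ins invented for illustration, and the loop only
loosely follows the kernel's capability check (same namespace: look
at the effective set; owner of a child namespace: all capabilities;
otherwise walk up to the parent).

```python
from dataclasses import dataclass
from typing import Optional

CAP_SYS_ADMIN = "CAP_SYS_ADMIN"


@dataclass(eq=False)
class UserNS:
    """Toy stand-in for a user namespace (not the kernel structure)."""
    name: str
    parent: Optional["UserNS"] = None
    owner_uid: int = 0  # uid, as seen in the parent, of the creator


@dataclass
class Cred:
    """Toy stand-in for a task's credentials."""
    euid: int
    user_ns: UserNS
    effective: frozenset = frozenset()


INIT_USER_NS = UserNS("init_user_ns")


def ns_capable(cred: Cred, target: UserNS, cap: str) -> bool:
    """Does cred hold cap toward the user namespace target?"""
    ns = target
    while ns is not None:
        if ns is cred.user_ns:
            # Same namespace: the effective capability set decides.
            return cap in cred.effective
        if ns.parent is cred.user_ns and ns.owner_uid == cred.euid:
            # The creator of a child namespace is privileged toward it.
            return True
        ns = ns.parent
    return False  # target is not the task's namespace or a descendant


def capable(cred: Cred, cap: str) -> bool:
    """The old-style check: privilege toward the initial namespace."""
    return ns_capable(cred, INIT_USER_NS, cap)


# Assumed setup: host uid 100000 created the container's namespace.
container_ns = UserNS("container", parent=INIT_USER_NS, owner_uid=100000)
host_root = Cred(0, INIT_USER_NS, frozenset({CAP_SYS_ADMIN}))
container_root = Cred(0, container_ns, frozenset({CAP_SYS_ADMIN}))
```

In this model container_root passes ns_capable(container_root,
container_ns, CAP_SYS_ADMIN) but fails capable(container_root,
CAP_SYS_ADMIN), while host_root passes both.  That is the sense in
which uid 0 inside the container is "root" only toward its own
namespace.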

> With my bad English, I have no better words for this. The first one might be advisable for many scenarios; the second one is a good instrument if a set of containers is offered as a service to independent subadministrators.
> 
> 
> From my understanding, from the kernel's point of view -- which is also the host's point of view -- the user namespace feature is a uid/gid translation for an assigned process (and its children). With an appropriate rule, the container taskset's user 0/0 in particular will act "in reality" as the user n/m. Or maybe it is even better to imagine that the taskset will be made to see n/m as 0/0.
> 
> Now, what i want to ask:
> 
> * The container may have access to shared/outer-world resources. What happens with uid/gids that the rules leave unmapped? *Are* they passed through unmapped, which one might call "transparent"? Or are they mapped to "nobody"?

Ids are translated at the syscall (userspace-kernel) boundary, and an
unmappable uid is mapped to nobody.
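
As an illustration of that translation, here is a small Python
sketch.  The extent layout (first id inside, first id outside,
length) follows the format of /proc/PID/uid_map, and 65534 is the
default value of /proc/sys/kernel/overflowuid.  The function names
and the example map are assumptions made up for this sketch, not
kernel interfaces.

```python
from typing import List, Optional, Tuple

OVERFLOW_UID = 65534  # default overflowuid, usually named "nobody"

# (first id inside the namespace, first id outside, length of range)
Extent = Tuple[int, int, int]


def to_kernel_id(uid_map: List[Extent], inside_id: int) -> Optional[int]:
    """Translate an id passed in by a task in the namespace.

    Returns None when the id has no mapping; in this model that
    stands for the request being rejected.
    """
    for inside, outside, count in uid_map:
        if inside <= inside_id < inside + count:
            return outside + (inside_id - inside)
    return None


def from_kernel_id(uid_map: List[Extent], kernel_id: int) -> int:
    """Translate an id being reported back to a task in the namespace.

    Ids without a mapping are shown as the overflow id.
    """
    for inside, outside, count in uid_map:
        if outside <= kernel_id < outside + count:
            return inside + (kernel_id - outside)
    return OVERFLOW_UID


# Hypothetical container map: ids 0..65535 inside are 100000.. outside.
container_map: List[Extent] = [(0, 100000, 65536)]

root_outside = to_kernel_id(container_map, 0)        # 100000
seen_inside = from_kernel_id(container_map, 100000)  # 0
host_user = from_kernel_id(container_map, 1000)      # 65534
```

So with this example map, a file owned by host uid 1000 is not passed
through "transparently": from inside the container it appears to be
owned by nobody (65534).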

> * What will happen in the use case "real device reach-through" and similar, e.g. if one wants to provide not a veth but a dedicated physical network adapter? Or, maybe more commonly, a video card? Will the container's root user have "root privileges" on it? Or is it necessary to grant these privileges to the uid/gid n/m on the host, too?

Access to a network device is decided by capable(), not by uid.  A
capable() check on a network resource becomes
ns_capable(resource->netns->owner, cap).  That is, the resource is
owned by a network namespace, and every network namespace is owned by
the user namespace which created it (through clone/unshare).
Privilege over a resource object means privilege targeted toward the
owning userns.
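
The ownership chain device -> netns -> userns can be sketched as a toy
Python model.  All class and function names below are hypothetical,
and the check is deliberately simplified: a capability held in a user
namespace is treated as valid toward that namespace and its
descendants, nothing more.

```python
from dataclasses import dataclass
from typing import Optional

CAP_NET_ADMIN = "CAP_NET_ADMIN"


@dataclass(eq=False)
class UserNS:
    name: str
    parent: Optional["UserNS"] = None


@dataclass(eq=False)
class NetNS:
    name: str
    owner: UserNS  # user namespace that created this network namespace


@dataclass(eq=False)
class NetDev:
    name: str
    netns: NetNS  # network namespace the device currently lives in


@dataclass
class Task:
    user_ns: UserNS
    caps: frozenset


def is_same_or_ancestor(candidate: UserNS, ns: Optional[UserNS]) -> bool:
    """True if candidate is ns itself or one of its ancestors."""
    while ns is not None:
        if ns is candidate:
            return True
        ns = ns.parent
    return False


def may_configure(task: Task, dev: NetDev) -> bool:
    """Simplified stand-in for ns_capable(dev->netns->owner, CAP_NET_ADMIN)."""
    return (CAP_NET_ADMIN in task.caps
            and is_same_or_ancestor(task.user_ns, dev.netns.owner))


# Assumed setup: one host, one container, one physical NIC on the host.
init_userns = UserNS("init_user_ns")
ct_userns = UserNS("container", parent=init_userns)
host_net = NetNS("host-net", owner=init_userns)
ct_net = NetNS("container-net", owner=ct_userns)
eth1 = NetDev("eth1", netns=host_net)
ct_root = Task(ct_userns, frozenset({CAP_NET_ADMIN}))

before_move = may_configure(ct_root, eth1)  # device still in host netns
eth1.netns = ct_net                         # host admin moves the device
after_move = may_configure(ct_root, eth1)
```

In this model the container's root cannot touch eth1 while it sits in
the host's network namespace, and can once the device has been moved
into the container's network namespace.  No privilege has to be
granted to uid n/m on the host for that.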

> * What will happen in the use case "NFS v3 client"? Here, the NFS server locally uses the uid/gid transmitted from the client. Must one mount the NFS source on the host and bind-mount it into the container to preserve the user namespace mapping? On the other hand, will an NFS mount inside the container skip this mapping?

AFAIK you can't yet nfs mount from inside a container.  If you could,
then it would be up to restrictions on the container's network to
restrict nfs access.  You won't be able to act as root on the host
network, so you'll be restricted by however the host admin is
willing to hook your container into the host network.

The other case of course is nfs-mounting from the host and bind
mounting the result into a container.  There uids will be
translated at the syscall (userspace-kernel) boundaries, so at
stat, open, etc.  Just as with any other bind mounts.

-serge