[lxc-devel] [PATCH 1/1] lxc_attach: fix break with user namespaces (v2)

Mon Jan 21 22:31:16 UTC 2013

Hi Serge,

Just a few quick comments because I'm very interested in the lxc-attach
utility:

> +			ret = lxc_cgroup_prepare_attach(my_args.name, &cgroup_data);
> +			if (ret < 0) {
> +				ERROR("failed to prepare attaching to cgroup");
> +				return -1;
> +			}
> +
> +			ret = lxc_cgroup_finish_attach(cgroup_data, gchild);
> +			if (ret < 0) {
> +				ERROR("failed to attach process to cgroup");
> +				return -1;
> +			}

Note that I made the whole cgroup attach logic so complicated a while
back (i.e. prepare before setns/fork -> finish/dispose after fork),
precisely so that one wouldn't need a second fork and so I didn't have
to play around with IPC to get the PID of the process that is to be
added to the cgroup. If an additional fork is needed anyway due to user
namespaces (that reminds me that I should definitely try them out...),
the reason for making the cgroup attach logic so complicated disappears
and one could return to the direct approach from before, probably makes
it quite a bit easier to read.

Side note 1: that I would use pid_t and not int as the data type of the
object sent through the pipe, that seems more portable to me.

Side note 2: Idea, but untested and not completely thought through, just
wanted to put it out there: The "middle" process in your logic need not
necessarily be kept around if the kernel version is at least 3.4 -
because Linux then supports prctl(PR_SET_CHILD_SUBREAPER), which - if
I'm not completely mistaken - would reparent the "inner" process (that
will execve() to the requested program inside the container) to the
"outer" attach process after the "middle" process exits. That way we
might keep only one event loop around. On the other hand, this still
requires both implementations if kernels < 3.4 are still to be
supported... (as I said, not completely thought through, just wanted to
put the idea out there)

> +		return -1;
> +
> +		return 0;

This seems a bit weird... ;-)

> +			/* XXX FIXME this should get the uid of the container init and setuid to that */
> +			/* XXX FIXME or perhaps try to map in the lxc-attach caller's uid? */

I believe a sane default would be the first option (uid of init, on the
other hand, that isn't so easy to get by, because one has to assume
/proc is mounted and certain security features are turned off) and let
the user specify a uid otherwise.

Btw. I noticed recently that if the glibc/nss implementations of host
and container are incompatible, even a plain lxc-attach without
specifying /bin/sh won't work since the lxc-attach code running that
tries to determine the login shell comes from the host's glibc and the
nss module loaded comes from the container. (There probably should be a
default of /bin/sh in that case instead of failure.) That same issue
will come to bite even harder if lxc-attach tries to lookup a user name
that the user has specified in order to map it to a given user id... I
don't see an easy solution for this in general... (Other than to simply
say: never do nss lookups from lxc-attach, i.e. only use numerical ids,
default for /bin/sh for the shell and let the user specify otherwise if
wanted.)

Regards,
Christian