[lxc-devel] [PATCH 1/1] lxc_attach: fix break with user namespaces (v2)

Tue Jan 22 02:15:30 UTC 2013

Quoting Christian Seiler (christian at iwakd.de):
> Hi Serge,
> 
> Just a few quick comments because I'm very interested in the lxc-attach
> utility:
> 
> > +			ret = lxc_cgroup_prepare_attach(my_args.name, &cgroup_data);
> > +			if (ret < 0) {
> > +				ERROR("failed to prepare attaching to cgroup");
> > +				return -1;
> > +			}
> > +
> > +			ret = lxc_cgroup_finish_attach(cgroup_data, gchild);
> > +			if (ret < 0) {
> > +				ERROR("failed to attach process to cgroup");
> > +				return -1;
> > +			}
> 
> Note that I made the whole cgroup attach logic so complicated a while
> back (i.e. prepare before setns/fork -> finish/dispose after fork),
> precisely so that one wouldn't need a second fork and so I didn't have

Right, that wasn't lost on me as I was doing this :)

> to play around with IPC to get the PID of the process that is to be
> added to the cgroup. If an additional fork is needed anyway due to user
> namespaces (that reminds me that I should definitely try them out...),
> the reason for making the cgroup attach logic so complicated disappears
> and one could return to the direct approach from before, probably makes
> it quite a bit easier to read.

Agreed.  We should move that back out of attach.c.

> Side note 1: that I would use pid_t and not int as the data type of the
> object sent through the pipe, that seems more portable to me.
> 
> Side note 2: Idea, but untested and not completely thought through, just
> wanted to put it out there: The "middle" process in your logic need not
> necessarily be kept around if the kernel version is at least 3.4 -
> because Linux then supports prctl(PR_SET_CHILD_SUBREAPER), which - if
> I'm not completely mistaken - would reparent the "inner" process (that
> will execve() to the requested program inside the container) to the
> "outer" attach process after the "middle" process exits. That way we
> might keep only one event loop around. On the other hand, this still
> requires both implementations if kernels < 3.4 are still to be
> supported... (as I said, not completely thought through, just wanted to
> put the idea out there)
> 
> > +		return -1;
> > +
> > +		return 0;
> 
> This seems a bit weird... ;-)
> 
> > +			/* XXX FIXME this should get the uid of the container init and setuid to that */
> > +			/* XXX FIXME or perhaps try to map in the lxc-attach caller's uid? */
> 
> I believe a sane default would be the first option (uid of init, on the
> other hand, that isn't so easy to get by, because one has to assume
> /proc is mounted and certain security features are turned off) and let
> the user specify a uid otherwise.

Hm, i didn't really think mounted /proc on the host was all that
optional for containers anyway.  We could always just avoid the setuid
if we find /proc unmounted.

Would you care to update the patch along these lines?

-serge