[lxc-devel] CLONE_PARENT after setns(CLONE_NEWPID)

Wed Nov 6 19:50:59 UTC 2013

On Wed, Nov 6, 2013 at 11:33 AM, Oleg Nesterov <oleg at redhat.com> wrote:
> Hi Serge,
>
> On 11/06, Serge Hallyn wrote:
>>
>> Hi Oleg,
>>
>> commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e :
>> "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks"
>> breaks lxc-attach in 3.12.  That code forks a child which does
>> setns() and then does a clone(CLONE_PARENT).  That way the
>> grandchild can be in the right namespaces (which the child was
>> not) and be a child of the original task, which is the monitor.
>
> Thanks...
>
> Yes, this is what 40a0d32d1ea explicitly tries to disallow.
>
>> Is there a real danger in allowing CLONE_PARENT
>> when current->nsproxy->pidns_for_children is not our pidns,
>> or was this done out of an "over-abundance of caution"?
>
> I am not sure... This all was based on the long discussion, and
> it was decided that the CLONE_PARENT check should be consistent
> wrt CLONE_NEWPID and pidns_for_children != task_active_pid_ns().
>
>> Can we
>> safely revert that new extra check?
>
> Well, usually we do not break user-space, but I am not sure about
> this case...
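The lxc-attach sequence Serge describes can be sketched in C roughly as below. A child calls setns() and then clone(CLONE_PARENT), so the grandchild lands in the target pid namespace while remaining a child of the monitor. This is an illustrative sketch under stated assumptions, not lxc's actual code. attach_and_run() and attached_main() are hypothetical names. The target's pid namespace is assumed to be reachable through /proc/<pid>/ns/pid, only the pid namespace is joined, and setns() needs CAP_SYS_ADMIN. Per this thread, on 3.12 with 40a0d32d1ea applied, the clone() call is the step that fails.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define GRANDCHILD_STACK_SIZE (1024 * 1024)

/* Hypothetical payload: runs in the grandchild, inside the target pid namespace. */
static int attached_main(void *arg)
{
    (void)arg;
    return 0; /* a real tool would exec the requested command here */
}

/* Hypothetical helper: returns the grandchild's exit status, or -1 on error. */
static int attach_and_run(pid_t target)
{
    int pipefd[2];
    if (pipe(pipefd) < 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    if (child == 0) {
        /* Intermediate child: join the target's pid namespace. */
        pid_t gc = -1;
        char path[64];
        close(pipefd[0]);
        snprintf(path, sizeof(path), "/proc/%d/ns/pid", (int)target);
        int fd = open(path, O_RDONLY | O_CLOEXEC);
        if (fd >= 0) {
            if (setns(fd, CLONE_NEWPID) == 0) {
                char *stack = malloc(GRANDCHILD_STACK_SIZE);
                /* CLONE_PARENT makes the monitor (our parent) the grandchild's
                 * parent. After setns(CLONE_NEWPID) this is the clone() that
                 * the 3.12 check refuses. The stack grows down on most
                 * architectures, so pass its top. */
                if (stack != NULL)
                    gc = clone(attached_main, stack + GRANDCHILD_STACK_SIZE,
                               CLONE_PARENT | SIGCHLD, NULL);
            }
            close(fd);
        }
        /* Report the grandchild's pid (or -1) to the monitor and exit. */
        ssize_t n = write(pipefd[1], &gc, sizeof(gc));
        _exit(n == (ssize_t)sizeof(gc) && gc > 0 ? 0 : 1);
    }

    /* Monitor: learn the grandchild's pid, then reap the intermediate child. */
    pid_t gc = -1;
    close(pipefd[1]);
    ssize_t n = read(pipefd[0], &gc, sizeof(gc));
    close(pipefd[0]);
    waitpid(child, NULL, 0);
    if (n != (ssize_t)sizeof(gc) || gc <= 0)
        return -1;

    /* Because of CLONE_PARENT the grandchild is our own child, so we can wait for it. */
    int status;
    if (waitpid(gc, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Passing the grandchild's pid back through a pipe is just one way to do it. The point of the pattern is that the monitor ends up reaping a process it did not fork itself.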

Presumably if we allow this, then we should also allow
clone(CLONE_NEWPID | CLONE_PARENT).  This seems a little odd, but off
the top of my head it doesn't seem obviously dangerous.

(Why were we worried about this in the first place?  The comment says
that we don't want signal handlers or thread groups to span
namespaces, but CLONE_PARENT has nothing to do with that.)

I feel like I'm rehashing something ancient, but shouldn't that code just be:

if (clone_flags & CLONE_VM) {
  // check for unsharing namespaces
}

with the comment updated to note that CLONE_THREAD and CLONE_SIGHAND both
require CLONE_VM (CLONE_THREAD requires CLONE_SIGHAND, which in turn
requires CLONE_VM).
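To make the difference concrete, here is a small userspace model of the two rules. It is a hypothetical sketch, not the kernel's copy_process() code. allowed_3_12() and allowed_proposed() are made-up names. The boolean ns_for_children_differs stands in for "pidns_for_children != task_active_pid_ns(current)". Treating CLONE_NEWUSER | CLONE_NEWPID as the namespace-crossing flags is an assumption based on the commit title quoted above.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdbool.h>

/* Assumed set of flags that put the child in a new user/pid namespace. */
#define NEW_NS_FLAGS (CLONE_NEWUSER | CLONE_NEWPID)

/* True if the child would end up in a different pid/user namespace. */
static bool crosses_ns(unsigned long flags, bool ns_for_children_differs)
{
    return (flags & NEW_NS_FLAGS) || ns_for_children_differs;
}

/*
 * Hypothetical model of the 3.12 rule as described in this thread:
 * CLONE_SIGHAND or CLONE_PARENT is refused whenever the child would
 * cross into another namespace, including after setns(CLONE_NEWPID).
 */
static bool allowed_3_12(unsigned long flags, bool ns_for_children_differs)
{
    if ((flags & (CLONE_SIGHAND | CLONE_PARENT)) &&
        crosses_ns(flags, ns_for_children_differs))
        return false;
    return true;
}

/*
 * Hypothetical model of the proposal above: only refuse the crossing when
 * the child shares our address space. CLONE_THREAD and CLONE_SIGHAND both
 * imply CLONE_VM, so thread groups and shared signal handlers stay covered.
 */
static bool allowed_proposed(unsigned long flags, bool ns_for_children_differs)
{
    if ((flags & CLONE_VM) && crosses_ns(flags, ns_for_children_differs))
        return false;
    return true;
}
```

Under this model the lxc-attach call (CLONE_PARENT after setns(CLONE_NEWPID)) is refused by the first rule and accepted by the second. The same holds for clone(CLONE_NEWPID | CLONE_PARENT), which matches the observation that allowing one implies allowing the other.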

--Andy