[lxc-devel] Kernel bug? Setuid apps and user namespaces

Fri Apr 4 19:10:00 UTC 2014

Quoting Andy Lutomirski (luto at amacapital.net):
> On Fri, Apr 4, 2014 at 11:30 AM, Serge Hallyn <serge.hallyn at ubuntu.com> wrote:
> > Quoting Andy Lutomirski (luto at amacapital.net):
> >> On 04/02/2014 10:32 AM, Serge E. Hallyn wrote:
> >> > (Sorry - the lxc-devel list has moved, so replying to all with the
> >> > correct list address; please reply to this rather than my previous
> >> > email)
> >> >
> >> > Quoting Serge Hallyn (serge.hallyn at ubuntu.com):
> >> >> Hi Eric,
> >> >>
> >> >> (sorry, I don't seem to have the email I actually wanted to reply
> >> >> to in my mbox, but it is
> >> >> https://lists.linuxcontainers.org/pipermail/lxc-devel/2013-October/005857.html)
> >> >>
> >> >> You'd said,
> >> >>> Someone needs to read and think through all of the corner cases and see
> >> >>> if we can ever have a time when task_dumpable is false but root in the
> >> >>> container would not or should not be able to see everything.
> >> >>>
> >> >>> In particular I am worried about the case of a setuid app calling setns,
> >> >>> and entering a lesser privileged user namespace.  In my foggy mind that
> >> >>> might be a security problem.  And there might be other similar crazy
> >> >>> cases.
> >> >>
> >> >> Can we make use of current->mm->exe_file->f_cred->user_ns?
> >> >>
> >> >> So either always use
> >> >> make_kuid(current->mm->exe_file->f_cred->user_ns, 0)
> >> >> instead of make_kuid(cred->user_ns, 0), or check that
> >> >> (current->mm->exe_file->f_cred->user_ns == cred->user_ns)
> >> >> and, if not, assume that the caller has done a setns?
> >>
> >> Do you have a summary of the issue?  I'm a little lost here.
> >
> > Sure - when running an unprivileged container, tasks which become
> > !dumpable end up with /proc/$pid/fd/ being owned by the global
> > root user, which inside the container is nobody:nogroup.  Examples
> > are the user's sshd threads and apache, and in the past I think I've
> > seen it with logind or getty too.
> 
> Other than the aesthetics, why does this matter?  Things in the
> container who are actually mapped to nobody still can't access those
> files?

Because root in the container cannot look at the fds.

> The alternative (using the container's owner) sounds a bit scary.

If the file being run belongs to the container, why would it be scary?
Because some fds may not have been closed when the task did execve, and
the previous bprm file may have been on the host?

> >> I suspect that what we really need is to revoke a bunch of proc files
> >> every time a task does anything involving setuid (or, more generally,
> >> any of the LSM_UNSAFE_PTRACE things).
> >
> > setuid, or do you mean setns?  In any case, I'm not thinking through
> > attach (setns'ing into a container) yet, but the cases I'm looking at
> > right now are just a root daemon - already inside the non-init user
> > ns - doing something to become !dumpable, and having its fds become
> > owned by GLOBAL_ROOT_UID.  Since these tasks are running a program
> > which came from inside the non-init userns, I think it's sane to
> > allow root in the non-init userns to own any coredumps.
> >
> > Whereas if the program had started as /bin/passwd in the init userns,
> > then coredumps (and /proc/$$/fd/*) should be owned by the GLOBAL_ROOT_UID.
> 
> Gack.
> 
> This is kind of the same problem as the ptrace issue in the credfd
> thread.  Sigh.
> 
> --Andy
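
To make the rule discussed above concrete, here is a rough userspace model of the ownership decision: dumpable tasks keep their own euid, !dumpable tasks whose executable came from the same user namespace as their creds are owned by that namespace's root, and anything else (e.g. a program that started as /bin/passwd in the init userns) falls back to global root. This is only a sketch, not kernel code. The names model_user_ns, model_task, root_kuid and proc_fd_owner() are made up for illustration, and the dumpable branch assumes the existing behavior of using the task's euid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int host_uid_t;   /* stand-in for kuid_t */
#define GLOBAL_ROOT 0u             /* stand-in for GLOBAL_ROOT_UID */

/* Hypothetical model of a user namespace: we only track the host uid
 * that uid 0 inside the namespace maps to. */
struct model_user_ns {
    host_uid_t root_kuid;          /* what make_kuid(ns, 0) would yield */
};

/* Hypothetical model of the task fields relevant to the decision. */
struct model_task {
    bool dumpable;
    host_uid_t euid;                        /* host view of cred->euid */
    const struct model_user_ns *cred_ns;    /* cred->user_ns */
    const struct model_user_ns *exe_ns;     /* mm->exe_file->f_cred->user_ns */
};

/* Decide which host uid should own /proc/$pid/fd/ for the task. */
static host_uid_t proc_fd_owner(const struct model_task *t)
{
    assert(t != NULL && t->cred_ns != NULL && t->exe_ns != NULL);

    if (t->dumpable)
        return t->euid;                 /* assumed: unchanged behavior */

    if (t->exe_ns == t->cred_ns)
        return t->cred_ns->root_kuid;   /* program came from inside the ns */

    return GLOBAL_ROOT;                 /* namespaces differ: assume setns */
}
```

With a container whose root maps to some host uid, a !dumpable daemon whose exe_ns and cred_ns both point at the container's namespace would come out owned by that host uid, while the same task with exe_ns pointing at the init namespace would stay owned by global root. The open-fds-across-execve concern above is not addressed by this model.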