[lxc-devel] Kernel bug? Setuid apps and user namespaces

Eric W. Biederman ebiederm at xmission.com
Wed Oct 23 04:54:37 UTC 2013


Sean Pajot <sean.pajot at execulink.com> writes:

> On 10/22/2013 03:50 PM, Eric W. Biederman wrote:
>> Serge Hallyn <serge.hallyn at ubuntu.com> writes:
>> 
>>> Quoting Sean Pajot (sean.pajot at execulink.com):
>>>> I've been playing with User Namespaces somewhat extensively and I think I've
>>>> come across a bug in the handling of /proc/$PID/ entries.
>>>>
>>>> This is my example case on a 3.10.x kernel:
>>>>
>>>> -- /var/lib/lxc/test1/config
>>>>
>>>> lxc.rootfs = /lxc/c1
>>>> lxc.id_map = u 0 1000000 100000
>>>> lxc.id_map = g 0 1000000 100000
>>>> lxc.network.type = none
>>>>
>>>> lxc.tty = 6
>>>>
>>>> == END
>>>>
>>>> On one console login as a non-root user and run "su", as an example of a
>>>> setuid root application. On another console login as root and examine
>>>> /proc/$(pidof su). You'll find all the files are owned by the "nobody" user
>>>> and inaccessible. The reason is on the host you'll find these files are owned
>>>> by "root", uid 0, which is odd because in the container they should be uid
>>>> 1000000 from the mappings.
>>>>
>>>> I tracked down the cause to kernel source file /fs/proc/base.c function
>>>> pid_revalidate which contains static references to GLOBAL_ROOT_UID and
>>>> GLOBAL_ROOT_GID which are always UID 0 on the host. This little patch, which
>>>> might not be correct in terms of kernel standards, appears to mostly solve the
>>>> issue. It doesn't affect all entries in /proc/$PID but gets the majority of them.
>>>>
>>>> Thoughts or opinions?
>>>
>>> Awesome - I've seen this bug and so far not had time to dig.  
>>>
>>> The patch offhand looks good to me.  Do you mind sending it to
>>> lkml?
>
>
>>>
>>> Acked-by: Serge E. Hallyn <serge.hallyn at ubuntu.com>
>>>
>
> Well I wasn't expecting that... :)
>
>> 
>> It is definitely worth looking at.  I punted on this when I did the
>> initial round of conversions.  Tasks that we don't consider dumpable are
>> weird.
>> 
>> At first glance this fine.  However __task_cred does not return NULL so
>> handling that case is nonsense and confusing.
>> 
>> Eric
>> 
>
> I thought so, but I wanted to have a failsafe since I'm running this code on
> the same machine I'm typing this message on.
> This is my first patch that had a chance of making it into the kernel so I'm
> honestly making things up as I go. I put that there so in the event a NULL
> cred showed up there would be known symptoms besides an Oops.
>
> On my system I still have the "ns" directory marked as owned by host's uid 0
> but since the permissions are 511 (?) and the namespace objects are owned by
> container's uid 0 it doesn't really impact much. That could probably use
> fixing but the use cases are generally usable now.
>
> That aside, you really think it's okay for inclusion in the kernel with
> cred!=NULL fixed?

Someone needs to read and think through all of the corner cases and see
if we can ever have a time when task_dumpable is false but root in the
container would not or should not be able to see everything.

In particular I am worried about the case of a setuid app calling setns,
and entering a lesser privileged user namespace.  In my foggy mind that
might be a security problem.  And there might be other similar crazy
cases.

But the code itself looks good, and the bug hunting seems solid.

If my concerns about a setuid app calling setns are valid what we can
likely do with dumpable is record the kuid of the userns root when the
task becomes non-dumpable, and use that for i_uid and i_gid.

Eric

>>>> --- linux-3.10-clean/fs/proc/base.c	2013-06-30 18:13:29.000000000 -0400
>>>> +++ linux-3.10-patched/fs/proc/base.c	2013-10-22 13:28:22.561262197 -0400
>>>> @@ -1632,17 +1632,17 @@
>>>>  	task = get_proc_task(inode);
>>>>
>>>>  	if (task) {
>>>> +		rcu_read_lock();
>>>> +	        cred = __task_cred(task);
>>>>  		if ((inode->i_mode == (S_IFDIR|S_IRUGO|S_IXUGO)) ||
>>>>  		    task_dumpable(task)) {
>>>> -			rcu_read_lock();
>>>> -			cred = __task_cred(task);
>>>>  			inode->i_uid = cred->euid;
>>>>  			inode->i_gid = cred->egid;
>>>> -			rcu_read_unlock();
>>>>  		} else {
>>>> -			inode->i_uid = GLOBAL_ROOT_UID;
>>>> -			inode->i_gid = GLOBAL_ROOT_GID;
>>>> +			inode->i_uid = cred ? make_kuid(cred->user_ns, 0) : GLOBAL_ROOT_UID;
>>>> +			inode->i_gid = cred ? make_kgid(cred->user_ns, 0) : GLOBAL_ROOT_GID;
>>>>  		}
>>>> +		rcu_read_unlock();
>>>>  		inode->i_mode &= ~(S_ISUID | S_ISGID);
>>>>  		security_task_to_inode(task, inode);
>>>>  		put_task_struct(task);
>> 




More information about the lxc-devel mailing list