[lxc-devel] [PATCH 1/1] pivot_root: switch to a new mechanism (v2)

Andy Lutomirski luto at amacapital.net
Mon Sep 29 23:21:48 UTC 2014


On Mon, Sep 29, 2014 at 4:13 PM, Andy Lutomirski <luto at amacapital.net> wrote:
> On Mon, Sep 29, 2014 at 4:07 PM, Eric W. Biederman
> <ebiederm at xmission.com> wrote:
>> Andy Lutomirski <luto at amacapital.net> writes:
>>
>>> On Mon, Sep 29, 2014 at 3:46 PM, Serge Hallyn <serge.hallyn at ubuntu.com> wrote:
>>>> Quoting Andy Lutomirski (luto at amacapital.net):
>>>>> On Mon, Sep 29, 2014 at 2:46 PM, Serge Hallyn <serge.hallyn at ubuntu.com> wrote:
>>>>> I'm not sure that "/" is well-defined.  You have oldroot mounted on
>>>>
>>>> Whoa.  Seems you're right.  I would have expected it to mean precisely
>>>> the dentry+vfsmount which I pivot-rooted to.  Which have been overmounted,
>>>> so umount(/) would umount what's been mounted over them.
>>>>
>>>>> top of newroot, and "/" refers to one of them (presumably oldroot on
>>>>> newer kernels, and maybe newroot on older kernels).
>>>>
>>>> So it seems.
>>>>
>>>>> I think that you
>>>>> want to unmount oldroot, leaving only newroot mounted.  When you call
>>>>> umount2, "." reliably refers to oldroot.
>>>>
>>>> Right
>>>>
>>>>> /me wonders whether there's a vulnerability here on new kernels if the
>>>>> test were adjusted a bit.  mnt_ns oughtn't to be NULL, right?
>>>>
>>>> Wouldn't it be in the older kernels though?  That's where mnt_ns ends
>>>> up being null.  So from 3.8..3.11 an unpriv user (through CLONE_NEWUSER)
>>>> can do a pivot_root causing null MNT_NS, and presumably find an interesting
>>>> way to dereference it.
>>>
>>> Eric?
>>>
>>> I wonder what happens if you unmount new_root on new kernels...
>>
>> There is chroot_fs_refs so it is clear that "/" is well defined after
>> pivot_root.  I thought that expensive loop over all of the tasks
>> had been removed at some point, but it got hidden in an innocuous function
>> call instead. :(
>>
>>
>> As I recall, what happens when you unmount "/" is that you get into a
>> very weird state: chroot_fs_refs isn't called, so you have a case
>> where "/" refers to a lazily unmounted filesystem, or the unmount
>> implicitly becomes a remount read-only.  Which smells like a userns
>> permission bug.

My initial attempt to make it blow up failed.  But IIRC the really
weird one was mounting anything on "/" -- you end up with pwd, root,
and ns root being hopelessly out of sync.

>>
>> I am looking at this related issue at the moment.
>> https://github.com/avagin/userns_vs_mntns
>
> To me, this smells like MNT_DETACH does something awful when there are
> mounts under the detached mount.
>
> For example:
>
> mount --rbind / /mnt
> umount -l /mnt
>
> does *not* end well on my system.  I find it hard to believe that this
> behavior is intentional.

By which I mean that this unmounts the world:

# mount --make-rshared /
# mount --rbind / /mnt
# umount -l /mnt

Dunno whether it's related.
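
For anyone who wants to try the three commands above without taking down
a live system, here is a rough sketch that runs them inside a throwaway
mount namespace.  The helper names (repro_cmds, run_repro) and the
RUN_REPRO switch are made up for illustration.  It assumes root and a
util-linux unshare that supports --propagation; with private propagation
the damage should stay inside the new namespace, but treat that as an
assumption and try it on a scratch machine first.

```shell
#!/bin/sh
# Hypothetical helper: print the three commands from the report, one per line.
repro_cmds() {
    printf '%s\n' \
        'mount --make-rshared /' \
        'mount --rbind / /mnt' \
        'umount -l /mnt'
}

# Hypothetical helper: feed the commands to a shell running in a new mount
# namespace.  "--propagation private" is assumed to cut the namespace off
# from the host's peer groups before "/" is made rshared again inside it.
# Needs root (or equivalent privileges).
run_repro() {
    repro_cmds | unshare --mount --propagation private sh -ex
}

# Nothing is mounted or unmounted unless explicitly requested.
if [ "${RUN_REPRO:-0}" = 1 ]; then
    run_repro
fi
```

Running it as "RUN_REPRO=1 sh repro.sh" executes the sequence; without
the variable the script only defines the helpers.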

--Andy

