[lxc-devel] [PATCH 1/1] pivot_root: switch to a new mechanism (v2)

Andy Lutomirski luto at amacapital.net
Mon Sep 29 22:02:40 UTC 2014


On Mon, Sep 29, 2014 at 2:46 PM, Serge Hallyn <serge.hallyn at ubuntu.com> wrote:
> Quoting Andy Lutomirski (luto at amacapital.net):
>> On Mon, Sep 29, 2014 at 1:55 PM, Serge Hallyn <serge.hallyn at ubuntu.com> wrote:
>> > Quoting Dwight Engen (dwight.engen at oracle.com):
>> >> On Sat, 20 Sep 2014 03:15:44 +0000
>> >> Serge Hallyn <serge.hallyn at ubuntu.com> wrote:
>> >>
>> >> > This idea came from Andy Lutomirski.  Instead of using a
>> >> > temporary directory for the pivot_root put-old, use "." both
>> >> > for new-root and old-root.  Then fchdir into the old root
>> >> > temporarily in order to unmount the old-root, and finally
>> >> > chdir back into our '/'.
>> >> >
>> >> > Drop lxc.pivotdir from the lxc.container.conf manpage.
>> >> >
>> >> > Warn when we see a lxc.pivotdir entry (but keep it in the
>> >> > lxc.conf for now).
>> >> >
>> >> > Signed-off-by: Serge Hallyn <serge.hallyn at ubuntu.com>
>> >>
>> >> Hey Serge
>> >>
>> >> Just a heads up that this change makes my containers not start on a
>> >> 3.8.13 kernel. Reverting this change or using 3.13.11 kernel (just
>> >> tried a newer one I had handy) works. I'll try to debug further.
>> >> The log from trying to start a busybox container:
>> >
>> > Fascinating.
>> >
>> > It seems this is the case from at least v3.2 to v3.11.
>> >
>> > If my systemtap experiment isn't completely wrong, do_mount's
>> > call to kern_path() for any path to mount is returning a path
>> > with mnt->mnt_ns = NULL.  But it doesn't return an error.  So
>> > later on, when do_loopback checks for the path's mnt_ns being
>> > the same as current's, that fails since mnt_ns is NULL.
>> >
>> > The easiest way to reproduce is to use the program below, call it
>> > 'pivot.c", build it, and run "sudo ./pivot", then in the resulting bash
>> > shell try to, say, "mount --bind /mnt /mnt".
>> >
>> > I've reached the end of my ability to system-tap, so I'm going
>> > to build a kernel with some printk'ing :) and continue in awhile.
>> >
>> > Note that editing the program to do pivot_root(".", "/mnt") and
>> > umount2("/", MNT_DETACH) does NOT cause a problem.
>> >
>> > ===================================================================
>> >
>> > #include <sys/types.h>
>> > #include <sys/stat.h>
>> > #include <fcntl.h>
>> > #include <stdio.h>
>> > #include <stdlib.h>
>> > #include <unistd.h>
>> > #include <sys/mount.h>
>> > #include <linux/sched.h>
>> >
>> >
>> > int main()
>> > {
>> >         int ret;
>> >         ret = unshare(CLONE_NEWNS);
>> >         if (ret)
>> >                 exit(1);
>> >         int oldroot, newroot;
>> >
>> >         ret = mount("/", "/mnt", "none", MS_BIND | MS_REC, NULL);
>> >         if (ret)
>> >                 exit(1);
>> >
>> >         oldroot = open("/", O_DIRECTORY | O_RDONLY);
>> >         if (oldroot < 0)
>> >                 exit(1);
>> >         newroot = open("/mnt", O_DIRECTORY | O_RDONLY);
>> >         if (newroot < 0)
>> >                 exit(1);
>> >         if (fchdir(newroot))
>> >                 exit(1);
>> >
>> >         if (pivot_root(".", "."))
>> >                 exit(2);
>> >         printf("pivoted");
>> >
>> >         if (fchdir(oldroot) < 0)
>> >                 exit(3);
>> >         if (umount2("/", MNT_DETACH) < 0)
>> >                 exit(3);
>>
>> Shouldn't that be umount2(".", MNT_DETACH)?
>
> No, unless I'm misreading the state at this point:
>
> I want / to be umounted, . is the oldroot.  We've got
>
> "." = oldroot but not referencable from / (and not pinning it)
>
> "/" is the oldroot mounted over the newroot.

I'm not sure that "/" is well-defined.  You have oldroot mounted on
top of newroot, and "/" refers to one of them (presumably oldroot on
newer kernels, and maybe newroot on older kernels).  I think that you
want to unmount oldroot, leaving only newroot mounted.  When you call
umount2, "." reliably refers to oldroot.

/me wonders whether there's a vulnerability here on new kernels if the
test were adjusted a bit.  mnt_ns oughtn't to be NULL, right?

>
>> I'm currently having trouble finding an old enough box.  Can you try
>> the attached fancier test and see what it prints?
>
> Exact same as mine:
>
> ubuntu at kvm-p3:~$ sudo ./x
> pivoted
> in new root
> I am 1441
> root at kvm-p3:/# mount --bind /mnt /mnt

Ah, OK, I completely misunderstood your original email.

If I change umount2 to umount "." instead of "/" in my code, the
subsequent mount --bind works for me on 3.2.

FWIW, your test does awful, awful things if I don't do the MS_PRIVATE
thing on top.

--Andy


More information about the lxc-devel mailing list