[lxc-devel] [PATCH] Support MS_SHARED /

Michael H. Warfield mhw at WittsEnd.com
Fri Dec 28 03:45:37 UTC 2012


On Thu, 2012-12-20 at 09:03 -0600, Serge Hallyn wrote:
> Quoting Stéphane Graber (stgraber at ubuntu.com):
> > On 12/20/2012 06:58 AM, Serge Hallyn wrote:
> ...
> > /proc/mounts in the container will also end up being polluted by all the
> > mount points from the host. This in itself doesn't cause any big
> > problem, though the container will try (and fail) to unmount all of those.
> > Is there anything we can do to improve that situation, or is that a side
> > effect of MS_SHARED that we can't work around on our end?

> I think it's actually a side effect of pivot-root after chroot.  You
> have /orig_root/foo/chroot_root/path/new_pivot/put_old.  Then you
> chroot to /orig_root/foo/chroot_root.  When you then pivot to
> /path/new_pivot, what ends up in put_old is /orig_root/foo/chroot_root.
> I'm actually not sure you can trim the mounts which were under
> /orig_root.  We could figure out which ones they are by following the chain
> of mount ids in /proc/self/mountinfo, but we can't reach them to umount
> them.

> It's much like how when you boot a livecd, you see things like
> the rootfs on / as well as /cow on /.  You can't reach the rootfs
> which is parent of the /cow on / any more, but it's in the mounts
> table.

> Now I tested, and with a simple setup we can use a much simpler
> patch which just does mount("", "/", NULL, MS_SLAVE|MS_REC, 0);
> for the whole of chroot_into_slave() (and skips the new umount2()
> in start.c).  The container then starts, and its mounts table
> is clean.

> Where that won't work is in a livecd or any fancy raid setup,
> where your process's / has a parent which is MS_SHARED.

> Michael, can you show me your /proc/self/mountinfo on an f18
> box?

Freshly installed clean box...

[root@dwarf52 mhw]# cat /proc/self/mountinfo
15 34 0:3 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
16 34 0:14 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs rw,seclabel
17 34 0:5 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,seclabel,size=491520k,nr_inodes=122880,mode=755
18 16 0:15 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:7 - securityfs securityfs rw
19 16 0:13 / /sys/fs/selinux rw,relatime shared:8 - selinuxfs selinuxfs rw
20 17 0:16 / /dev/shm rw,nosuid,nodev shared:3 - tmpfs tmpfs rw,seclabel
21 17 0:10 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=000
22 34 0:17 / /run rw,nosuid,nodev shared:19 - tmpfs tmpfs rw,seclabel,mode=755
23 16 0:18 / /sys/fs/cgroup rw,nosuid,nodev,noexec shared:9 - tmpfs tmpfs rw,seclabel,mode=755
24 23 0:19 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:10 - cgroup cgroup rw,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
25 23 0:20 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:11 - cgroup cgroup rw,cpuset
26 23 0:21 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:12 - cgroup cgroup rw,cpuacct,cpu
27 23 0:22 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:13 - cgroup cgroup rw,memory
28 23 0:23 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:14 - cgroup cgroup rw,devices
29 23 0:24 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,freezer
30 23 0:25 / /sys/fs/cgroup/net_cls rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,net_cls
31 23 0:26 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,blkio
32 23 0:27 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,perf_event
34 1 253:1 / / rw,relatime shared:1 - ext4 /dev/mapper/fedora_dwarf52-root rw,seclabel,data=ordered
35 15 0:29 / /proc/sys/fs/binfmt_misc rw,relatime shared:20 - autofs systemd-1 rw,fd=29,pgrp=1,timeout=300,minproto=5,maxproto=5,direct
37 16 0:30 / /sys/kernel/config rw,relatime shared:21 - configfs configfs rw
39 17 0:31 / /dev/hugepages rw,relatime shared:22 - hugetlbfs hugetlbfs rw,seclabel
38 17 0:12 / /dev/mqueue rw,relatime shared:23 - mqueue mqueue rw,seclabel
36 16 0:7 / /sys/kernel/debug rw,relatime shared:24 - debugfs debugfs rw
40 34 0:32 / /tmp rw shared:25 - tmpfs tmpfs rw,seclabel
41 34 8:1 / /boot rw,relatime shared:26 - ext4 /dev/sda1 rw,seclabel,data=ordered
42 34 253:2 / /home rw,relatime shared:27 - ext4 /dev/mapper/fedora_dwarf52-home rw,seclabel,data=ordered
74 22 0:33 / /run/user/1000/gvfs rw,nosuid,nodev,relatime shared:57 - fuse.gvfsd-fuse gvfsd-fuse rw,user_id=1000,group_id=1000
76 16 0:34 / /sys/fs/fuse/connections rw,relatime shared:59 - fusectl fusectl rw

Looks like everything has "shared".

I'll be testing lxc on this beast with and without this patch over the
next couple of days for both systemd and non-systemd containers.  I've
got to get 0.9.0a2 built on it first and then go from there.

> > I didn't spend much time reviewing the code itself, but it applied to my
> > local staging tree and built fine, so that's good enough for me :)

> Thanks -  TBH the extra mounts are no more wrong than they are in
> a livecd, so I don't think it's a big problem.  One we can address
> in January.

> -serge

Hope you (and everyone else) had a nice holiday!

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!