[lxc-devel] [PATCH] Support MS_SHARED /

Michael H. Warfield mhw at WittsEnd.com
Sat Jan 5 21:46:15 UTC 2013


Hey Serge!

Took longer for me to test this out on Fedora 18 Beta than I had
expected.  I got tangled up trying to get bridge networking working and
my day job wanted to get in my way...  :-P  I hear that F18 final
has been delayed again but is anticipated for Jan 15.  I'll test that when
it becomes available.

IAC...  I was able to confirm that the 0.9.0.a2 cut very definitely
fails on an F18Beta host with the expected pivot root error and that the
code in staging does seem to work and seems to do the right thing.  This
was starting an F17 container on an F18Beta host with autodev enabled
and systemd 195 running in the container.

I did notice a huge "pile" of MAKEDEV errors creating tty devices when I
ran lxc-start like these:

-- 
MAKEDEV: /dev/ttyEQ1001: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1002: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1003: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1004: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1005: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1006: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1007: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1008: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1009: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1010: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1011: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1012: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1013: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1014: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1015: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1016: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1017: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1018: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1019: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1020: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1021: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1022: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1023: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1024: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1025: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1026: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyEQ1027: unable to set file creation context " system_u:object_r:tty_device_t:s0"
MAKEDEV: /dev/ttyUB0: unable to set file creation context " system_u:object_r:device_t:s0"
MAKEDEV: /dev/ttyUB1: unable to set file creation context " system_u:object_r:device_t:s0"
<30>systemd[1]: systemd 195 running in system mode. (+PAM +LIBWRAP +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ; fedora)
<30>systemd[1]: Detected virtualization 'lxc'.

Welcome to Fedora 17 (Beefy Miracle)!

<30>systemd[1]: Set hostname to <alcove.wittsend.com>.
<28>systemd[1]: Cannot add dependency job for unit display-manager.service, ignoring: Unit display-manager.service failed to load: No such file or directory. See system logs and 'systemctl status display-manager.service' for details.
<30>systemd[1]: Started Collect Read-Ahead Data.
<30>systemd[1]: Started Replay Read-Ahead Data.
<30>systemd[1]: Starting Forward Password Requests to Wall Directory Watch.
<30>systemd[1]: Started Forward Password Requests to Wall Directory Watch.
<30>systemd[1]: Starting Syslog Socket.
[  OK  ] Listening on Syslog Socket.
-- 

It certainly appears to have done the right thing and that same
container on an F17 host does not emit those MAKEDEV errors and does not
contain those tty devices.  Looks like an selinux issue inside the
container.  But it's happening even when I set selinux to "permissive"
mode in both the host and container.  Seems cosmetic, however.  Nothing
showing up in the syslog messages file on either the host or the
container.

I see a call to "/sbin/MAKEDEV console" in src/lxc/conf.c.  Not sure if
it's that call that's generating the problem but there is no MAKEDEV in
the container.  It's interesting that they're showing up before systemd
in the container is announcing its presence.  Looks like it's running
the MAKEDEV command in the host environment and, if I run "MAKEDEV
console" in the host itself, I get a couple thousand of those tty
devices created in the host /dev, that were not present before, and I
don't get any of the context errors...  Might be worth looking into just
to see what all the noise is about.
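
For anyone chasing the noise, here is a minimal Python sketch (not part of
lxc) that collapses the MAKEDEV errors in a captured lxc-start log into one
count per SELinux context, so the remaining lines are easier to read.  The
function name and the regex are illustrative assumptions based only on the
message format shown above.

```python
import re
from collections import Counter

# Assumed message shape, taken from the log excerpt above, e.g.:
# MAKEDEV: /dev/ttyUB0: unable to set file creation context " system_u:object_r:device_t:s0"
MAKEDEV_ERR = re.compile(
    r'^MAKEDEV: (?P<dev>/dev/[^:\s]+): '
    r'unable to set file creation context "\s*(?P<ctx>[^"]+)"'
)


def summarize_makedev_errors(lines):
    """Return (Counter of errors per SELinux context, list of all other lines)."""
    counts = Counter()
    other = []
    for line in lines:
        match = MAKEDEV_ERR.match(line)
        if match:
            counts[match.group("ctx")] += 1
        else:
            other.append(line)
    return counts, other


# Three lines copied from the log excerpt above.
sample = [
    'MAKEDEV: /dev/ttyEQ1027: unable to set file creation context " system_u:object_r:tty_device_t:s0"',
    'MAKEDEV: /dev/ttyUB0: unable to set file creation context " system_u:object_r:device_t:s0"',
    "<30>systemd[1]: Detected virtualization 'lxc'.",
]
counts, other = summarize_makedev_errors(sample)
for ctx, n in counts.items():
    print(f"{n} MAKEDEV error(s) for context {ctx}")
```

To run it against a real capture, pass the lines of the saved log file
instead of the sample list.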

IAC...  Looks like it works on F18Beta.  I'm good.

Regards,
Mike

On Thu, 2012-12-27 at 22:45 -0500, Michael H. Warfield wrote:
> On Thu, 2012-12-20 at 09:03 -0600, Serge Hallyn wrote:
> > Quoting Stéphane Graber (stgraber at ubuntu.com):
> > > On 12/20/2012 06:58 AM, Serge Hallyn wrote:
> > ...
> > > /proc/mounts in the container will also end up being polluted by all the
> > > mount points from the host, this in itself doesn't cause any big
> > > problem, though the container will try (and fail) to unmount all of those.
> > > Is there anything we can do to improve that situation or is that a side
> > > effect of MS_SHARED that we can't work around on our end?

> > I think it's actually a side effect of pivot-root after chroot.  You
> > have /orig_root/foo/chroot_root/path/new_pivot/put_old.  Then you
> > chroot to /orig_root/foo/chroot_root.  When you then pivot to
> > /path/new_pivot, what ends up in put_old is /orig_root/foo/chroot_root.
> > I'm actually not sure you can trim the mounts which were under
> > /orig_root.  We could figure out which ones they are by following the chain
> > of mount ids in /proc/self/mountinfo, but we can't reach them to umount
> > them.
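
A rough Python sketch of "following the chain of mount ids" (an
illustration only, not what the patch does): each mountinfo line starts
with a mount ID and a parent ID, so the chain can be walked until it
reaches an ID that is not in the table.  The helper names are hypothetical,
and the sample lines are copied from the F18 mountinfo listing quoted
further down in this message.

```python
def parse_parents(text):
    """Map mount ID -> (parent ID, mount point) from mountinfo-style text."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        table[int(fields[0])] = (int(fields[1]), fields[4])
    return table


def parent_chain(table, mount_id):
    """Walk parent IDs upward from mount_id.

    Returns (chain, stopped_at): chain lists (mount ID, mount point) pairs,
    and stopped_at is the ID where the walk ended, either because it is not
    listed in the table or because it was already visited.
    """
    chain = []
    seen = set()
    while mount_id in table and mount_id not in seen:
        seen.add(mount_id)
        parent, mount_point = table[mount_id]
        chain.append((mount_id, mount_point))
        mount_id = parent
    return chain, mount_id


sample = """\
16 34 0:14 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs rw,seclabel
23 16 0:18 / /sys/fs/cgroup rw,nosuid,nodev,noexec shared:9 - tmpfs tmpfs rw,seclabel,mode=755
24 23 0:19 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:10 - cgroup cgroup rw,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
34 1 253:1 / / rw,relatime shared:1 - ext4 /dev/mapper/fedora_dwarf52-root rw,seclabel,data=ordered
"""
chain, stopped_at = parent_chain(parse_parents(sample), 24)
for mount_id, mount_point in chain:
    print(mount_id, mount_point)
print("walk stopped at mount ID", stopped_at)
```

In the sample the walk ends at parent ID 1, which has no line of its own:
the parent of / is known by ID but is not listed, so it cannot be named by
a path.  The sketch does not decode octal escapes such as \040 in mount
points.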

> > It's much like how when you boot a livecd, you see things like
> > the rootfs on / as well as /cow on /.  You can't reach the rootfs
> > which is parent of the /cow on / any more, but it's in the mounts
> > table.
> 
> > Now I tested, and with a simple setup we can use a much simpler
> > patch which just does mount("", "/", NULL, MS_SLAVE|MS_REC, 0);
> > for the whole of chroot_into_slave() (and skips the new umount2()
> > in start.c).  The container then starts, and its mounts table
> > is clean.
> 
> > Where that won't work is in a livecd or any fancy raid setup,
> > where your process's / has a parent which is MS_SHARED.
> 
> > Michael, can you show me your /proc/self/mountinfo in a f18
> > box?
> 
> Freshly installed clean box...
> 
> [root at dwarf52 mhw]# cat /proc/self/mountinfo
> 15 34 0:3 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
> 16 34 0:14 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs rw,seclabel
> 17 34 0:5 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,seclabel,size=491520k,nr_inodes=122880,mode=755
> 18 16 0:15 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:7 - securityfs securityfs rw
> 19 16 0:13 / /sys/fs/selinux rw,relatime shared:8 - selinuxfs selinuxfs rw
> 20 17 0:16 / /dev/shm rw,nosuid,nodev shared:3 - tmpfs tmpfs rw,seclabel
> 21 17 0:10 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=000
> 22 34 0:17 / /run rw,nosuid,nodev shared:19 - tmpfs tmpfs rw,seclabel,mode=755
> 23 16 0:18 / /sys/fs/cgroup rw,nosuid,nodev,noexec shared:9 - tmpfs tmpfs rw,seclabel,mode=755
> 24 23 0:19 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:10 - cgroup cgroup rw,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
> 25 23 0:20 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:11 - cgroup cgroup rw,cpuset
> 26 23 0:21 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:12 - cgroup cgroup rw,cpuacct,cpu
> 27 23 0:22 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:13 - cgroup cgroup rw,memory
> 28 23 0:23 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:14 - cgroup cgroup rw,devices
> 29 23 0:24 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,freezer
> 30 23 0:25 / /sys/fs/cgroup/net_cls rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,net_cls
> 31 23 0:26 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,blkio
> 32 23 0:27 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,perf_event
> 34 1 253:1 / / rw,relatime shared:1 - ext4 /dev/mapper/fedora_dwarf52-root rw,seclabel,data=ordered
> 35 15 0:29 / /proc/sys/fs/binfmt_misc rw,relatime shared:20 - autofs systemd-1 rw,fd=29,pgrp=1,timeout=300,minproto=5,maxproto=5,direct
> 37 16 0:30 / /sys/kernel/config rw,relatime shared:21 - configfs configfs rw
> 39 17 0:31 / /dev/hugepages rw,relatime shared:22 - hugetlbfs hugetlbfs rw,seclabel
> 38 17 0:12 / /dev/mqueue rw,relatime shared:23 - mqueue mqueue rw,seclabel
> 36 16 0:7 / /sys/kernel/debug rw,relatime shared:24 - debugfs debugfs rw
> 40 34 0:32 / /tmp rw shared:25 - tmpfs tmpfs rw,seclabel
> 41 34 8:1 / /boot rw,relatime shared:26 - ext4 /dev/sda1 rw,seclabel,data=ordered
> 42 34 253:2 / /home rw,relatime shared:27 - ext4 /dev/mapper/fedora_dwarf52-home rw,seclabel,data=ordered
> 74 22 0:33 / /run/user/1000/gvfs rw,nosuid,nodev,relatime shared:57 - fuse.gvfsd-fuse gvfsd-fuse rw,user_id=1000,group_id=1000
> 76 16 0:34 / /sys/fs/fuse/connections rw,relatime shared:59 - fusectl fusectl rw
> 
> Looks like everything has "shared".
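
Rather than eyeballing the listing, the check can be scripted.  A minimal
Python sketch, assuming the mountinfo layout in which the optional
propagation tags (shared:N, master:N, ...) sit between the sixth field and
the lone "-" separator.  The function names are illustrative, and the last
sample line is a made-up private mount added only to show a non-shared
result.

```python
def propagation_tags(line):
    """Return (mount point, list of optional propagation tags) for one line."""
    fields = line.split()
    sep = fields.index("-", 6)  # optional fields end at the lone "-"
    return fields[4], fields[6:sep]


def non_shared_mounts(text):
    """List the mount points whose line carries no shared:N tag."""
    result = []
    for line in text.splitlines():
        if not line.strip():
            continue
        mount_point, tags = propagation_tags(line)
        if not any(tag.startswith("shared:") for tag in tags):
            result.append(mount_point)
    return result


# First two lines are from the listing above; the third is hypothetical.
sample = """\
34 1 253:1 / / rw,relatime shared:1 - ext4 /dev/mapper/fedora_dwarf52-root rw,seclabel,data=ordered
40 34 0:32 / /tmp rw shared:25 - tmpfs tmpfs rw,seclabel
99 34 0:40 / /mnt/private rw,relatime - tmpfs tmpfs rw
"""
print(non_shared_mounts(sample))

# On a live system, feed it the real file instead:
#   with open("/proc/self/mountinfo") as f:
#       print(non_shared_mounts(f.read()))
```

An empty list from the live file would confirm that every mount is shared.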
> 
> I'll be testing lxc on this beast with and without this patch over the
> next couple of days for both systemd and non-systemd containers.  I've
> got to get 0.9.0a2 built on it first and then go from there.
> 
> > > I didn't spend much time reviewing the code itself, but it applied to my
> > > local staging tree and built fine, so that's good enough for me :)
> 
> > Thanks -  TBH the extra mounts are no more wrong than they are in
> > a livecd, so I don't think it's a big problem.  One we can address
> > in January.
> 
> > -serge
> 
> Hope you (and everyone else) had a nice holiday!
> 
> Regards,
> Mike

-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!