[lxc-devel] [PATCH] Support MS_SHARED /
Michael H. Warfield
mhw at WittsEnd.com
Sun Jan 6 03:15:43 UTC 2013
On Sun, 2013-01-06 at 06:39 +0800, Alexander Vladimirov wrote:
> It is a separate package in Arch Linux and I don't have it installed on
> the host, as well as in container since everything works well without
> it
Well, that would explain it. What isn't explained is why we need it.
This is the run_makedev() function which is called from setup_autodev()
in src/lxc/setup.c just before it tries to populate the .../dev
directory in the container. There are some comments in there about making
sure the /dev/vcs* entries are created.
It's also not clear to me if it's even doing what it purports to do. It
changes to the dev directory and then runs /sbin/MAKEDEV (without
checking if it even exists) without a parameter (-d) for the target
directory which would seem to me to cause MAKEDEV to attempt to create
the devices in the host /dev and not the container .../dev directory at
all. That actually appears consistent with the behavior I'm seeing. If
I reboot the host system, all those tty devices do not exist in the host
until after I fire up a container with autodev enabled. Then they
appear in the host /dev which is not the correct behavior.
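
A minimal sketch of the guarded call this suggests, assuming MAKEDEV
accepts -d for the target directory as described above. The helper name
build_makedev_cmd and its return codes are hypothetical (this is not the
actual lxc code), and it assumes the paths contain no shell
metacharacters:

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the actual lxc code: build a MAKEDEV command
 * line that targets the container's dev directory rather than the host
 * /dev. Returns 0 on success, -1 if the MAKEDEV binary is missing or not
 * executable, -2 if the command does not fit in the buffer.
 */
static int build_makedev_cmd(const char *makedev, const char *devdir,
                             char *buf, size_t len)
{
    int n;

    /* Skip quietly when MAKEDEV is not installed on the host. */
    if (access(makedev, X_OK) != 0)
        return -1;

    /* Pass the target directory explicitly with -d (assumed option). */
    n = snprintf(buf, len, "%s -d %s console", makedev, devdir);
    if (n < 0 || (size_t)n >= len)
        return -2;

    return 0;
}
```

A caller would only hand buf to system() when the helper returns 0, so a
host without /sbin/MAKEDEV would skip the step instead of printing
"No such file or directory".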
I don't think we should be doing this, but it is part of the earlier
autodev patches Serge did for systemd that went into 0.9.0.a1. Maybe
it's a difference in behavior between MAKEDEV on Ubuntu and MAKEDEV on
Fedora (et al.), and MAKEDEV is not even guaranteed to exist.
Serge?
Regards,
Mike
> 2013/1/6 Michael H. Warfield <mhw at wittsend.com>:
> > On Sun, 2013-01-06 at 06:31 +0800, Alexander Vladimirov wrote:
> >> I can confirm it works for Arch Linux with systemd 196
> >> However I see exactly one message saying:
> >> sh: /sbin/MAKEDEV: No such file or directory
> >
> > Do you have /sbin/MAKEDEV in the host system? If not, that would make
> > sense. I'm not sure what it's supposed to be doing in lxc.
> >
> > Regards,
> > Mike
> >
> >> 2013/1/6 Michael H. Warfield <mhw at wittsend.com>
> >> Hey Serge!
> >>
> >> Took longer for me to test this out on Fedora 18 Beta than I had
> >> expected. I got tangled up trying to get bridge networking working and
> >> my day job wanted to get in my way... :-P I hear that F18 final
> >> has been delayed again but is anticipated for Jan 15. I'll test that
> >> when it becomes available.
> >>
> >> IAC... I was able to confirm that the 0.9.0.a2 cut very definitely
> >> fails on an F18Beta host with the expected pivot root error and that the
> >> code in staging does seem to work and seems to do the right thing. This
> >> was starting an F17 container on an F18Beta host with autodev enabled
> >> and systemd 195 running in the container.
> >>
> >> I did notice a huge "pile" of MAKEDEV errors creating tty devices when I
> >> ran lxc-start like these:
> >>
> >> --
> >> MAKEDEV: /dev/ttyEQ1001: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1002: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1003: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1004: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1005: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1006: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1007: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1008: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1009: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1010: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1011: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1012: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1013: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1014: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1015: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1016: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1017: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1018: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1019: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1020: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1021: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1022: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1023: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1024: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1025: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1026: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyEQ1027: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >> MAKEDEV: /dev/ttyUB0: unable to set file creation context "system_u:object_r:device_t:s0"
> >> MAKEDEV: /dev/ttyUB1: unable to set file creation context "system_u:object_r:device_t:s0"
> >> <30>systemd[1]: systemd 195 running in system mode. (+PAM +LIBWRAP +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ; fedora)
> >> <30>systemd[1]: Detected virtualization 'lxc'.
> >>
> >> Welcome to Fedora 17 (Beefy Miracle)!
> >>
> >> <30>systemd[1]: Set hostname to <alcove.wittsend.com>.
> >> <28>systemd[1]: Cannot add dependency job for unit display-manager.service, ignoring: Unit display-manager.service failed to load: No such file or directory. See system logs and 'systemctl status display-manager.service' for details.
> >> <30>systemd[1]: Started Collect Read-Ahead Data.
> >> <30>systemd[1]: Started Replay Read-Ahead Data.
> >> <30>systemd[1]: Starting Forward Password Requests to Wall Directory Watch.
> >> <30>systemd[1]: Started Forward Password Requests to Wall Directory Watch.
> >> <30>systemd[1]: Starting Syslog Socket.
> >> [ OK ] Listening on Syslog Socket.
> >> --
> >>
> >> It certainly appears to have done the right thing and that same
> >> container on an F17 host does not emit those MAKEDEV errors and does not
> >> contain those tty devices. Looks like an selinux issue inside the
> >> container. But it's happening even when I set selinux to "permissive"
> >> mode in both the host and container. Seems cosmetic, however. Nothing
> >> showing up in the syslog messages file on either the host or the
> >> container.
> >>
> >> I see a call to "/sbin/MAKEDEV console" in src/lxc/conf.c. Not sure if
> >> it's that call that's generating the problem but there is no MAKEDEV in
> >> the container. It's interesting that they're showing up before systemd
> >> in the container is announcing its presence. Looks like it's running
> >> the MAKEDEV command in the host environment and, if I run "MAKEDEV
> >> console" in the host itself, I get a couple thousand of those tty
> >> devices created in the host /dev, that were not present before, and I
> >> don't get any of the context errors... Might be worth looking into just
> >> to see what all the noise is about.
> >>
> >> IAC... Looks like it works on F18Beta. I'm good.
> >>
> >> Regards,
> >> Mike
> >>
> >> On Thu, 2012-12-27 at 22:45 -0500, Michael H. Warfield wrote:
> >> > On Thu, 2012-12-20 at 09:03 -0600, Serge Hallyn wrote:
> >> > > Quoting Stéphane Graber (stgraber at ubuntu.com):
> >> > > > On 12/20/2012 06:58 AM, Serge Hallyn wrote:
> >> > > ...
> >> > > > /proc/mounts in the container will also end up being polluted by
> >> > > > all the mount points from the host; this in itself doesn't cause
> >> > > > any big problem, though the container will try (and fail) to
> >> > > > unmount all of those.
> >> > > > Is there anything we can do to improve that situation or is that
> >> > > > a side effect of MS_SHARED that we can't work around on our end?
> >> > >
> >> > > I think it's actually a side effect of pivot-root after chroot. You
> >> > > have /orig_root/foo/chroot_root/path/new_pivot/put_old. Then you
> >> > > chroot to /orig_root/foo/chroot_root. When you then pivot to
> >> > > /path/new_pivot, what ends up in put_old is
> >> > > /orig_root/foo/chroot_root. I'm actually not sure you can trim the
> >> > > mounts which were under /orig_root. We could figure out which ones
> >> > > they are by following the chain of mount ids in
> >> > > /proc/self/mountinfo, but we can't reach them to umount them.
> >> > >
> >> > > It's much like how when you boot a livecd, you see things like
> >> > > the rootfs on / as well as /cow on /. You can't reach the rootfs
> >> > > which is parent of the /cow on / any more, but it's in the mounts
> >> > > table.
> >> > >
> >> > > Now I tested, and with a simple setup we can use a much simpler
> >> > > patch which just does mount("", "/", NULL, MS_SLAVE|MS_REC, 0);
> >> > > for the whole of chroot_into_slave() (and skips the new umount2()
> >> > > in start.c). The container then starts, and its mounts table
> >> > > is clean.
> >> > >
> >> > > Where that won't work is in a livecd or any fancy raid setup,
> >> > > where your process's / has a parent which is MS_SHARED.
> >> > >
> >> > > Michael, can you show me your /proc/self/mountinfo in a f18
> >> > > box?
> >> >
> >> > Freshly installed clean box...
> >> >
> >> > [root@dwarf52 mhw]# cat /proc/self/mountinfo
> >> > 15 34 0:3 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
> >> > 16 34 0:14 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs rw,seclabel
> >> > 17 34 0:5 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,seclabel,size=491520k,nr_inodes=122880,mode=755
> >> > 18 16 0:15 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:7 - securityfs securityfs rw
> >> > 19 16 0:13 / /sys/fs/selinux rw,relatime shared:8 - selinuxfs selinuxfs rw
> >> > 20 17 0:16 / /dev/shm rw,nosuid,nodev shared:3 - tmpfs tmpfs rw,seclabel
> >> > 21 17 0:10 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=000
> >> > 22 34 0:17 / /run rw,nosuid,nodev shared:19 - tmpfs tmpfs rw,seclabel,mode=755
> >> > 23 16 0:18 / /sys/fs/cgroup rw,nosuid,nodev,noexec shared:9 - tmpfs tmpfs rw,seclabel,mode=755
> >> > 24 23 0:19 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:10 - cgroup cgroup rw,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
> >> > 25 23 0:20 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:11 - cgroup cgroup rw,cpuset
> >> > 26 23 0:21 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:12 - cgroup cgroup rw,cpuacct,cpu
> >> > 27 23 0:22 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:13 - cgroup cgroup rw,memory
> >> > 28 23 0:23 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:14 - cgroup cgroup rw,devices
> >> > 29 23 0:24 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,freezer
> >> > 30 23 0:25 / /sys/fs/cgroup/net_cls rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,net_cls
> >> > 31 23 0:26 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,blkio
> >> > 32 23 0:27 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,perf_event
> >> > 34 1 253:1 / / rw,relatime shared:1 - ext4 /dev/mapper/fedora_dwarf52-root rw,seclabel,data=ordered
> >> > 35 15 0:29 / /proc/sys/fs/binfmt_misc rw,relatime shared:20 - autofs systemd-1 rw,fd=29,pgrp=1,timeout=300,minproto=5,maxproto=5,direct
> >> > 37 16 0:30 / /sys/kernel/config rw,relatime shared:21 - configfs configfs rw
> >> > 39 17 0:31 / /dev/hugepages rw,relatime shared:22 - hugetlbfs hugetlbfs rw,seclabel
> >> > 38 17 0:12 / /dev/mqueue rw,relatime shared:23 - mqueue mqueue rw,seclabel
> >> > 36 16 0:7 / /sys/kernel/debug rw,relatime shared:24 - debugfs debugfs rw
> >> > 40 34 0:32 / /tmp rw shared:25 - tmpfs tmpfs rw,seclabel
> >> > 41 34 8:1 / /boot rw,relatime shared:26 - ext4 /dev/sda1 rw,seclabel,data=ordered
> >> > 42 34 253:2 / /home rw,relatime shared:27 - ext4 /dev/mapper/fedora_dwarf52-home rw,seclabel,data=ordered
> >> > 74 22 0:33 / /run/user/1000/gvfs rw,nosuid,nodev,relatime shared:57 - fuse.gvfsd-fuse gvfsd-fuse rw,user_id=1000,group_id=1000
> >> > 76 16 0:34 / /sys/fs/fuse/connections rw,relatime shared:59 - fusectl fusectl rw
> >> >
> >> > Looks like everything has "shared".
> >> >
> >> > I'll be testing lxc on this beast with and without this patch over
> >> > the next couple of days for both systemd and non-systemd containers.
> >> > I've got to get 0.9.0a2 built on it first and then go from there.
> >> >
> >> > > > I didn't spend much time reviewing the code itself, but it applied
> >> > > > to my local staging tree and built fine, so that's good enough
> >> > > > for me :)
> >> >
> >> > > Thanks - TBH the extra mounts are no more wrong than they are in
> >> > > a livecd, so I don't think it's a big problem. One we can address
> >> > > in January.
> >> >
> >> > > -serge
> >> >
> >> > Hope you (and everyone else) had a nice holiday!
> >> >
> >> > Regards,
> >> > Mike
> >>
--
Michael H. Warfield (AI4NB) | (770) 985-6132 | mhw at WittsEnd.com
/\/\|=mhw=|\/\/ | (678) 463-0932 | http://www.wittsend.com/mhw/
NIC whois: MHW9 | An optimist believes we live in the best of all
PGP Key: 0x674627FF | possible worlds. A pessimist is sure of it!