[lxc-devel] [PATCH] Support MS_SHARED /

Michael H. Warfield mhw at WittsEnd.com
Sun Jan 6 03:15:43 UTC 2013


On Sun, 2013-01-06 at 06:39 +0800, Alexander Vladimirov wrote:
> It is a separate package in Arch Linux and I don't have it installed on
> the host or in the container, since everything works well without
> it.

Well, that would explain it.  What isn't explained is why we need it.

This is the run_makedev() function, which is called from setup_autodev()
in src/lxc/conf.c just before it tries to populate the .../dev
directory in the container.  There are some comments in there about making
sure the /dev/vcs* entries are created.

It's also not clear to me if it's even doing what it purports to do.  It
changes to the dev directory and then runs /sbin/MAKEDEV (without
checking if it even exists) without a parameter (-d) for the target
directory which would seem to me to cause MAKEDEV to attempt to create
the devices in the host /dev and not the container .../dev directory at
all.  That actually appears consistent with the behavior I'm seeing.  If
I reboot the host system, all those tty devices do not exist in the host
until after I fire up a container with autodev enabled.  Then they
appear in the host /dev which is not the correct behavior.
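
For illustration only, here is a minimal Python sketch (not lxc's actual C
code) of the two checks described above: confirm that the MAKEDEV helper
exists and is executable, and pass the container's dev directory explicitly
with -d instead of relying on the current working directory.  The function
name makedev_argv and the example paths are hypothetical, and the -d usage
is assumed from the description in this message.

```python
import os


def makedev_argv(target_dev, makedev="/sbin/MAKEDEV", device="console"):
    """Build an argv list for MAKEDEV, or return None if it is unavailable.

    target_dev: the container's dev directory (hypothetical example path).
    makedev:    path to the helper; it may not exist on every distribution.
    device:     device class to create, e.g. "console".
    """
    # Skip quietly when the helper is missing or not executable, instead of
    # letting a shell print "No such file or directory".
    if not (os.path.isfile(makedev) and os.access(makedev, os.X_OK)):
        return None
    # Name the target directory explicitly so nothing lands in the host /dev.
    return [makedev, "-d", target_dev, device]
```

A caller would run the returned list with subprocess.run() only when it is
not None, and otherwise carry on without MAKEDEV.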

I don't think we should be doing this but this is part of the earlier
autodev patches Serge did for systemd that went into 0.9.0.a1.  Maybe
it's a difference in behavior between MAKEDEV on Ubuntu and MAKEDEV on
Fedora (et al), and MAKEDEV isn't even guaranteed to exist.

Serge?

Regards,
Mike

> 2013/1/6 Michael H. Warfield <mhw at wittsend.com>:
> > On Sun, 2013-01-06 at 06:31 +0800, Alexander Vladimirov wrote:
> >> I can confirm it works for Arch Linux with systemd 196
> >> However I see exactly one message saying:
> >> sh: /sbin/MAKEDEV: No such file or directory
> >
> > Do you have /sbin/MAKEDEV in the host system?  If not, that would make
> > sense.  I'm not sure what it's supposed to be doing in lxc.
> >
> > Regards,
> > Mike
> >
> >> 2013/1/6 Michael H. Warfield <mhw at wittsend.com>
> >>         Hey Serge!
> >>
> >>         Took longer for me to test this out on Fedora 18 Beta than I had
> >>         expected.  I got tangled up trying to get bridge networking working and
> >>         my day job wanted to get in my way...  :-P  I hear that F18 final
> >>         has been delayed again but is anticipated for Jan 15.  I'll test that when
> >>         it becomes available.
> >>
> >>         IAC...  I was able to confirm that the 0.9.0.a2 cut very definitely
> >>         fails on an F18Beta host with the expected pivot root error and that the
> >>         code in staging does seem to work and seems to do the right thing.  This
> >>         was starting an F17 container on an F18Beta host with autodev enabled
> >>         and systemd 195 running in the container.
> >>
> >>         I did notice a huge "pile" of MAKEDEV errors creating tty devices when I
> >>         ran lxc-start like these:
> >>
> >>         --
> >>         MAKEDEV: /dev/ttyEQ1001: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1002: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1003: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1004: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1005: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1006: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1007: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1008: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1009: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1010: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1011: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1012: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1013: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1014: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1015: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1016: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1017: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1018: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1019: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1020: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1021: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1022: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1023: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1024: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1025: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1026: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyEQ1027: unable to set file creation context "system_u:object_r:tty_device_t:s0"
> >>         MAKEDEV: /dev/ttyUB0: unable to set file creation context "system_u:object_r:device_t:s0"
> >>         MAKEDEV: /dev/ttyUB1: unable to set file creation context "system_u:object_r:device_t:s0"
> >>         <30>systemd[1]: systemd 195 running in system mode. (+PAM +LIBWRAP +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ; fedora)
> >>         <30>systemd[1]: Detected virtualization 'lxc'.
> >>
> >>         Welcome to Fedora 17 (Beefy Miracle)!
> >>
> >>         <30>systemd[1]: Set hostname to <alcove.wittsend.com>.
> >>         <28>systemd[1]: Cannot add dependency job for unit display-manager.service, ignoring: Unit display-manager.service failed to load: No such file or directory. See system logs and 'systemctl status display-manager.service' for details.
> >>         <30>systemd[1]: Started Collect Read-Ahead Data.
> >>         <30>systemd[1]: Started Replay Read-Ahead Data.
> >>         <30>systemd[1]: Starting Forward Password Requests to Wall Directory Watch.
> >>         <30>systemd[1]: Started Forward Password Requests to Wall Directory Watch.
> >>         <30>systemd[1]: Starting Syslog Socket.
> >>         [  OK  ] Listening on Syslog Socket.
> >>         --
> >>
> >>         It certainly appears to have done the right thing and that same
> >>         container on an F17 host does not emit those MAKEDEV errors and does not
> >>         contain those tty devices.  Looks like an selinux issue inside the
> >>         container.  But it's happening even when I set selinux to "permissive"
> >>         mode in both the host and container.  Seems cosmetic, however.  Nothing
> >>         showing up in the syslog messages file on either the host or the
> >>         container.
> >>
> >>         I see a call to "/sbin/MAKEDEV console" in src/lxc/conf.c.  Not sure if
> >>         it's that call that's generating the problem but there is no MAKEDEV in
> >>         the container.  It's interesting that they're showing up before systemd
> >>         in the container is announcing its presence.  Looks like it's running
> >>         the MAKEDEV command in the host environment and, if I run "MAKEDEV
> >>         console" in the host itself, I get a couple thousand of those tty
> >>         devices created in the host /dev, that were not present before, and I
> >>         don't get any of the context errors...  Might be worth looking into just
> >>         to see what all the noise is about.
> >>
> >>         IAC...  Looks like it works on F18Beta.  I'm good.
> >>
> >>         Regards,
> >>         Mike
> >>
> >>         On Thu, 2012-12-27 at 22:45 -0500, Michael H. Warfield wrote:
> >>         > On Thu, 2012-12-20 at 09:03 -0600, Serge Hallyn wrote:
> >>         > > Quoting Stéphane Graber (stgraber at ubuntu.com):
> >>         > > > On 12/20/2012 06:58 AM, Serge Hallyn wrote:
> >>         > > ...
> >>         > > > /proc/mounts in the container will also end up being polluted by all the
> >>         > > > mount points from the host, this in itself doesn't cause any big
> >>         > > > problem, though the container will try (and fail) to unmount all of those.
> >>         > > > Is there anything we can do to improve that situation or is that a side
> >>         > > > effect of MS_SHARED that we can't workaround on our end?
> >>
> >>         > > I think it's actually a side effect of pivot-root after chroot.  You
> >>         > > have /orig_root/foo/chroot_root/path/new_pivot/put_old.  Then you
> >>         > > chroot to /orig_root/foo/chroot_root.  When you then pivot to
> >>         > > /path/new_pivot, what ends up in put_old is /orig_root/foo/chroot_root.
> >>         > > I'm actually not sure you can trim the mounts which were under
> >>         > > /orig_root.  We could figure out which ones they are by following the chain
> >>         > > of mount ids in /proc/self/mountinfo, but we can't reach them to umount
> >>         > > them.
> >>
> >>         > > It's much like how when you boot a livecd, you see things like
> >>         > > the rootfs on / as well as /cow on /.  You can't reach the rootfs
> >>         > > which is parent of the /cow on / any more, but it's in the mounts
> >>         > > table.
> >>         >
> >>         > > Now I tested, and with a simple setup we can use a much simpler
> >>         > > patch which just does mount("", "/", NULL, MS_SLAVE|MS_REC, 0);
> >>         > > for the whole of chroot_into_slave() (and skips the new umount2()
> >>         > > in start.c).  The container then starts, and its mounts table
> >>         > > is clean.
> >>         >
> >>         > > Where that won't work is in a livecd or any fancy raid setup,
> >>         > > where your process's / has a parent which is MS_SHARED.
> >>         >
> >>         > > Michael, can you show me your /proc/self/mountinfo in a f18 box?
> >>         >
> >>         > Freshly installed clean box...
> >>         >
> >>         > [root at dwarf52 mhw]# cat /proc/self/mountinfo
> >>         > 15 34 0:3 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
> >>         > 16 34 0:14 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs rw,seclabel
> >>         > 17 34 0:5 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,seclabel,size=491520k,nr_inodes=122880,mode=755
> >>         > 18 16 0:15 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:7 - securityfs securityfs rw
> >>         > 19 16 0:13 / /sys/fs/selinux rw,relatime shared:8 - selinuxfs selinuxfs rw
> >>         > 20 17 0:16 / /dev/shm rw,nosuid,nodev shared:3 - tmpfs tmpfs rw,seclabel
> >>         > 21 17 0:10 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=000
> >>         > 22 34 0:17 / /run rw,nosuid,nodev shared:19 - tmpfs tmpfs rw,seclabel,mode=755
> >>         > 23 16 0:18 / /sys/fs/cgroup rw,nosuid,nodev,noexec shared:9 - tmpfs tmpfs rw,seclabel,mode=755
> >>         > 24 23 0:19 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:10 - cgroup cgroup rw,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
> >>         > 25 23 0:20 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:11 - cgroup cgroup rw,cpuset
> >>         > 26 23 0:21 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:12 - cgroup cgroup rw,cpuacct,cpu
> >>         > 27 23 0:22 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:13 - cgroup cgroup rw,memory
> >>         > 28 23 0:23 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:14 - cgroup cgroup rw,devices
> >>         > 29 23 0:24 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,freezer
> >>         > 30 23 0:25 / /sys/fs/cgroup/net_cls rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,net_cls
> >>         > 31 23 0:26 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,blkio
> >>         > 32 23 0:27 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,perf_event
> >>         > 34 1 253:1 / / rw,relatime shared:1 - ext4 /dev/mapper/fedora_dwarf52-root rw,seclabel,data=ordered
> >>         > 35 15 0:29 / /proc/sys/fs/binfmt_misc rw,relatime shared:20 - autofs systemd-1 rw,fd=29,pgrp=1,timeout=300,minproto=5,maxproto=5,direct
> >>         > 37 16 0:30 / /sys/kernel/config rw,relatime shared:21 - configfs configfs rw
> >>         > 39 17 0:31 / /dev/hugepages rw,relatime shared:22 - hugetlbfs hugetlbfs rw,seclabel
> >>         > 38 17 0:12 / /dev/mqueue rw,relatime shared:23 - mqueue mqueue rw,seclabel
> >>         > 36 16 0:7 / /sys/kernel/debug rw,relatime shared:24 - debugfs debugfs rw
> >>         > 40 34 0:32 / /tmp rw shared:25 - tmpfs tmpfs rw,seclabel
> >>         > 41 34 8:1 / /boot rw,relatime shared:26 - ext4 /dev/sda1 rw,seclabel,data=ordered
> >>         > 42 34 253:2 / /home rw,relatime shared:27 - ext4 /dev/mapper/fedora_dwarf52-home rw,seclabel,data=ordered
> >>         > 74 22 0:33 / /run/user/1000/gvfs rw,nosuid,nodev,relatime shared:57 - fuse.gvfsd-fuse gvfsd-fuse rw,user_id=1000,group_id=1000
> >>         > 76 16 0:34 / /sys/fs/fuse/connections rw,relatime shared:59 - fusectl fusectl rw
> >>         >
> >>         > Looks like everything has "shared".
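
As an aside for readers of the listing above, the "shared:N" tag can be
checked mechanically rather than by eye.  This is a minimal Python sketch,
assuming the /proc/[pid]/mountinfo layout documented in proc(5): six fixed
fields, zero or more optional fields, a lone "-" separator, then filesystem
type, source, and super options.  The function name and the dictionary keys
are made up for illustration and are not part of lxc.

```python
def parse_mountinfo_line(line):
    """Parse one mountinfo line and report its shared peer group, if any."""
    # Everything before the lone "-" separator is per-mount data; what
    # follows is the filesystem type, mount source, and super options.
    left, _, right = line.strip().partition(" - ")
    fields = left.split()
    tail = right.split()

    peer_group = None
    # Optional fields start at index 6; "shared:N" marks a shared mount.
    for tag in fields[6:]:
        if tag.startswith("shared:"):
            peer_group = int(tag.split(":", 1)[1])

    return {
        "mount_id": int(fields[0]),
        "parent_id": int(fields[1]),
        "mount_point": fields[4],
        "shared_peer_group": peer_group,
        "fstype": tail[0] if tail else "",
    }


def shared_mount_points(lines):
    """Return the mount points of all entries tagged as shared."""
    parsed = (parse_mountinfo_line(l) for l in lines if l.strip())
    return [m["mount_point"] for m in parsed
            if m["shared_peer_group"] is not None]
```

Feeding it the lines of /proc/self/mountinfo on a host like the one above
would list every mount point, since each entry carries a shared tag.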
> >>         >
> >>         > I'll be testing lxc on this beast with and without this patch over the
> >>         > next couple of days for both systemd and non-systemd containers.  I've
> >>         > got to get 0.9.0a2 built on it first and then go from there.
> >>         >
> >>         > > > I didn't spend much time reviewing the code itself, but it applied to my
> >>         > > > local staging tree and built fine, so that's good enough for me :)
> >>         >
> >>         > > Thanks -  TBH the extra mounts are no more wrong than they are in
> >>         > > a livecd, so I don't think it's a big problem.  One we can address
> >>         > > in January.
> >>         >
> >>         > > -serge
> >>         >
> >>         > Hope you (and everyone else) had a nice holiday!
> >>         >
> >>         > Regards,
> >>         > Mike
> >>
> >>
> >>
> >>         --
> >>         Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
> >>            /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
> >>            NIC whois: MHW9          | An optimist believes we live in the best of all
> >>          PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!
> >>
> >>
> >>
> >>
> >>
> >
> > --
> > Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
> >    /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
> >    NIC whois: MHW9          | An optimist believes we live in the best of all
> >  PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!
> 

-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!