[lxc-devel] [PATCH] Support MS_SHARED /

Alexander Vladimirov alexander.idkfa.vladimirov at gmail.com
Sun Jan 6 06:22:14 UTC 2013


I also noticed device nodes having strange permissions when /dev is
auto-populated:

[idkfa at lxc0 ~]$ ls -la /dev/{null,tty,urandom,zero,full}
crwxr-xr-x 1 root root 1, 7 Jan  6 05:56 /dev/full
crwxr-xr-x 1 root root 1, 3 Jan  6 05:56 /dev/null
crwxr-xr-x 1 root root 5, 0 Jan  6 05:56 /dev/tty
crwxr-xr-x 1 root root 1, 9 Jan  6 05:56 /dev/urandom
crwxr-xr-x 1 root root 1, 5 Jan  6 05:56 /dev/zero

Not really sure what could cause this.

2013/1/6 Michael H. Warfield <mhw at wittsend.com>:
> On Sun, 2013-01-06 at 06:39 +0800, Alexander Vladimirov wrote:
>> It is a separate package in Arch Linux and I don't have it installed on
>> the host or in the container, since everything works well without
>> it.
>
> Well, that would explain it.  What isn't explained is why we need it.
>
> This is the run_makedev() function which is called from setup_autodev()
> in src/lxc/conf.c just before it tries to populate the .../dev
> directory in the container.  There are some comments in there about
> making sure the /dev/vcs* entries are created.
>
> It's also not clear to me if it's even doing what it purports to do.  It
> changes to the dev directory and then runs /sbin/MAKEDEV (without
> checking whether it even exists) without a parameter (-d) for the target
> directory, which would seem to me to cause MAKEDEV to attempt to create
> the devices in the host /dev and not in the container's .../dev directory
> at all.  That actually appears consistent with the behavior I'm seeing.
> If I reboot the host system, none of those tty devices exist in the host
> until after I fire up a container with autodev enabled.  Then they
> appear in the host /dev, which is not the correct behavior.
>
> I don't think we should be doing this, but this is part of the earlier
> autodev patches Serge did for systemd that went into 0.9.0.a1.  Maybe
> it's a difference in behavior between MAKEDEV on Ubuntu and MAKEDEV on
> Fedora (et al.), and MAKEDEV is not even guaranteed to exist.
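The C sketch below shows the stricter behavior described above. It skips MAKEDEV when the binary is missing, instead of letting `sh` fail. It also names the target directory explicitly, instead of relying on the current directory.

It rests on these assumptions:
- `run_makedev_checked()` is a hypothetical name, and it takes the MAKEDEV path as a parameter.
- The installed MAKEDEV accepts `-d <dir>` for the target directory, as described here. Check its usage on each distribution before relying on that.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical replacement for run_makedev(): run MAKEDEV only if it exists
 * and is executable, and pass "-d <devdir>" so the nodes are created in the
 * given directory rather than wherever MAKEDEV defaults to (assumed option).
 * "console" mirrors the "/sbin/MAKEDEV console" call mentioned in the thread.
 * Returns 0 if MAKEDEV ran and exited 0, -1 otherwise. */
int run_makedev_checked(const char *makedev, const char *devdir)
{
    pid_t pid;
    int status;

    if (access(makedev, X_OK) < 0) {
        fprintf(stderr, "%s not found or not executable, skipping\n", makedev);
        return -1;
    }

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: the target directory is passed explicitly. */
        execl(makedev, "MAKEDEV", "-d", devdir, "console", (char *)NULL);
        _exit(127);             /* exec failed */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Consider an Arch host without MAKEDEV. There, `run_makedev_checked("/sbin/MAKEDEV", "/path/to/rootfs/dev")` (placeholder path) would return -1 with a message, instead of producing the `sh: /sbin/MAKEDEV: No such file or directory` error.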
>
> Serge?
>
> Regards,
> Mike
>
>> 2013/1/6 Michael H. Warfield <mhw at wittsend.com>:
>> > On Sun, 2013-01-06 at 06:31 +0800, Alexander Vladimirov wrote:
>> >> I can confirm it works for Arch Linux with systemd 196
>> >> However I see exactly one message saying:
>> >> sh: /sbin/MAKEDEV: No such file or directory
>> >
>> > Do you have /sbin/MAKEDEV in the host system?  If not, that would make
>> > sense.  I'm not sure what it's supposed to be doing in lxc.
>> >
>> > Regards,
>> > Mike
>> >
>> >> 2013/1/6 Michael H. Warfield <mhw at wittsend.com>
>> >>         Hey Serge!
>> >>
>> >>         Took longer for me to test this out on Fedora 18 Beta than I had
>> >>         expected.  I got tangled up trying to get bridge networking working
>> >>         and my day job wanted to get in my way...  :-P  I hear that F18
>> >>         final has been delayed again but is anticipated for Jan 15.  I'll
>> >>         test that when it becomes available.
>> >>
>> >>         IAC...  I was able to confirm that the 0.9.0.a2 cut very definitely
>> >>         fails on an F18Beta host with the expected pivot root error, and
>> >>         that the code in staging does seem to work and seems to do the
>> >>         right thing.  This was starting an F17 container on an F18Beta host
>> >>         with autodev enabled and systemd 195 running in the container.
>> >>
>> >>         I did notice a huge "pile" of MAKEDEV errors creating tty devices
>> >>         when I ran lxc-start, like these:
>> >>
>> >>         --
>> >>         MAKEDEV: /dev/ttyEQ1001: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1002: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1003: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1004: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1005: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1006: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1007: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1008: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1009: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1010: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1011: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1012: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1013: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1014: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1015: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1016: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1017: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1018: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1019: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1020: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1021: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1022: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1023: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1024: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1025: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1026: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyEQ1027: unable to set file creation context "system_u:object_r:tty_device_t:s0"
>> >>         MAKEDEV: /dev/ttyUB0: unable to set file creation context "system_u:object_r:device_t:s0"
>> >>         MAKEDEV: /dev/ttyUB1: unable to set file creation context "system_u:object_r:device_t:s0"
>> >>         <30>systemd[1]: systemd 195 running in system mode. (+PAM +LIBWRAP +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ; fedora)
>> >>         <30>systemd[1]: Detected virtualization 'lxc'.
>> >>
>> >>         Welcome to Fedora 17 (Beefy Miracle)!
>> >>
>> >>         <30>systemd[1]: Set hostname to <alcove.wittsend.com>.
>> >>         <28>systemd[1]: Cannot add dependency job for unit display-manager.service, ignoring: Unit display-manager.service failed to load: No such file or directory. See system logs and 'systemctl status display-manager.service' for details.
>> >>         <30>systemd[1]: Started Collect Read-Ahead Data.
>> >>         <30>systemd[1]: Started Replay Read-Ahead Data.
>> >>         <30>systemd[1]: Starting Forward Password Requests to Wall Directory Watch.
>> >>         <30>systemd[1]: Started Forward Password Requests to Wall Directory Watch.
>> >>         <30>systemd[1]: Starting Syslog Socket.
>> >>         [  OK  ] Listening on Syslog Socket.
>> >>         --
>> >>
>> >>         It certainly appears to have done the right thing, and that same
>> >>         container on an F17 host does not emit those MAKEDEV errors and
>> >>         does not contain those tty devices.  Looks like an SELinux issue
>> >>         inside the container.  But it's happening even when I set SELinux
>> >>         to "permissive" mode in both the host and container.  Seems
>> >>         cosmetic, however.  Nothing is showing up in the syslog messages
>> >>         file on either the host or the container.
>> >>
>> >>         I see a call to "/sbin/MAKEDEV console" in src/lxc/conf.c.  Not
>> >>         sure if it's that call that's generating the problem, but there
>> >>         is no MAKEDEV in the container.  It's interesting that they're
>> >>         showing up before systemd in the container announces its
>> >>         presence.  Looks like it's running the MAKEDEV command in the
>> >>         host environment and, if I run "MAKEDEV console" in the host
>> >>         itself, I get a couple thousand of those tty devices created in
>> >>         the host /dev that were not present before, and I don't get any
>> >>         of the context errors...  Might be worth looking into just to
>> >>         see what the noise is all about.
>> >>
>> >>         IAC...  Looks like it works on F18Beta.  I'm good.
>> >>
>> >>         Regards,
>> >>         Mike
>> >>
>> >>         On Thu, 2012-12-27 at 22:45 -0500, Michael H. Warfield wrote:
>> >>         > On Thu, 2012-12-20 at 09:03 -0600, Serge Hallyn wrote:
>> >>         > > Quoting Stéphane Graber (stgraber at ubuntu.com):
>> >>         > > > On 12/20/2012 06:58 AM, Serge Hallyn wrote:
>> >>         > > ...
>> >>         > > > /proc/mounts in the container will also end up being polluted
>> >>         > > > by all the mount points from the host.  This in itself doesn't
>> >>         > > > cause any big problem, though the container will try (and fail)
>> >>         > > > to unmount all of those.  Is there anything we can do to improve
>> >>         > > > that situation, or is that a side effect of MS_SHARED that we
>> >>         > > > can't work around on our end?
>> >>
>> >>         > > I think it's actually a side effect of pivot-root after chroot.
>> >>         > > You have /orig_root/foo/chroot_root/path/new_pivot/put_old.  Then
>> >>         > > you chroot to /orig_root/foo/chroot_root.  When you then pivot to
>> >>         > > /path/new_pivot, what ends up in put_old is
>> >>         > > /orig_root/foo/chroot_root.  I'm actually not sure you can trim
>> >>         > > the mounts which were under /orig_root.  We could figure out
>> >>         > > which ones they are by following the chain of mount ids in
>> >>         > > /proc/self/mountinfo, but we can't reach them to umount them.
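The C sketch below illustrates the "identify but not unmount" point. It assumes the mountinfo lines have already been read. It follows parent mount IDs upward until it reaches a parent that is not listed. The struct and function names are hypothetical. The sketch only finds mounts. Reaching them to umount them, as noted above, is the part that can't be done.

```c
#include <stddef.h>
#include <stdio.h>

/* Mount ID and parent mount ID: the first two fields of a mountinfo line. */
struct mnt_ids {
    int id;
    int parent;
};

/* Parse the first two fields of a mountinfo line. Returns 0 on success. */
int parse_mount_ids(const char *line, struct mnt_ids *m)
{
    return sscanf(line, "%d %d", &m->id, &m->parent) == 2 ? 0 : -1;
}

static const struct mnt_ids *find_mnt(const struct mnt_ids *tab, size_t n, int id)
{
    for (size_t i = 0; i < n; i++)
        if (tab[i].id == id)
            return &tab[i];
    return NULL;
}

/* Follow parent IDs upward from `start` and return the ID of the first
 * mount whose parent is not listed in the table (or is itself).
 * Returns -1 if `start` is not in the table. */
int topmost_visible(const struct mnt_ids *tab, size_t n, int start)
{
    const struct mnt_ids *m = find_mnt(tab, n, start);

    if (!m)
        return -1;
    for (size_t hops = 0; hops < n; hops++) {   /* bound guards against cycles */
        const struct mnt_ids *p = find_mnt(tab, n, m->parent);
        if (!p || p == m)
            break;
        m = p;
    }
    return m->id;
}
```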
>> >>
>> >>         > > It's much like how, when you boot a livecd, you see things like
>> >>         > > the rootfs on / as well as /cow on /.  You can't reach the rootfs
>> >>         > > which is the parent of the /cow on / any more, but it's in the
>> >>         > > mounts table.
>> >>         >
>> >>         > > Now I've tested, and with a simple setup we can use a much
>> >>         > > simpler patch which just does
>> >>         > > mount("", "/", NULL, MS_SLAVE|MS_REC, 0); for the whole of
>> >>         > > chroot_into_slave() (and skips the new umount2() in start.c).
>> >>         > > The container then starts, and its mounts table is clean.
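The C sketch below shows that call in context. It needs CAP_SYS_ADMIN. `make_rslave_namespace()` is a hypothetical name. The sketch assumes the process first needs its own mount namespace. lxc normally gets one when it clones the container with CLONE_NEWNS, and here `unshare()` stands in for that step. The sketch does nothing for the livecd/RAID case, where / has an MS_SHARED parent.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/mount.h>

/* Hypothetical helper: enter a new mount namespace, then recursively mark
 * every mount from / downwards as a slave. Mounts that were shared keep
 * receiving mount events from their former peer group, but events in this
 * namespace no longer propagate back out. Needs CAP_SYS_ADMIN.
 * Returns 0 on success, -1 on failure. */
int make_rslave_namespace(void)
{
    if (unshare(CLONE_NEWNS) < 0) {
        perror("unshare(CLONE_NEWNS)");
        return -1;
    }
    /* Source and fstype are ignored for a propagation-type change. */
    if (mount("", "/", NULL, MS_SLAVE | MS_REC, NULL) < 0) {
        perror("mount(MS_SLAVE|MS_REC)");
        return -1;
    }
    return 0;
}
```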
>> >>         >
>> >>         > > Where that won't work is in a livecd or any fancy RAID setup,
>> >>         > > where your process's / has a parent which is MS_SHARED.
>> >>         >
>> >>         > > Michael, can you show me your /proc/self/mountinfo on an F18
>> >>         > > box?
>> >>         >
>> >>         > Freshly installed clean box...
>> >>         >
>> >>         > [root at dwarf52 mhw]# cat /proc/self/mountinfo
>> >>         > 15 34 0:3 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
>> >>         > 16 34 0:14 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs rw,seclabel
>> >>         > 17 34 0:5 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,seclabel,size=491520k,nr_inodes=122880,mode=755
>> >>         > 18 16 0:15 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:7 - securityfs securityfs rw
>> >>         > 19 16 0:13 / /sys/fs/selinux rw,relatime shared:8 - selinuxfs selinuxfs rw
>> >>         > 20 17 0:16 / /dev/shm rw,nosuid,nodev shared:3 - tmpfs tmpfs rw,seclabel
>> >>         > 21 17 0:10 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=000
>> >>         > 22 34 0:17 / /run rw,nosuid,nodev shared:19 - tmpfs tmpfs rw,seclabel,mode=755
>> >>         > 23 16 0:18 / /sys/fs/cgroup rw,nosuid,nodev,noexec shared:9 - tmpfs tmpfs rw,seclabel,mode=755
>> >>         > 24 23 0:19 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:10 - cgroup cgroup rw,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
>> >>         > 25 23 0:20 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:11 - cgroup cgroup rw,cpuset
>> >>         > 26 23 0:21 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:12 - cgroup cgroup rw,cpuacct,cpu
>> >>         > 27 23 0:22 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:13 - cgroup cgroup rw,memory
>> >>         > 28 23 0:23 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:14 - cgroup cgroup rw,devices
>> >>         > 29 23 0:24 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,freezer
>> >>         > 30 23 0:25 / /sys/fs/cgroup/net_cls rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,net_cls
>> >>         > 31 23 0:26 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,blkio
>> >>         > 32 23 0:27 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,perf_event
>> >>         > 34 1 253:1 / / rw,relatime shared:1 - ext4 /dev/mapper/fedora_dwarf52-root rw,seclabel,data=ordered
>> >>         > 35 15 0:29 / /proc/sys/fs/binfmt_misc rw,relatime shared:20 - autofs systemd-1 rw,fd=29,pgrp=1,timeout=300,minproto=5,maxproto=5,direct
>> >>         > 37 16 0:30 / /sys/kernel/config rw,relatime shared:21 - configfs configfs rw
>> >>         > 39 17 0:31 / /dev/hugepages rw,relatime shared:22 - hugetlbfs hugetlbfs rw,seclabel
>> >>         > 38 17 0:12 / /dev/mqueue rw,relatime shared:23 - mqueue mqueue rw,seclabel
>> >>         > 36 16 0:7 / /sys/kernel/debug rw,relatime shared:24 - debugfs debugfs rw
>> >>         > 40 34 0:32 / /tmp rw shared:25 - tmpfs tmpfs rw,seclabel
>> >>         > 41 34 8:1 / /boot rw,relatime shared:26 - ext4 /dev/sda1 rw,seclabel,data=ordered
>> >>         > 42 34 253:2 / /home rw,relatime shared:27 - ext4 /dev/mapper/fedora_dwarf52-home rw,seclabel,data=ordered
>> >>         > 74 22 0:33 / /run/user/1000/gvfs rw,nosuid,nodev,relatime shared:57 - fuse.gvfsd-fuse gvfsd-fuse rw,user_id=1000,group_id=1000
>> >>         > 76 16 0:34 / /sys/fs/fuse/connections rw,relatime shared:59 - fusectl fusectl rw
>> >>         >
>> >>         > Looks like everything has "shared".
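The "shared:N" tag is one of the optional fields of each mountinfo line. It sits between the mount options and the lone "-" separator, and N is the peer group ID. The C sketch below extracts it from one line. `mountinfo_shared_id()` is a hypothetical helper, and it assumes the standard field layout described in proc(5).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the peer group N from a "shared:N" optional
 * field of one /proc/self/mountinfo line, or -1 if the mount is not shared.
 * Fields 1-6 are fixed (mount ID, parent ID, major:minor, root, mount
 * point, mount options); optional fields follow until a lone "-". */
long mountinfo_shared_id(const char *line)
{
    const char *p = line;
    int field = 0;

    while (*p) {
        size_t len;

        p += strspn(p, " \t\n");        /* skip separators */
        len = strcspn(p, " \t\n");      /* length of this field */
        if (len == 0)
            break;
        field++;
        if (field > 6) {
            if (len == 1 && p[0] == '-')
                break;                  /* end of optional fields */
            if (len > 7 && strncmp(p, "shared:", 7) == 0)
                return strtol(p + 7, NULL, 10);
        }
        p += len;
    }
    return -1;
}
```

Applied to the dump above, every line yields a peer group ID (1 for /, 5 for /proc, and so on). That is what "everything has shared" amounts to.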
>> >>         >
>> >>         > I'll be testing lxc on this beast with and without this patch over
>> >>         > the next couple of days for both systemd and non-systemd
>> >>         > containers.  I've got to get 0.9.0a2 built on it first and then go
>> >>         > from there.
>> >>         >
>> >>         > > > I didn't spend much time reviewing the code itself, but it
>> >>         > > > applied to my local staging tree and built fine, so that's good
>> >>         > > > enough for me :)
>> >>         >
>> >>         > > Thanks - TBH the extra mounts are no more wrong than they are in
>> >>         > > a livecd, so I don't think it's a big problem.  One we can
>> >>         > > address in January.
>> >>         >
>> >>         > > -serge
>> >>         >
>> >>         > Hope you (and everyone else) had a nice holiday!
>> >>         >
>> >>         > Regards,
>> >>         > Mike
>> >>
>> >>         > _______________________________________________
>> >>         > Lxc-devel mailing list
>> >>         > Lxc-devel at lists.sourceforge.net
>> >>         > https://lists.sourceforge.net/lists/listinfo/lxc-devel
>> >>
>> >>
>> >>         --
>> >>         Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
>> >>            /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
>> >>            NIC whois: MHW9          | An optimist believes we live in the best of all
>> >>          PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!
>> >>
>> >>
>> >>
>> >>
>> >>
>> >
>>
>



