[lxc-users] apparmor profile for systemd containers (WAS: Fedora container thinks it is not running)
Serge Hallyn
serge.hallyn at ubuntu.com
Thu May 29 15:46:16 UTC 2014
Quoting Fajar A. Nugraha (list at fajar.net):
> On Thu, May 29, 2014 at 10:58 AM, Serge Hallyn <serge.hallyn at ubuntu.com> wrote:
>
> > Quoting Fajar A. Nugraha (list at fajar.net):
> > > On Thu, May 29, 2014 at 5:08 AM, Serge Hallyn <serge.hallyn at ubuntu.com> wrote:
> > > > would systemd be happy with it being mounted by lxc using an
> > > > lxc.mount.entry? I think that would be preferable to relaxing the
> > > > apparmor policy. i.e.
> > > >
> > > > lxc.mount.entry = /sys/fs/cgroup/systemd sys/fs/cgroup/systemd none bind,create=dir,optional 0 0
> > > >
> > > >
> > > Wouldn't that be shadowed by the container mounting its own /sys?
> >
> > If lxc mounts /sys then systemd will leave it be.
> >
> >
> Apparently that line alone doesn't work for me. I also had to add before
> that:
>
> lxc.mount.entry = sysfs sys sysfs default 0 0
> lxc.mount.entry = none sys/fs/cgroup tmpfs rw 0 0
or lxc.mount.auto = sys
That's what I meant by 'if lxc mounts /sys' :)
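For reference, the manual variant can be sketched as a small shell helper
that prints the three entries in the order they need to appear in the
config: sysfs first, then the tmpfs on sys/fs/cgroup, then the bind mount.
The function name is made up for illustration; the entries themselves are
copied verbatim from this thread and I have not re-tested them.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this example): print the mount
# entries discussed in this thread, in the order they must appear in the
# container config. The bind mount has to come last, otherwise the sysfs
# and tmpfs mounts would be stacked on top of it and hide it.
systemd_cgroup_mount_entries() {
    cat <<'EOF'
lxc.mount.entry = sysfs sys sysfs default 0 0
lxc.mount.entry = none sys/fs/cgroup tmpfs rw 0 0
lxc.mount.entry = /sys/fs/cgroup/systemd sys/fs/cgroup/systemd none bind,create=dir,optional 0 0
EOF
}
```

Something like "systemd_cgroup_mount_entries >> /var/lib/lxc/f20/config"
would append them, assuming the container config lives at that path.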
> > > Stephane also pointed out in my (closed) pull request that it would also
> > > allow the container to mess with the host's resource allocation.
> >
> > Yes, that's why lxc.mount.auto = cgroup:mixed is better. But the above
> > mount entry is no worse than letting the container do it through
> > apparmor.
> >
>
> That does not work, apparently.
>
> ### in config
> lxc.mount.auto = cgroup:mixed
> ###
>
> ### lxc-start output
> <30>systemd[1]: Starting Root Slice.
> <27>systemd[1]: Caught <SEGV>, dumped core as pid 12.
> <30>systemd[1]: Freezing execution.
> ###
Hm, that's unfortunate. I thought lxc.mount.auto = cgroup:mixed
with cgfs would mount named subsystems? Christian?
> ###
> # lxc-attach -n f20 -- mount
> rpool/lxc on / type zfs (rw,noatime,xattr,noacl)
> udev on /dev type devtmpfs (rw,relatime,size=2473540k,nr_inodes=618385,mode=755)
> cgroup on /sys/fs/cgroup type tmpfs (rw,relatime,size=12k,mode=755)
> none on /sys/fs/cgroup/cgmanager type tmpfs (rw,relatime,size=4k,mode=755)
> devpts on /dev/lxc/console type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
> devpts on /dev/lxc/tty1 type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
> devpts on /dev/lxc/tty2 type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
> devpts on /dev/lxc/tty3 type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
> devpts on /dev/lxc/tty4 type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
> devpts on /dev/pts type devpts (rw,relatime,gid=5,mode=620,ptmxmode=666)
> proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
> sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
> tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
> tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
> tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)
>
> # lxc-attach -n f20 -- ls /sys/fs/cgroup/
> blkio cpu,cpuacct cpuset devices freezer hugetlb memory perf_event systemd
>
> # lxc-attach -n f20 -- ls /sys/fs/cgroup/systemd
> (no output)
> ###
>
> It looks like there are two lines for /sys/fs/cgroup? I'm using trusty's
> lxc-1.0.3.
>
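A quick way to spot stacked mounts like the two /sys/fs/cgroup lines is to
list every mount point that shows up more than once. A minimal sketch,
assuming the usual "<source> on <mountpoint> type <fstype> (...)" layout of
mount's output and no whitespace in the mount points; the function name is
just for illustration:

```shell
#!/bin/sh
# Print each mount point that occurs more than once in `mount`-style
# output read from stdin. Lines that do not match the
# "<source> on <mountpoint> type <fstype> ..." layout (for example
# wrapped option lists) are ignored.
dup_mountpoints() {
    awk '$2 == "on" && $4 == "type" { print $3 }' | sort | uniq -d
}
```

Fed with "lxc-attach -n f20 -- mount | dup_mountpoints", the output quoted
above should come back with just /sys/fs/cgroup.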
>
>
>
> >
> > > This works (at least, tested with console and ssh login), and should be
> > > secure-enough (bind-mount the container subdir, instead of the whole
> > > systemd cgroup), but complicated.
> > >
> > > ### snippet of config
> > > lxc.hook.mount = "/var/lib/lxc/f20/bin/create_container_systemd_cgroup"
> > > lxc.hook.post-stop = "/var/lib/lxc/f20/bin/remove_container_systemd_cgroup"
> > > ###
> > >
> > > ### cat create_container_systemd_cgroup
> > > #!/bin/bash
> > > mkdir -p /sys/fs/cgroup/systemd/lxc/$LXC_NAME
> > > mount -t sysfs sysfs $LXC_ROOTFS_MOUNT/sys
> > > mount -t tmpfs none $LXC_ROOTFS_MOUNT/sys/fs/cgroup
> > > mkdir $LXC_ROOTFS_MOUNT/sys/fs/cgroup/systemd
> > > mount --bind /sys/fs/cgroup/systemd/lxc/$LXC_NAME $LXC_ROOTFS_MOUNT/sys/fs/cgroup/systemd
> > > ###
> > >
> > > ### cat remove_container_systemd_cgroup
> > > #!/bin/bash
> > > [ -n "$LXC_NAME" ] && find /sys/fs/cgroup/systemd/lxc/$LXC_NAME -type d | tac | xargs rmdir
> > > ###
> > >
> > > Is there a way to simplify this somehow for it to be more suitable in the
> > > template?
> >
> > I suppose we could add a new lxc.mount.auto = cgroup:systemd option which
> > only mounts name=systemd, read-only except for the container's own cgroup
> > which is rw? But when I say we I don't really mean we :)
> >
>
>
> Will that work?
>
> The systemd cgroup mount is weird in the sense that there are no
> /lxc/CONTAINER_NAME subdirs under /sys/fs/cgroup/systemd, while there is one
> under /sys/fs/cgroup/{blkio,cpu,etc}. So for the systemd cgroup I don't see
> which ones should be mounted ro and which get rw.
>
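To see which name=systemd cgroup a task actually sits in, its
/proc/<pid>/cgroup file can be parsed; each line there has the form
"hierarchy-id:controllers:path", and the named hierarchy shows up as
"name=systemd". A small sketch (the function name is made up for the
example):

```shell
#!/bin/sh
# Read /proc/<pid>/cgroup-style lines on stdin and print the cgroup path
# of the name=systemd hierarchy, if there is one. The first two
# colon-separated fields are stripped so that a path which itself
# contains ':' is printed intact.
systemd_cgroup_of() {
    awk -F: '$2 == "name=systemd" { sub(/^[^:]*:[^:]*:/, ""); print }'
}
```

Running "systemd_cgroup_of < /proc/<pid>/cgroup" on the host, with <pid>
being the container's init, would show whether it ended up under an
lxc/CONTAINER_NAME path or somewhere else.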
> The workaround hook I wrote earlier creates the directory
> /sys/fs/cgroup/systemd/lxc/CONTAINER_NAME on the host, and bind-mounts it as
> the container's /sys/fs/cgroup/systemd.
>
> --
> Fajar
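As for simplifying the hooks: the cleanup in the post-stop hook quoted
earlier could probably drop the find | tac | xargs pipeline in favour of
find -depth, which already lists children before their parents. A rough
sketch, not tested against a real cgroup mount; the function name is
invented here, and the path in the usage line is the one from the hook:

```shell
#!/bin/sh
# remove_cgroup_tree DIR: remove DIR and all of its subdirectories,
# deepest first. Only rmdir is used, so a directory that still has
# entries rmdir refuses to remove is left in place.
remove_cgroup_tree() {
    dir=$1
    # Refuse to run with an empty argument or a missing directory.
    [ -n "$dir" ] && [ -d "$dir" ] || return 1
    # -depth makes find visit a directory's contents before the
    # directory itself, so no explicit reversal with tac is needed.
    find "$dir" -depth -type d -exec rmdir {} \;
}
```

In the hook that would be called as
remove_cgroup_tree "/sys/fs/cgroup/systemd/lxc/$LXC_NAME", after checking
that $LXC_NAME is non-empty, as the original script does.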