[lxc-users] apparmor profile for systemd containers (WAS: Fedora container thinks it is not running)

Fajar A. Nugraha list at fajar.net
Thu May 29 05:03:34 UTC 2014


On Thu, May 29, 2014 at 10:58 AM, Serge Hallyn <serge.hallyn at ubuntu.com> wrote:

> Quoting Fajar A. Nugraha (list at fajar.net):
> > On Thu, May 29, 2014 at 5:08 AM, Serge Hallyn <serge.hallyn at ubuntu.com> wrote:
> > > would systemd be happy with it being mounted by lxc using an
> > > lxc.mount.entry?  I think that would be preferable to relaxing the
> > > apparmor policy.  i.e.
> > >
> > > lxc.mount.entry = /sys/fs/cgroup/systemd sys/fs/cgroup/systemd none bind,create=dir,optional 0 0
> > >
> > >
> > Wouldn't that be shadowed by the container mounting its own /sys?
>
> If lxc mounts /sys then systemd will leave it be.
>
>
That line alone doesn't work for me. I also had to add these two entries
before it:

lxc.mount.entry = sysfs sys sysfs default 0 0
lxc.mount.entry = none sys/fs/cgroup tmpfs rw 0 0



> > Stephane also pointed out in my (closed) pull request that it would also
> > allow the container to mess with the host's resource allocation.
>
> Yes, that's why lxc.mount.auto = cgroup:mixed is better.  But the above
> mount entry is no worse than letting the container do it through
> apparmor.
>

That does not work, apparently.

### in config
lxc.mount.auto = cgroup:mixed
###

### lxc-start output
<30>systemd[1]: Starting Root Slice.
<27>systemd[1]: Caught <SEGV>, dumped core as pid 12.
<30>systemd[1]: Freezing execution.
###

###
# lxc-attach -n f20 -- mount
rpool/lxc on / type zfs (rw,noatime,xattr,noacl)
udev on /dev type devtmpfs (rw,relatime,size=2473540k,nr_inodes=618385,mode=755)
cgroup on /sys/fs/cgroup type tmpfs (rw,relatime,size=12k,mode=755)
none on /sys/fs/cgroup/cgmanager type tmpfs (rw,relatime,size=4k,mode=755)
devpts on /dev/lxc/console type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
devpts on /dev/lxc/tty1 type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
devpts on /dev/lxc/tty2 type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
devpts on /dev/lxc/tty3 type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
devpts on /dev/lxc/tty4 type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
devpts on /dev/pts type devpts (rw,relatime,gid=5,mode=620,ptmxmode=666)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)

# lxc-attach -n f20 -- ls /sys/fs/cgroup/
blkio  cpu,cpuacct  cpuset  devices  freezer  hugetlb  memory  perf_event  systemd

# lxc-attach -n f20 -- ls /sys/fs/cgroup/systemd
(no output)
###

It looks like there are two tmpfs mounts on /sys/fs/cgroup? I'm using
trusty's lxc-1.0.3.
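
A quick way to confirm stacked mounts like this is to look for mount points
that appear more than once in the mount output. The following is only a
sketch: the function name dup_mountpoints is made up for illustration, and
it assumes mount points without spaces in their names.

```shell
#!/bin/sh
# Print every mount point that appears more than once in `mount`-style
# output read from stdin ("<source> on <target> type <fstype> (...)").
# Lines whose second field is not "on" are skipped, so wrapped
# continuation lines do no harm.
dup_mountpoints() {
    awk '$2 == "on" { print $3 }' | sort | uniq -d
}

# Usage against the container from this thread:
#   lxc-attach -n f20 -- mount | dup_mountpoints
```

For the output above this would report /sys/fs/cgroup, since both the
"cgroup" and the "tmpfs" entries are mounted there.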




>
> > This works (at least, tested with console and ssh login), and should be
> > secure-enough (bind-mount the container subdir, instead of the whole
> > systemd cgroup), but complicated.
> >
> > ### snippet of config
> > lxc.hook.mount = "/var/lib/lxc/f20/bin/create_container_systemd_cgroup"
> > lxc.hook.post-stop = "/var/lib/lxc/f20/bin/remove_container_systemd_cgroup"
> > ###
> >
> > ### cat create_container_systemd_cgroup
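> > #!/bin/bash
> > # per-container directory in the host's systemd cgroup hierarchy
> > mkdir -p "/sys/fs/cgroup/systemd/lxc/$LXC_NAME"
> > # mount /sys and a tmpfs on /sys/fs/cgroup inside the container rootfs
> > mount -t sysfs sysfs "$LXC_ROOTFS_MOUNT/sys"
> > mount -t tmpfs none "$LXC_ROOTFS_MOUNT/sys/fs/cgroup"
> > mkdir "$LXC_ROOTFS_MOUNT/sys/fs/cgroup/systemd"
> > # expose only the container's own subdir, not the whole systemd cgroup
> > mount --bind "/sys/fs/cgroup/systemd/lxc/$LXC_NAME" \
> >     "$LXC_ROOTFS_MOUNT/sys/fs/cgroup/systemd"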
> > ###
> >
> > ### cat remove_container_systemd_cgroup
> > #!/bin/bash
> > # remove the deepest directories first (tac reverses find's output)
> > [ -n "$LXC_NAME" ] && find "/sys/fs/cgroup/systemd/lxc/$LXC_NAME" -type d |
> >     tac | xargs rmdir
> > ###
> >
> > Is there a way to simplify this somehow for it to be more suitable in the
> > template?
>
> I suppose we could add a new lxc.mount.auto = cgroup:systemd option which
> only mounts name=systemd, read-only except for the container's own cgroup
> which is rw?  But when I say we I don't really mean we :)
>


Will that work?

The systemd cgroup mount is odd in that there is no /lxc/CONTAINER_NAME
subdirectory under /sys/fs/cgroup/systemd, while there is one under
/sys/fs/cgroup/{blkio,cpu,etc}. So for the systemd cgroup I don't see
which parts should be mounted ro and which should be rw.
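
To see that difference on a given host, something like the following could
list which hierarchies have an lxc/CONTAINER_NAME subdirectory. This is a
sketch only: list_container_cgroups and the CGROUP_ROOT override are
hypothetical names, added so the function can also be tried on a scratch
directory.

```shell
#!/bin/sh
# Report, for each cgroup hierarchy, whether lxc/<name> exists under it.
# CGROUP_ROOT is a hypothetical override; on the host the default applies.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

list_container_cgroups() {
    name="$1"
    for hier in "$CGROUP_ROOT"/*/; do
        # Skip the literal pattern if the glob matched nothing.
        [ -d "$hier" ] || continue
        ctrl=$(basename "$hier")
        if [ -d "${hier}lxc/$name" ]; then
            echo "$ctrl present"
        else
            echo "$ctrl missing"
        fi
    done
}

# Usage on the host: list_container_cgroups f20
```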

The workaround hook I wrote earlier creates the directory
/sys/fs/cgroup/systemd/lxc/CONTAINER_NAME on the host, and bind-mounts it as
the container's /sys/fs/cgroup/systemd.
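
The cleanup half of that workaround could be made a bit more defensive.
Below is a minimal sketch, not a tested replacement for the post-stop hook:
the function name remove_container_cgroup and the CGROUP_BASE override are
made up for illustration, and only LXC_NAME comes from the hook environment
used above.

```shell
#!/bin/sh
# Remove the per-container systemd cgroup directory tree.
# CGROUP_BASE is a hypothetical override so this can be tried on a
# scratch directory; on the host the default applies.
CGROUP_BASE="${CGROUP_BASE:-/sys/fs/cgroup/systemd/lxc}"

remove_container_cgroup() {
    name="$1"
    # Refuse an empty name or anything that could escape CGROUP_BASE.
    case "$name" in
        ""|*/*|.|..) return 1 ;;
    esac
    dir="$CGROUP_BASE/$name"
    [ -d "$dir" ] || return 0
    # -depth visits children before parents, so every rmdir runs on a
    # directory whose subdirectories are already gone (like find | tac).
    find "$dir" -depth -type d -exec rmdir {} \;
}

# In the hook itself: remove_container_cgroup "$LXC_NAME"
```

The name check matters because an empty LXC_NAME would otherwise point the
cleanup at the whole lxc directory rather than a single container's subtree.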

-- 
Fajar