[lxc-users] unprivileged Debian Buster container on Debian Buster host fail to start: no cgroups, no controllers

Lukas Pirl lxc-users at lukas-pirl.de
Tue May 28 13:54:41 UTC 2019


Dear all,

first, thanks for the friendly and supportive help you all provide in issue
trackers, on mailing lists, etc. – it is very helpful to find all this online.

However, I struggle to run unprivileged (Debian Buster) containers (on a
Debian Buster host). LXC does not seem to mount the cgroup mount points for
the container, thus the container's systemd tries to mount those and fails due
to insufficient permissions.

The log reports no writable cgroup hierarchies and no available controllers –
could there be a common cause?

I decided not to open an issue so far, since I am not sure if it is just me
being incompetent here or if there is an actual issue. If we find an actual
issue, I'll of course move this to the issue tracker.

Please find all the configuration dumps and logs below.

IIRC, I tried the script provided in
  https://github.com/lxc/lxc/issues/1998#issuecomment-353241255
as well as various other things, without success. However, I am unsure how the
available information can be applied, since a few things changed in LXC 3, no?
And systemd seems to be a moving target as well.

Also, I am working on Ansible automation to set up a host that can run
unprivileged containers. This will be publicly available once everything
works.

Cheers,

Lukas

========================================================================

symptom
-------

``lxc-start -n rproxy -l TRACE -o lxc.log -F``::

  Failed to mount cgroup at /sys/fs/cgroup/systemd: Permission denied
  [!!!!!!] Failed to mount API filesystems.
  Exiting PID 1...

``lxc.log``: 
https://bin.privacytools.io/?28c8377e545ce6a9#9I2a28JuaYf7yHDNIxtxCQxox6LvTrxT4l4scUDgQNc=

host details
------------

* ``cat /etc/debian_version``: 10.0
* ``lxc-start --version``: 3.0.3
* ``lxc-checkconfig``::

    Kernel configuration not found at /proc/config.gz; searching...
    Kernel configuration found at /boot/config-4.19.0-5-amd64
    --- Namespaces ---
    Namespaces: enabled
    Utsname namespace: enabled
    Ipc namespace: enabled
    Pid namespace: enabled
    User namespace: enabled
    Network namespace: enabled

    --- Control groups ---
    Cgroups: enabled

    Cgroup v1 mount points:
    /sys/fs/cgroup/systemd
    /sys/fs/cgroup/memory
    /sys/fs/cgroup/cpuset
    /sys/fs/cgroup/cpu,cpuacct
    /sys/fs/cgroup/blkio
    /sys/fs/cgroup/net_cls,net_prio
    /sys/fs/cgroup/perf_event
    /sys/fs/cgroup/rdma
    /sys/fs/cgroup/freezer
    /sys/fs/cgroup/devices
    /sys/fs/cgroup/pids

    Cgroup v2 mount points:
    /sys/fs/cgroup/unified

    Cgroup v1 clone_children flag: enabled
    Cgroup device: enabled
    Cgroup sched: enabled
    Cgroup cpu account: enabled
    Cgroup memory controller: enabled
    Cgroup cpuset: enabled

    --- Misc ---
    Veth pair device: enabled, not loaded
    Macvlan: enabled, not loaded
    Vlan: enabled, not loaded
    Bridges: enabled, loaded
    Advanced netfilter: enabled, loaded
    CONFIG_NF_NAT_IPV4: enabled, loaded
    CONFIG_NF_NAT_IPV6: enabled, loaded
    CONFIG_IP_NF_TARGET_MASQUERADE: enabled, not loaded
    CONFIG_IP6_NF_TARGET_MASQUERADE: enabled, not loaded
    CONFIG_NETFILTER_XT_TARGET_CHECKSUM: enabled, not loaded
    CONFIG_NETFILTER_XT_MATCH_COMMENT: enabled, loaded
    FUSE (for use with lxcfs): enabled, loaded

    --- Checkpoint/Restore ---
    checkpoint restore: enabled
    CONFIG_FHANDLE: enabled
    CONFIG_EVENTFD: enabled
    CONFIG_EPOLL: enabled
    CONFIG_UNIX_DIAG: enabled
    CONFIG_INET_DIAG: enabled
    CONFIG_PACKET_DIAG: enabled
    CONFIG_NETLINK_DIAG: enabled
    File capabilities:

    Note : Before booting a new kernel, you can check its configuration
    usage : CONFIG=/path/to/config /usr/bin/lxc-checkconfig

* ``uname -a``: Linux hive 4.19.0-5-amd64 #1 SMP Debian 4.19.37-3
  (2019-05-15) x86_64 GNU/Linux

* ``cat /proc/self/cgroup``::

    11:pids:/user.slice/user-1000.slice/session-4.scope
    10:devices:/user.slice
    9:freezer:/user/lxc/0
    8:rdma:/
    7:perf_event:/
    6:net_cls,net_prio:/
    5:blkio:/user.slice
    4:cpu,cpuacct:/user/lxc/0
    3:cpuset:/user/lxc/0
    2:memory:/user/lxc/0
    1:name=systemd:/user/lxc/0
    0::/user.slice/user-1000.slice/session-4.scope/user/lxc/0

* ``cat /proc/self/mountinfo``::

    20 25 0:19 / /sys rw,nosuid,nodev,noexec,relatime shared:7 - sysfs sysfs rw
    21 25 0:4 / /proc rw,relatime shared:14 - proc proc rw,hidepid=2
    22 25 0:6 / /dev rw,nosuid,relatime shared:2 - devtmpfs udev rw,size=6134028k,nr_inodes=1533507,mode=755
    23 22 0:20 / /dev/pts rw,nosuid,noexec,relatime shared:3 - devpts devpts rw,gid=5,mode=620,ptmxmode=000
    24 25 0:21 / /run rw,nosuid,noexec,relatime shared:5 - tmpfs tmpfs rw,size=1229916k,mode=755
    25 0 0:22 / / rw,relatime shared:1 - btrfs /dev/sda4 rw,compress=lzo,space_cache,user_subvol_rm_allowed,subvolid=5,subvol=/
    26 20 0:7 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:8 - securityfs securityfs rw
    27 22 0:24 / /dev/shm rw,nosuid,nodev shared:4 - tmpfs tmpfs rw
    28 24 0:25 / /run/lock rw,nosuid,nodev,noexec,relatime shared:6 - tmpfs tmpfs rw,size=5120k
    29 20 0:26 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:9 - tmpfs tmpfs ro,mode=755
    30 29 0:27 / /sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime shared:10 - cgroup2 cgroup2 rw
    31 29 0:28 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:11 - cgroup cgroup rw,xattr,name=systemd
    32 20 0:29 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:12 - pstore pstore rw
    33 20 0:30 / /sys/fs/bpf rw,nosuid,nodev,noexec,relatime shared:13 - bpf bpf rw,mode=700
    34 29 0:31 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:15 - cgroup cgroup rw,memory
    35 29 0:32 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:16 - cgroup cgroup rw,cpuset,clone_children
    36 29 0:33 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:17 - cgroup cgroup rw,cpu,cpuacct
    37 29 0:34 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:18 - cgroup cgroup rw,blkio
    38 29 0:35 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:19 - cgroup cgroup rw,net_cls,net_prio
    39 29 0:36 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:20 - cgroup cgroup rw,perf_event
    40 29 0:37 / /sys/fs/cgroup/rdma rw,nosuid,nodev,noexec,relatime shared:21 - cgroup cgroup rw,rdma
    41 29 0:38 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:22 - cgroup cgroup rw,freezer
    42 29 0:39 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:23 - cgroup cgroup rw,devices
    43 29 0:40 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:24 - cgroup cgroup rw,pids
    45 22 0:18 / /dev/mqueue rw,relatime shared:25 - mqueue mqueue rw
    44 22 0:41 / /dev/hugepages rw,relatime shared:26 - hugetlbfs hugetlbfs rw,pagesize=2M
    46 20 0:8 / /sys/kernel/debug rw,relatime shared:27 - debugfs debugfs rw
    47 21 0:42 / /proc/sys/fs/binfmt_misc rw,relatime shared:28 - autofs systemd-1 rw,fd=41,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=1678
    230 25 0:49 / /var/lib/lxcfs rw,nosuid,nodev,relatime shared:122 - fuse.lxcfs lxcfs rw,user_id=0,group_id=0,allow_other
    245 20 0:50 / /sys/fs/fuse/connections rw,relatime shared:161 - fusectl fusectl rw
    266 24 0:51 / /run/user/1000 rw,nosuid,nodev,relatime shared:169 - tmpfs tmpfs rw,size=1229912k,mode=700,uid=1000,gid=1000


* ``grep cgfs /etc/pam.d/common-session*``::

    session optional pam_cgfs.so -c freezer,memory,cpu,cpuset,cpuacct,unified,name=systemd
    session optional pam_cgfs.so -c freezer,memory,cpu,cpuset,cpuacct,unified,name=systemd
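
To narrow down which hierarchies are actually writable for the user who
starts the container, a check along the following lines may help. This is only
a sketch: ``check_cgroups`` is a hypothetical helper name, and it assumes the
layout shown in the mountinfo dump above, i.e. every hierarchy is mounted under
``/sys/fs/cgroup`` in a directory named after its controller list
(``name=systemd`` under ``systemd``, the v2 hierarchy under ``unified``).

```shell
# Print one line per hierarchy: controller list, cgroup directory, and
# whether the calling user may write to that directory.
# Optional arguments (assumed defaults are the usual locations):
#   $1  file in /proc/<pid>/cgroup format
#   $2  directory below which the hierarchies are mounted
check_cgroups() {
    cgfile=${1:-/proc/self/cgroup}
    cgroot=${2:-/sys/fs/cgroup}
    while IFS=: read -r id controllers path; do
        # The entry with an empty controller list is the cgroup v2 one.
        [ -n "$controllers" ] || controllers=unified
        # Named hierarchies ("name=systemd") are mounted without "name=".
        dir=$cgroot/${controllers#name=}$path
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            state=writable
        else
            state=not-writable
        fi
        printf '%s %s %s\n' "$controllers" "$dir" "$state"
    done < "$cgfile"
}
```

Called without arguments as the container user, it reads
``/proc/self/cgroup``; hierarchies reported as ``not-writable`` are the ones
worth comparing against the controller list passed to ``pam_cgfs.so``.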

container config
----------------

* ``cat rproxy/config``::

    lxc.include = /home/lxc/.config/lxc/common.conf
    lxc.uts.name = rproxy
    lxc.rootfs.path = btrfs:/home/lxc/rproxy/rootfs
    lxc.net.0.link = lxc-br-rproxy
    lxc.net.0.ipv6.address = fd00::2/16

* ``cat /home/lxc/.config/lxc/common.conf``::

    lxc.include = /usr/share/lxc/config/common.conf
    lxc.include = /usr/share/lxc/config/userns.conf
    lxc.include = /etc/lxc/default.conf

    lxc.apparmor.profile = unconfined

    lxc.arch = x86_64
    lxc.start.auto = 1
    lxc.start.delay = 20

    lxc.net.0.type = veth
    lxc.net.0.name = eth0
    lxc.net.0.flags = up
    lxc.net.0.ipv6.gateway = auto

    lxc.idmap = u 0 165536 65536
    lxc.idmap = g 0 165536 65536

* ``/etc/lxc/default.conf``::

    lxc.net.0.type = empty
    lxc.apparmor.profile = generated
    lxc.apparmor.allow_nesting = 1

* ``cat /usr/share/lxc/config/userns.conf``::

    lxc.cgroup.devices.deny =
    lxc.cgroup.devices.allow =

    lxc.cap.drop =
    lxc.cap.keep =

    lxc.tty.dir =

    lxc.mount.auto = sys:rw

* ``cat /usr/share/lxc/config/common.conf``::

    # Setup the LXC devices in /dev/lxc/
    lxc.tty.dir = lxc

    # Allow for 1024 pseudo terminals
    lxc.pty.max = 1024

    # Setup 4 tty devices
    lxc.tty.max = 4

    # Drop some harmful capabilities
    lxc.cap.drop = mac_admin mac_override sys_time sys_module sys_rawio

    # Ensure hostname is changed on clone
    lxc.hook.clone = /usr/share/lxc/hooks/clonehostname

    # CGroup whitelist
    lxc.cgroup.devices.deny = a
    ## Allow any mknod (but not reading/writing the node)
    lxc.cgroup.devices.allow = c *:* m
    lxc.cgroup.devices.allow = b *:* m
    ## Allow specific devices
    ### /dev/null
    lxc.cgroup.devices.allow = c 1:3 rwm
    ### /dev/zero
    lxc.cgroup.devices.allow = c 1:5 rwm
    ### /dev/full
    lxc.cgroup.devices.allow = c 1:7 rwm
    ### /dev/tty
    lxc.cgroup.devices.allow = c 5:0 rwm
    ### /dev/console
    lxc.cgroup.devices.allow = c 5:1 rwm
    ### /dev/ptmx
    lxc.cgroup.devices.allow = c 5:2 rwm
    ### /dev/random
    lxc.cgroup.devices.allow = c 1:8 rwm
    ### /dev/urandom
    lxc.cgroup.devices.allow = c 1:9 rwm
    ### /dev/pts/*
    lxc.cgroup.devices.allow = c 136:* rwm
    ### fuse
    lxc.cgroup.devices.allow = c 10:229 rwm

    lxc.mount.auto = cgroup:mixed proc:mixed sys:mixed
    lxc.mount.entry = /sys/fs/fuse/connections sys/fs/fuse/connections none bind,optional 0 0

    lxc.seccomp.profile = /usr/share/lxc/config/common.seccomp

    lxc.include = /usr/share/lxc/config/common.conf.d/

* ``grep lxc /etc/sub{g,u}id``::

    /etc/subgid:lxc:165536:65536
    /etc/subuid:lxc:165536:65536

* ``umask``: 077

* I also tried this (overkill approach) to make the cgroups writable
  (I guess?) without success::

    for x in `find /sys/fs/cgroup -name lxc`; do
      echo; echo $x; chgrp -R lxc $x; chmod g+rw $x;
    done
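
Since an unprivileged container can only use host IDs that are delegated to
the user in ``/etc/subuid`` and ``/etc/subgid``, the ``lxc.idmap`` lines and
the ``grep`` output above have to agree. A minimal sketch of that consistency
check follows; ``subid_covers`` is a hypothetical helper name, and the file
is assumed to use the plain ``user:start:count`` format shown above.

```shell
# Succeed if the host ID range START .. START+COUNT-1 lies completely
# inside one allocation for USER in a subuid/subgid-style file.
# Usage: subid_covers USER START COUNT [FILE]   (FILE defaults to /etc/subuid)
subid_covers() {
    user=$1 start=$2 count=$3 file=${4:-/etc/subuid}
    while IFS=: read -r name first size; do
        # Skip allocations that belong to other users.
        [ "$name" = "$user" ] || continue
        if [ "$start" -ge "$first" ] &&
           [ $((start + count)) -le $((first + size)) ]; then
            return 0
        fi
    done < "$file"
    return 1
}
```

For the configuration above, the calls would be
``subid_covers lxc 165536 65536 /etc/subuid`` and
``subid_covers lxc 165536 65536 /etc/subgid``, taking the host start and count
from the ``lxc.idmap`` entries.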
