[lxc-users] Can't start container after lxd/lxc/lxcfs upgrade

Fajar A. Nugraha list at fajar.net
Fri Mar 18 22:47:19 UTC 2016


On Sat, Mar 19, 2016 at 1:12 AM, B G <bg85305 at gmail.com> wrote:
> lxc => 2.0.0rc4
> lxd => 2.0.0rc4
> lxcfs => 2.0.0rc6
>
> After the latest upgrade to lxc/lxd tools existing and new containers fail
> to start, failing on the following stage from the container log:
>
> lxc 20160318161829.810 INFO     lxc_conf - conf.c:run_script_argv:367 -
> Executing script '/usr/share/lxcfs/lxc.mount.hook' for container
> 'testcontainer-20160311-0918', config section 'lxc'
> lxc 20160318161829.856 ERROR    lxc_conf - conf.c:run_buffer:347 - Script
> exited with status 1
> lxc 20160318161829.856 ERROR    lxc_conf - conf.c:lxc_setup:3750 - failed to
> run mount hooks for container 'testcontainer-20160311-0918'.
>
> There don't appear to be any logs or debug output from the lxc.mount.hook
> script that I can see that will help further.

I had to add my own debugging lines to figure out what's wrong.


>
> LXC, LXD and LXCFS services are reported running by systemd.
>
> Any help greatly appreciated!


Somewhere close to the end of lxc.mount.hook I set up a debugging line
to see what the container's cgroup directory looks like. It shows this:

+ ls -la /usr/lib/x86_64-linux-gnu/lxc/sys/fs/cgroup
total 0
drwxr-xr-x 12 root root 240 Mar 18 16:25 .
drwxr-xr-x  7 root root   0 Mar 18 16:15 ..
drwxr-xr-x  3 root root  60 Mar 18 16:25 blkio
drwxr-xr-x  3 root root  60 Mar 18 16:25 cpu
drwxr-xr-x  3 root root  60 Mar 18 16:25 cpuset
drwxr-xr-x  3 root root  60 Mar 18 16:25 devices
drwxr-xr-x  3 root root  60 Mar 18 16:25 freezer
drwxr-xr-x  3 root root  60 Mar 18 16:25 hugetlb
drwxr-xr-x  3 root root  60 Mar 18 16:25 memory
drwxr-xr-x  3 root root  60 Mar 18 16:25 net_cls
drwxr-xr-x  3 root root  60 Mar 18 16:25 perf_event
drwxr-xr-x  3 root root  60 Mar 18 16:25 systemd
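
A listing like the one above can be captured with a couple of lines
appended to the hook. The sketch below is only an assumption about how
such a debugging line might look, not the exact lines used here;
debug_cgroup_view and the log file argument are hypothetical names.
Inside the real hook, LXC_ROOTFS_MOUNT is already set and would be
passed as the first argument.

```shell
# Hypothetical helper: append a listing of the container's cgroup
# directory to a log file, so the output survives even when the hook's
# own stdout/stderr is not shown anywhere.
debug_cgroup_view() {
    rootfs="$1"   # e.g. "$LXC_ROOTFS_MOUNT" when called from the hook
    log="$2"      # any writable file; the path is an assumption
    {
        echo "+ ls -la $rootfs/sys/fs/cgroup"
        ls -la "$rootfs/sys/fs/cgroup"
    } >> "$log" 2>&1
}
```

Called as debug_cgroup_view "$LXC_ROOTFS_MOUNT" /tmp/hook-debug.log just
before the hook exits, it records what the hook sees at that point.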


The cgroup listing above is probably where the bug shows. cpu and
net_cls are already directories of their own. However, lxc.mount.hook
tries to create a cpu symlink pointing to cpu,cpuacct (which will be
created and bind-mounted later). Since a cpu directory already exists,
ln ends up trying to create the symlink inside it, at
/usr/lib/x86_64-linux-gnu/lxc/sys/fs/cgroup/cpu/cpu, instead of at
/usr/lib/x86_64-linux-gnu/lxc/sys/fs/cgroup/cpu, and that fails.
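
The ln behavior described above can be reproduced in a scratch
directory. This is a minimal sketch with stand-in names; it does not
touch any real cgroup path. The link created inside the directory is
named after the target, so its exact name in the real hook depends on
how $DEST expands there, which the diff context does not show. In a
scratch directory the first ln succeeds; on the container's cgroup
mount the equivalent call is presumably the one that fails.

```shell
#!/bin/sh
# Scratch-directory demo; no real cgroup paths are touched.
demo=$(mktemp -d)

# Stand-in for the "cpu" directory that already exists in the container.
mkdir "$demo/cpu"

# The destination is an existing directory, so ln places the link INSIDE
# it, named after the target, instead of creating a link called "cpu".
ln -s "cpu,cpuacct" "$demo/cpu"

# The destination does not exist yet, so the link is created as named.
ln -s "cpu,cpuacct" "$demo/cpuacct"

ls -la "$demo" "$demo/cpu"
```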

I didn't see a relevant change in lxcfs (rc4->rc6) to the "create
symlink" behavior, so the bug is probably somewhere in lxc (?), in
whatever creates the "cpu" and "net_cls" cgroup directories inside the
container.

My workaround:

# diff -Naru /usr/share/lxcfs/lxc.mount.hook.orig /usr/share/lxcfs/lxc.mount.hook
--- /usr/share/lxcfs/lxc.mount.hook.orig        2016-03-18 07:32:48.000000000 +0700
+++ /usr/share/lxcfs/lxc.mount.hook     2016-03-18 16:26:33.633345802 +0700
@@ -51,7 +51,13 @@
                 for single in $arr
                 do
                     if [ ! -L ${LXC_ROOTFS_MOUNT}/sys/fs/cgroup/$single ]; then
-                        ln -s $DEST ${LXC_ROOTFS_MOUNT}/sys/fs/cgroup/$single
+                        if [ -d ${LXC_ROOTFS_MOUNT}/sys/fs/cgroup/$single ]; then
+                            # a cgroup is already mounted there. Just bind-mount ours
+                            mount -n --bind $entry ${LXC_ROOTFS_MOUNT}/sys/fs/cgroup/$single
+                        else
+                            # I can simply create a symlink
+                            ln -s $DEST ${LXC_ROOTFS_MOUNT}/sys/fs/cgroup/$single
+                        fi
                     fi
                 done
             fi
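
The decision the patched hook makes for each name can be pulled out
into a small function that only reports the action and changes nothing.
This is a sketch for reasoning about the patch, not part of lxcfs;
plan_cgroup_entry is a hypothetical name. The symlink test has to come
first, because -d follows symlinks and would also be true for a symlink
that points at a directory.

```shell
# Hypothetical helper: print which branch of the patched hook would run
# for the given path. It performs no mount and creates no link.
plan_cgroup_entry() {
    path="$1"
    if [ -L "$path" ]; then
        # Already a symlink: the hook leaves it alone.
        echo "skip"
    elif [ -d "$path" ]; then
        # A real directory is already there: bind-mount over it.
        echo "bind-mount"
    else
        # Nothing there yet: a plain symlink is enough.
        echo "symlink"
    fi
}
```

Running it over each name under ${LXC_ROOTFS_MOUNT}/sys/fs/cgroup shows
which entries would be bind-mounted and which would become symlinks,
without needing root.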


The comments speak for themselves. That at least allows the container
to start while waiting for the devs to come up with a proper fix. The
container ended up with a cgroup directory like this:

# ls -la /sys/fs/cgroup/
total 0
drwxr-xr-x 14 root root 320 Mar 18 16:43 .
drwxr-xr-x  7 root root   0 Mar 18 16:43 ..
drwxr-xr-x  3 root root  60 Mar 18 16:43 blkio
drwxr-xr-x  2 root root   0 Mar 19 05:39 cpu
drwxr-xr-x  2 root root   0 Mar 19 05:39 cpu,cpuacct
lrwxrwxrwx  1 root root  11 Mar 18 16:43 cpuacct -> cpu,cpuacct
drwxr-xr-x  3 root root  60 Mar 18 16:43 cpuset
drwxr-xr-x  3 root root  60 Mar 18 16:43 devices
drwxr-xr-x  3 root root  60 Mar 18 16:43 freezer
drwxr-xr-x  3 root root  60 Mar 18 16:43 hugetlb
drwxr-xr-x  3 root root  60 Mar 18 16:43 memory
drwxr-xr-x  2 root root   0 Mar 19 05:39 net_cls
drwxr-xr-x  2 root root   0 Mar 19 05:39 net_cls,net_prio
lrwxrwxrwx  1 root root  16 Mar 18 16:43 net_prio -> net_cls,net_prio
drwxr-xr-x  3 root root  60 Mar 18 16:43 perf_event
drwxr-xr-x  3 root root  60 Mar 18 16:43 systemd

Note how "cpu" is a directory, but "cpuacct" is a symlink. The same
goes for net_cls and net_prio.
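
To check the same thing without reading the listing by eye, a short
loop can classify each entry. This is a sketch; classify_cgroup_entries
is a hypothetical name, and the default path is simply the directory
listed above.

```shell
# Hypothetical helper: print one line per entry, telling symlinks apart
# from real directories. The symlink test comes first because -d is
# also true for a symlink that points at a directory.
classify_cgroup_entries() {
    dir="${1:-/sys/fs/cgroup}"
    for entry in "$dir"/*; do
        name=${entry##*/}
        if [ -L "$entry" ]; then
            printf '%s symlink -> %s\n' "$name" "$(readlink "$entry")"
        elif [ -d "$entry" ]; then
            printf '%s directory\n' "$name"
        fi
    done
}
```

Inside the container, calling classify_cgroup_entries with no argument
inspects /sys/fs/cgroup.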

-- 
Fajar

