[lxc-users] Containers won't start under stretch-backport kernel reboot

Tony Lewis tony at lewistribe.com
Tue Aug 14 06:54:43 UTC 2018


Apologies in advance for the bump, but does anyone have any insights on this?


On 11/08/18 16:23, Tony Lewis wrote:
> I have been running LXD/LXC on a stock Debian Stretch 
> kernel (4.9.0-7-amd64), and it's working fine.  But it's beneficial for 
> me to go to the Stretch Backports kernel (4.17.0-0.bpo.1-amd64). When I 
> do, my containers won't start until I forcefully kill the LXD daemon and 
> restart the service.
>
> Details are below, and I'd appreciate some help figuring out what is 
> going wrong.
>
> Details...
>
> I was originally running LXD from packages then later migrated to 
> snap.  It didn't go smoothly so there's a chance there is some 
> package-related cruft remaining.  It seems there are only snap-related 
> lxd binaries on my system:
>
> root at server:~# find /usr/ /lib /snap /var /bin /sbin -name lxd -type f 
> -print
> /snap/lxd/7651/bin/lxd
> /snap/lxd/7651/commands/lxd
> /snap/lxd/7792/bin/lxd
> /snap/lxd/7792/commands/lxd
> /snap/lxd/8011/bin/lxd
> /snap/lxd/8011/commands/lxd
>
> root at server:~# lxd --version
> 3.3
> root at server:~# lxc --version
> 3.3
>
> root at server:~# dpkg -l | grep lx
> ii  libgl1-mesa-glx:amd64  13.0.6-1+b2                        amd64  free implementation of the OpenGL API -- GLX runtime
> ii  libxcb-glx0:amd64      1.12-1                             amd64  X C Binding, glx extension
> rc  lxc-common             2.1.0-0ubuntu1~ubuntu17.04.1~ppa1  amd64  Linux Containers userspace tools (common tools)
> rc  lxc1                   2.1.0-0ubuntu1~ubuntu17.04.1~ppa1  amd64  Linux Containers userspace tools
> rc  lxcfs                  2.0.7-1                            amd64  FUSE based filesystem for LXC
> rc  lxd                    2.18-0ubuntu5~ubuntu17.04.1~ppa1   amd64  Container hypervisor based on LXC - daemon
>
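In the dpkg listing above, the "rc" state means the package was removed
but its configuration files were left behind, which is the kind of
package-related cruft that could still be supplying files such as init
scripts. A minimal sketch for listing such packages, assuming the usual
`dpkg -l` column layout (the function name list_rc_packages is made up
for illustration):

```shell
#!/bin/sh
# Print the names of packages that dpkg reports in state "rc"
# (removed, configuration files remaining).
# Reads `dpkg -l`-style output on stdin.
list_rc_packages() {
    awk '$1 == "rc" { print $2 }'
}

# Typical use on a Debian system:
#   dpkg -l | list_rc_packages
# Purging is deliberately left as a manual step, e.g.:
#   apt-get purge <package>
```

Whether purging these leftovers changes the behaviour described here is
untested; the sketch only shows what is still registered.
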
> Right after a reboot, LXD is running:
>
> root at server:~# ps -ef | grep lx
> root      1823     1  0 11:42 ?        00:00:00 /bin/sh 
> /snap/lxd/8011/commands/daemon.start
> root      2436     1  0 11:42 ?        00:00:00 lxcfs 
> /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
> root      2452  1823  5 11:42 ?        00:00:08 lxd --logfile 
> /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
> root      2453  1823  1 11:42 ?        00:00:02 lxd waitready
> root      2454  1823  0 11:42 ?        00:00:00 /bin/sh 
> /snap/lxd/8011/commands/daemon.start
> lxd       2938     1  0 11:42 ?        00:00:00 dnsmasq --strict-order 
> --bind-interfaces 
> --pid-file=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.pid 
> --except-interface=lo --interface=lxdnet1 --quiet-dhcp --quiet-dhcp6 
> --quiet-ra --listen-address=10.1.99.1 --dhcp-no-override 
> --dhcp-authoritative 
> --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.leases 
> --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.hosts 
> --dhcp-range 10.1.99.2,10.1.99.254,1h 
> --listen-address=fd42:6727:ccbe:877f::1 --enable-ra --dhcp-range 
> ::,constructor:lxdnet1,ra-stateless,ra-names -s lxd -S /lxd/ 
> --conf-file=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.raw -u lxd
> lxd       3207     1  0 11:42 ?        00:00:00 dnsmasq --strict-order 
> --bind-interfaces 
> --pid-file=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.pid 
> --except-interface=lo --interface=lxdnet0 --quiet-dhcp --quiet-dhcp6 
> --quiet-ra --listen-address=10.1.100.1 --dhcp-no-override 
> --dhcp-authoritative 
> --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.leases 
> --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.hosts 
> --dhcp-range 10.1.100.2,10.1.100.254,1h 
> --listen-address=fd42:5dd8:266d:cfea::1 --enable-ra --dhcp-range 
> ::,constructor:lxdnet0,ra-stateless,ra-names -s lxd -S /lxd/ 
> --conf-file=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.raw -u lxd
> root      4163  4040  0 11:45 pts/1    00:00:00 grep lx
>
> There are two LXD-ish looking services:
>
> root at server:~# systemctl list-units | grep lx
> sys-devices-virtual-net-lxdnet0.device loaded active plugged 
> /sys/devices/virtual/net/lxdnet0
> sys-devices-virtual-net-lxdnet1.device loaded active plugged 
> /sys/devices/virtual/net/lxdnet1
> sys-subsystem-net-devices-lxdnet0.device loaded active plugged 
> /sys/subsystem/net/devices/lxdnet0
> sys-subsystem-net-devices-lxdnet1.device loaded active plugged 
> /sys/subsystem/net/devices/lxdnet1
> run-snapd-ns-lxd.mnt.mount loaded active mounted /run/snapd/ns/lxd.mnt
> snap-lxd-7651.mount loaded active mounted   Mount unit for lxd
> snap-lxd-7792.mount loaded active mounted   Mount unit for lxd
> snap-lxd-8011.mount loaded active mounted   Mount unit for lxd
> lxd.service loaded active exited    LSB: Container hypervisor based on 
> LXC
> snap.lxd.daemon.service loaded active running   Service for snap 
> application lxd.daemon
>
> lxd.service is not running, but snap.lxd.daemon.service is:
>
> root at server:~# systemctl status snap.lxd.daemon.service
> ● snap.lxd.daemon.service - Service for snap application lxd.daemon
>    Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service; 
> enabled; vendor preset: enabled)
>    Active: active (running) since Sat 2018-08-11 11:42:31 AEST; 3min 
> 22s ago
>  Main PID: 1823 (daemon.start)
>     Tasks: 0 (limit: 4915)
>    CGroup: /system.slice/snap.lxd.daemon.service
>            ‣ 1823 /bin/sh /snap/lxd/8011/commands/daemon.start
>
> Aug 11 11:42:34 server snap[1823]:   2: fd:   8: perf_event
> Aug 11 11:42:34 server snap[1823]:   3: fd:   9: blkio
> Aug 11 11:42:34 server snap[1823]:   4: fd:  10: freezer
> Aug 11 11:42:34 server snap[1823]:   5: fd:  11: devices
> Aug 11 11:42:34 server snap[1823]:   6: fd:  12: cpu,cpuacct
> Aug 11 11:42:34 server snap[1823]:   7: fd:  13: net_cls,net_prio
> Aug 11 11:42:34 server snap[1823]:   8: fd:  14: memory
> Aug 11 11:42:34 server snap[1823]:   9: fd:  15: name=systemd
> Aug 11 11:42:35 server snap[1823]: lvl=warn msg="CGroup memory swap 
> accounting is disabled, swap limits will be ignored." 
> t=2018-08-11T01:42:35+0000
> Aug 11 11:42:38 server snap[1823]: lvl=warn msg="Unable to update 
> backup.yaml at this time" name=backuptests t=2018-08-11T01:42:38+0000
>
> root at server:~# systemctl status lxd.service
> ● lxd.service - LSB: Container hypervisor based on LXC
>    Loaded: loaded (/etc/init.d/lxd; generated; vendor preset: enabled)
>    Active: active (exited) since Sat 2018-08-11 11:42:24 AEST; 3min 
> 49s ago
>      Docs: man:systemd-sysv-generator(8)
>   Process: 1412 ExecStart=/etc/init.d/lxd start (code=exited, 
> status=0/SUCCESS)
>     Tasks: 0 (limit: 4915)
>    CGroup: /system.slice/lxd.service
>
> Aug 11 11:42:24 server systemd[1]: Starting LSB: Container hypervisor 
> based on LXC...
> Aug 11 11:42:24 server systemd[1]: Started LSB: Container hypervisor 
> based on LXC.
>
> I can try stopping both services (though lxd.service already reports 
> as exited), starting with the snap LXD service:
>
> root at server:~# systemctl stop snap.lxd.daemon.service
>
> root at server:~# systemctl stop lxd
>
> root at server:~# systemctl stop snap.lxd.daemon.service
>
> root at server:~# systemctl list-units | grep lx
> sys-devices-virtual-net-lxdnet0.device loaded active plugged 
> /sys/devices/virtual/net/lxdnet0
> sys-devices-virtual-net-lxdnet1.device loaded active plugged 
> /sys/devices/virtual/net/lxdnet1
> sys-subsystem-net-devices-lxdnet0.device loaded active plugged 
> /sys/subsystem/net/devices/lxdnet0
> sys-subsystem-net-devices-lxdnet1.device loaded active plugged 
> /sys/subsystem/net/devices/lxdnet1
> run-snapd-ns-lxd.mnt.mount loaded active mounted /run/snapd/ns/lxd.mnt
> snap-lxd-7651.mount loaded active mounted   Mount unit for lxd
> snap-lxd-7792.mount loaded active mounted   Mount unit for lxd
> snap-lxd-8011.mount loaded active mounted   Mount unit for lxd
>
> root at server:~# systemctl status snap.lxd.daemon.service
> ● snap.lxd.daemon.service - Service for snap application lxd.daemon
>    Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service; 
> enabled; vendor preset: enabled)
>    Active: inactive (dead) since Sat 2018-08-11 11:46:49 AEST; 25s ago
>   Process: 4304 ExecStop=/usr/bin/snap run --command=stop lxd.daemon 
> (code=exited, status=0/SUCCESS)
>   Process: 1823 ExecStart=/usr/bin/snap run lxd.daemon (code=killed, 
> signal=TERM)
>  Main PID: 1823 (code=killed, signal=TERM)
>
> Aug 11 11:42:38 server snap[1823]: lvl=warn msg="Unable to update 
> backup.yaml at this time" name=backuptests t=2018-08-11T01:42:38+0000
> Aug 11 11:46:48 server systemd[1]: Stopping Service for snap 
> application lxd.daemon...
> Aug 11 11:46:48 server /usr/bin/snap[4304]: cmd.go:105: DEBUG: 
> restarting into "/snap/core/current/usr/bin/snap"
> Aug 11 11:46:48 server snap[4320]: cmd.go:105: DEBUG: restarting into 
> "/snap/core/current/usr/bin/snap"
> Aug 11 11:46:48 server snap[4304]: error: no changes found
> Aug 11 11:46:48 server snap[4304]: => Stop reason is: host shutdown
> Aug 11 11:46:48 server snap[4304]: => Stopping LXD (with container 
> shutdown)
> Aug 11 11:46:48 server snap[4304]: lxd: error while loading shared 
> libraries: liblxc.so.1: cannot open shared object file: No such file 
> or directory
> Aug 11 11:46:48 server snap[4304]: => Stopping LXCFS
> Aug 11 11:46:49 server systemd[1]: Stopped Service for snap 
> application lxd.daemon.
>
> But LXD is still running:
>
> root at server:~# ps -ef | grep lx
> root      2452     1  4 11:42 ?        00:00:13 lxd --logfile 
> /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
> root      2453     1  1 11:42 ?        00:00:03 lxd waitready
> root      2454     1  0 11:42 ?        00:00:00 /bin/sh 
> /snap/lxd/8011/commands/daemon.start
> lxd       2938     1  0 11:42 ?        00:00:00 dnsmasq --strict-order 
> --bind-interfaces 
> --pid-file=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.pid 
> --except-interface=lo --interface=lxdnet1 --quiet-dhcp --quiet-dhcp6 
> --quiet-ra --listen-address=10.1.99.1 --dhcp-no-override 
> --dhcp-authoritative 
> --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.leases 
> --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.hosts 
> --dhcp-range 10.1.99.2,10.1.99.254,1h 
> --listen-address=fd42:6727:ccbe:877f::1 --enable-ra --dhcp-range 
> ::,constructor:lxdnet1,ra-stateless,ra-names -s lxd -S /lxd/ 
> --conf-file=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.raw -u lxd
> lxd       3207     1  0 11:42 ?        00:00:00 dnsmasq --strict-order 
> --bind-interfaces 
> --pid-file=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.pid 
> --except-interface=lo --interface=lxdnet0 --quiet-dhcp --quiet-dhcp6 
> --quiet-ra --listen-address=10.1.100.1 --dhcp-no-override 
> --dhcp-authoritative 
> --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.leases 
> --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.hosts 
> --dhcp-range 10.1.100.2,10.1.100.254,1h 
> --listen-address=fd42:5dd8:266d:cfea::1 --enable-ra --dhcp-range 
> ::,constructor:lxdnet0,ra-stateless,ra-names -s lxd -S /lxd/ 
> --conf-file=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.raw -u lxd
> root      4454  4040  0 11:47 pts/1    00:00:00 grep lx
>
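To spot daemon processes that outlive the stopped unit, as in the
listing above where the lxd processes now have PID 1 as their parent,
something like the following may help. It is only a sketch:
orphans_named is an illustrative name, and it assumes procps-style `ps`
output of PID, PPID and command line with no header row:

```shell
#!/bin/sh
# Print the PIDs of processes whose parent is PID 1 and whose command
# (first word of the command line) equals the given name.
# Reads lines of "PID PPID COMMAND [ARGS...]" on stdin.
orphans_named() {
    awk -v name="$1" '$2 == 1 && $3 == name { print $1 }'
}

# Typical use (separate -o options keep the header suppression portable):
#   ps -e -o pid= -o ppid= -o args= | orphans_named lxd
```

Note that a parent of PID 1 is normal for daemons in general, so the
output is only meaningful when the owning service is supposed to be
stopped.
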
> I can list my containers, and with debug output I can verify that 
> communication with the socket works.  But if I attempt to manually 
> start a container, the command blocks and nothing happens.
>
> root at server:~# lxc ls
> +-------------+---------+------+------+------------+-----------+
> |    NAME     |  STATE  | IPV4 | IPV6 |    TYPE    | SNAPSHOTS |
> +-------------+---------+------+------+------------+-----------+
> | container1  | STOPPED |      |      | PERSISTENT | 0         |
> +-------------+---------+------+------+------------+-----------+
> | container2  | STOPPED |      |      | PERSISTENT | 0         |
> +-------------+---------+------+------+------------+-----------+
> | container3  | STOPPED |      |      | PERSISTENT | 0         |
> +-------------+---------+------+------+------------+-----------+
> | container4  | STOPPED |      |      | PERSISTENT | 0         |
> +-------------+---------+------+------+------------+-----------+
> | container5  | STOPPED |      |      | PERSISTENT | 0         |
> +-------------+---------+------+------+------------+-----------+
>
> If I manually kill the LXD process and restart the service with 
> systemctl, my containers automatically start in turn and I am good to go:
>
> root at server:~# kill 2452
> root at server:~# ps -ef | grep lx
> lxd       2938     1  0 11:42 ?        00:00:00 dnsmasq --strict-order 
> --bind-interfaces 
> --pid-file=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.pid 
> --except-interface=lo --interface=lxdnet1 --quiet-dhcp --quiet-dhcp6 
> --quiet-ra --listen-address=10.1.99.1 --dhcp-no-override 
> --dhcp-authoritative 
> --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.leases 
> --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.hosts 
> --dhcp-range 10.1.99.2,10.1.99.254,1h 
> --listen-address=fd42:6727:ccbe:877f::1 --enable-ra --dhcp-range 
> ::,constructor:lxdnet1,ra-stateless,ra-names -s lxd -S /lxd/ 
> --conf-file=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.raw -u lxd
> lxd       3207     1  0 11:42 ?        00:00:00 dnsmasq --strict-order 
> --bind-interfaces 
> --pid-file=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.pid 
> --except-interface=lo --interface=lxdnet0 --quiet-dhcp --quiet-dhcp6 
> --quiet-ra --listen-address=10.1.100.1 --dhcp-no-override 
> --dhcp-authoritative 
> --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.leases 
> --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.hosts 
> --dhcp-range 10.1.100.2,10.1.100.254,1h 
> --listen-address=fd42:5dd8:266d:cfea::1 --enable-ra --dhcp-range 
> ::,constructor:lxdnet0,ra-stateless,ra-names -s lxd -S /lxd/ 
> --conf-file=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.raw -u lxd
> root      4468  4040  0 11:47 pts/1    00:00:00 grep lx
> root at server:~# lxc ls
> Error: Get http://unix.socket/1.0: dial unix 
> /var/snap/lxd/common/lxd/unix.socket: connect: connection refused
> root at server:~# systemctl start lxd
> root at server:~# lxc ls
> Error: Get http://unix.socket/1.0: dial unix 
> /var/snap/lxd/common/lxd/unix.socket: connect: connection refused
> root at server:~# lxc ls
> Error: Get http://unix.socket/1.0: dial unix 
> /var/snap/lxd/common/lxd/unix.socket: connect: connection refused
> root at server:~# lxc ls
> Error: Get http://unix.socket/1.0: dial unix 
> /var/snap/lxd/common/lxd/unix.socket: connect: connection refused
> root at server:~# lxc ls
> Error: Get http://unix.socket/1.0: dial unix 
> /var/snap/lxd/common/lxd/unix.socket: connect: connection refused
> root at server:~# systemctl start snap.lxd.daemon
> root at server:~# systemctl list-units | grep lx
> sys-devices-virtual-net-lxdnet0.device loaded active plugged 
> /sys/devices/virtual/net/lxdnet0
> sys-devices-virtual-net-lxdnet1.device loaded active plugged 
> /sys/devices/virtual/net/lxdnet1
> sys-subsystem-net-devices-lxdnet0.device loaded active plugged 
> /sys/subsystem/net/devices/lxdnet0
> sys-subsystem-net-devices-lxdnet1.device loaded active plugged 
> /sys/subsystem/net/devices/lxdnet1
> run-snapd-ns-lxd.mnt.mount loaded active mounted /run/snapd/ns/lxd.mnt
> snap-lxd-7651.mount loaded active mounted   Mount unit for lxd
> snap-lxd-7792.mount loaded active mounted   Mount unit for lxd
> snap-lxd-8011.mount loaded active mounted   Mount unit for lxd
> lxd.service loaded active exited    LSB: Container hypervisor based on 
> LXC
> snap.lxd.daemon.service loaded active running   Service for snap 
> application lxd.daemon
>
> root at server:~# lxc ls
> +-------------+---------+-----------------------+-----------------------------------------------+------------+-----------+
> |    NAME     |  STATE  |         IPV4          |                     IPV6                      |    TYPE    | SNAPSHOTS |
> +-------------+---------+-----------------------+-----------------------------------------------+------------+-----------+
> | container1  | RUNNING | 10.1.100.49 (eth0)    | fd42:5dd8:266d:cfea:216:3eff:fe17:904c (eth0) | PERSISTENT | 0         |
> +-------------+---------+-----------------------+-----------------------------------------------+------------+-----------+
> | container2  | RUNNING | 10.1.100.182 (eth0)   | fd42:5dd8:266d:cfea:216:3eff:fe1c:91d0 (eth0) | PERSISTENT | 0         |
> +-------------+---------+-----------------------+-----------------------------------------------+------------+-----------+
> | container3  | RUNNING | 10.1.100.56 (eth0)    | fd42:5dd8:266d:cfea:216:3eff:fec6:6816 (eth0) | PERSISTENT | 0         |
> +-------------+---------+-----------------------+-----------------------------------------------+------------+-----------+
> | container4  | RUNNING | 10.1.100.209 (eth0)   | fd42:5dd8:266d:cfea:216:3eff:fe27:6f8f (eth0) | PERSISTENT | 0         |
> +-------------+---------+-----------------------+-----------------------------------------------+------------+-----------+
> | container5  | RUNNING | 10.1.100.43 (eth0)    | fd42:5dd8:266d:cfea:216:3eff:fea4:a034 (eth0) | PERSISTENT | 0         |
> +-------------+---------+-----------------------+-----------------------------------------------+------------+-----------+
>
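The repeated `lxc ls` calls in the transcript above amount to polling by
hand until the daemon accepts connections on its socket again. A small
helper can do that polling instead. This is a generic sketch, and
retry_until is a made-up name, not part of LXD or snapd:

```shell
#!/bin/sh
# Re-run a command until it succeeds or the attempts run out.
# Usage: retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as the command succeeds, 1 if every attempt failed.
retry_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        # Only sleep when another attempt is still to come.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Typical use after restarting the service:
#   retry_until 30 1 lxc ls
```

The command's own output and errors are left visible, so each failed
attempt still prints its "connection refused" message.
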
>
>
>
>
> _______________________________________________
> lxc-users mailing list
> lxc-users at lists.linuxcontainers.org
> http://lists.linuxcontainers.org/listinfo/lxc-users


