[lxc-users] Containers won't start under stretch-backport kernel reboot

Tony Lewis tony at lewistribe.com
Sat Aug 11 06:23:54 UTC 2018


I have been running LXD/LXC on a stock Debian Stretch 
kernel (4.9.0-7-amd64), and it's working fine.  But it would be beneficial 
for me to move to the Stretch Backports kernel (4.17.0-0.bpo.1-amd64). When 
I boot into it, my containers won't start until I forcefully kill the LXD 
daemon and restart the service.

Details are below, and I'd appreciate some help figuring out what is 
going wrong.

Details...

I was originally running LXD from packages and later migrated to the snap.  
It didn't go smoothly, so there's a chance some package-related cruft 
remains.  However, the only lxd binaries on my system appear to be the 
snap ones:

root at server:~# find /usr/ /lib /snap /var /bin /sbin -name lxd -type f -print
/snap/lxd/7651/bin/lxd
/snap/lxd/7651/commands/lxd
/snap/lxd/7792/bin/lxd
/snap/lxd/7792/commands/lxd
/snap/lxd/8011/bin/lxd
/snap/lxd/8011/commands/lxd

root at server:~# lxd --version
3.3
root at server:~# lxc --version
3.3

root at server:~# dpkg -l | grep lx
ii  libgl1-mesa-glx:amd64  13.0.6-1+b2                        amd64  free implementation of the OpenGL API -- GLX runtime
ii  libxcb-glx0:amd64      1.12-1                             amd64  X C Binding, glx extension
rc  lxc-common             2.1.0-0ubuntu1~ubuntu17.04.1~ppa1  amd64  Linux Containers userspace tools (common tools)
rc  lxc1                   2.1.0-0ubuntu1~ubuntu17.04.1~ppa1  amd64  Linux Containers userspace tools
rc  lxcfs                  2.0.7-1                            amd64  FUSE based filesystem for LXC
rc  lxd                    2.18-0ubuntu5~ubuntu17.04.1~ppa1   amd64  Container hypervisor based on LXC - daemon
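
The "rc" status in the first column means the package was removed but its 
configuration files were left behind, which is the kind of cruft mentioned 
above. A minimal sketch for pulling just those names out of dpkg -l output, 
assuming the standard column layout (list_rc_packages is a made-up helper 
name, not part of dpkg):

```shell
# List packages that dpkg reports as "rc" (removed, config files remain).
# Reads `dpkg -l` style output on stdin and prints one package name per line.
# list_rc_packages is a hypothetical helper name.
list_rc_packages() {
    awk '$1 == "rc" { print $2 }'
}

# Usage (not run here): dpkg -l | list_rc_packages
```

dpkg --purge is the usual way to drop the leftover configuration of such 
packages; whether that would also remove /etc/init.d/lxd on this system is 
an assumption that has not been checked here.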

Right after a reboot, LXD is running:

root at server:~# ps -ef | grep lx
root      1823     1  0 11:42 ?        00:00:00 /bin/sh 
/snap/lxd/8011/commands/daemon.start
root      2436     1  0 11:42 ?        00:00:00 lxcfs 
/var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
root      2452  1823  5 11:42 ?        00:00:08 lxd --logfile 
/var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      2453  1823  1 11:42 ?        00:00:02 lxd waitready
root      2454  1823  0 11:42 ?        00:00:00 /bin/sh 
/snap/lxd/8011/commands/daemon.start
lxd       2938     1  0 11:42 ?        00:00:00 dnsmasq --strict-order 
--bind-interfaces 
--pid-file=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.pid 
--except-interface=lo --interface=lxdnet1 --quiet-dhcp --quiet-dhcp6 
--quiet-ra --listen-address=10.1.99.1 --dhcp-no-override 
--dhcp-authoritative 
--dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.leases 
--dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.hosts 
--dhcp-range 10.1.99.2,10.1.99.254,1h 
--listen-address=fd42:6727:ccbe:877f::1 --enable-ra --dhcp-range 
::,constructor:lxdnet1,ra-stateless,ra-names -s lxd -S /lxd/ 
--conf-file=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.raw -u lxd
lxd       3207     1  0 11:42 ?        00:00:00 dnsmasq --strict-order 
--bind-interfaces 
--pid-file=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.pid 
--except-interface=lo --interface=lxdnet0 --quiet-dhcp --quiet-dhcp6 
--quiet-ra --listen-address=10.1.100.1 --dhcp-no-override 
--dhcp-authoritative 
--dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.leases 
--dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.hosts 
--dhcp-range 10.1.100.2,10.1.100.254,1h 
--listen-address=fd42:5dd8:266d:cfea::1 --enable-ra --dhcp-range 
::,constructor:lxdnet0,ra-stateless,ra-names -s lxd -S /lxd/ 
--conf-file=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.raw -u lxd
root      4163  4040  0 11:45 pts/1    00:00:00 grep lx

There are two LXD-ish looking services:

root at server:~# systemctl list-units | grep lx
sys-devices-virtual-net-lxdnet0.device loaded active plugged 
/sys/devices/virtual/net/lxdnet0
sys-devices-virtual-net-lxdnet1.device loaded active plugged 
/sys/devices/virtual/net/lxdnet1
sys-subsystem-net-devices-lxdnet0.device loaded active plugged 
/sys/subsystem/net/devices/lxdnet0
sys-subsystem-net-devices-lxdnet1.device loaded active plugged 
/sys/subsystem/net/devices/lxdnet1
run-snapd-ns-lxd.mnt.mount loaded active mounted /run/snapd/ns/lxd.mnt
snap-lxd-7651.mount loaded active mounted   Mount unit for lxd
snap-lxd-7792.mount loaded active mounted   Mount unit for lxd
snap-lxd-8011.mount loaded active mounted   Mount unit for lxd
lxd.service loaded active exited    LSB: Container hypervisor based on LXC
snap.lxd.daemon.service loaded active running   Service for snap 
application lxd.daemon

lxd.service is active but exited, with no process running, while 
snap.lxd.daemon.service is running:

root at server:~# systemctl status snap.lxd.daemon.service
● snap.lxd.daemon.service - Service for snap application lxd.daemon
    Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service; 
enabled; vendor preset: enabled)
    Active: active (running) since Sat 2018-08-11 11:42:31 AEST; 3min 
22s ago
  Main PID: 1823 (daemon.start)
     Tasks: 0 (limit: 4915)
    CGroup: /system.slice/snap.lxd.daemon.service
            ‣ 1823 /bin/sh /snap/lxd/8011/commands/daemon.start

Aug 11 11:42:34 server snap[1823]:   2: fd:   8: perf_event
Aug 11 11:42:34 server snap[1823]:   3: fd:   9: blkio
Aug 11 11:42:34 server snap[1823]:   4: fd:  10: freezer
Aug 11 11:42:34 server snap[1823]:   5: fd:  11: devices
Aug 11 11:42:34 server snap[1823]:   6: fd:  12: cpu,cpuacct
Aug 11 11:42:34 server snap[1823]:   7: fd:  13: net_cls,net_prio
Aug 11 11:42:34 server snap[1823]:   8: fd:  14: memory
Aug 11 11:42:34 server snap[1823]:   9: fd:  15: name=systemd
Aug 11 11:42:35 server snap[1823]: lvl=warn msg="CGroup memory swap 
accounting is disabled, swap limits will be ignored." 
t=2018-08-11T01:42:35+0000
Aug 11 11:42:38 server snap[1823]: lvl=warn msg="Unable to update 
backup.yaml at this time" name=backuptests t=2018-08-11T01:42:38+0000

root at server:~# systemctl status lxd.service
● lxd.service - LSB: Container hypervisor based on LXC
    Loaded: loaded (/etc/init.d/lxd; generated; vendor preset: enabled)
    Active: active (exited) since Sat 2018-08-11 11:42:24 AEST; 3min 49s ago
      Docs: man:systemd-sysv-generator(8)
   Process: 1412 ExecStart=/etc/init.d/lxd start (code=exited, 
status=0/SUCCESS)
     Tasks: 0 (limit: 4915)
    CGroup: /system.slice/lxd.service

Aug 11 11:42:24 server systemd[1]: Starting LSB: Container hypervisor 
based on LXC...
Aug 11 11:42:24 server systemd[1]: Started LSB: Container hypervisor 
based on LXC.
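
The "generated" marker and the systemd-sysv-generator(8) docs line in the 
status above show that lxd.service is not a native unit file but one that 
systemd synthesised from /etc/init.d/lxd. A small sketch for checking this 
in a script, assuming the "Loaded:" line keeps the format shown above 
(is_generated_unit is a made-up helper name):

```shell
# Succeed if `systemctl status` text on stdin describes a generated unit,
# i.e. one synthesised from an init script rather than a real unit file.
# is_generated_unit is a hypothetical helper name.
is_generated_unit() {
    grep -q 'Loaded: loaded (.*; generated'
}

# Usage (not run here):
#   systemctl status lxd.service | is_generated_unit && echo "from init script"
```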

I can try stopping both services (though lxd.service already reports as 
exited):

root at server:~# systemctl stop snap.lxd.daemon.service

root at server:~# systemctl stop lxd

root at server:~# systemctl stop snap.lxd.daemon.service

root at server:~# systemctl list-units | grep lx
sys-devices-virtual-net-lxdnet0.device loaded active plugged 
/sys/devices/virtual/net/lxdnet0
sys-devices-virtual-net-lxdnet1.device loaded active plugged 
/sys/devices/virtual/net/lxdnet1
sys-subsystem-net-devices-lxdnet0.device loaded active plugged 
/sys/subsystem/net/devices/lxdnet0
sys-subsystem-net-devices-lxdnet1.device loaded active plugged 
/sys/subsystem/net/devices/lxdnet1
run-snapd-ns-lxd.mnt.mount loaded active mounted /run/snapd/ns/lxd.mnt
snap-lxd-7651.mount loaded active mounted   Mount unit for lxd
snap-lxd-7792.mount loaded active mounted   Mount unit for lxd
snap-lxd-8011.mount loaded active mounted   Mount unit for lxd

root at server:~# systemctl status snap.lxd.daemon.service
● snap.lxd.daemon.service - Service for snap application lxd.daemon
    Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service; 
enabled; vendor preset: enabled)
    Active: inactive (dead) since Sat 2018-08-11 11:46:49 AEST; 25s ago
   Process: 4304 ExecStop=/usr/bin/snap run --command=stop lxd.daemon 
(code=exited, status=0/SUCCESS)
   Process: 1823 ExecStart=/usr/bin/snap run lxd.daemon (code=killed, 
signal=TERM)
  Main PID: 1823 (code=killed, signal=TERM)

Aug 11 11:42:38 server snap[1823]: lvl=warn msg="Unable to update 
backup.yaml at this time" name=backuptests t=2018-08-11T01:42:38+0000
Aug 11 11:46:48 server systemd[1]: Stopping Service for snap application 
lxd.daemon...
Aug 11 11:46:48 server /usr/bin/snap[4304]: cmd.go:105: DEBUG: 
restarting into "/snap/core/current/usr/bin/snap"
Aug 11 11:46:48 server snap[4320]: cmd.go:105: DEBUG: restarting into 
"/snap/core/current/usr/bin/snap"
Aug 11 11:46:48 server snap[4304]: error: no changes found
Aug 11 11:46:48 server snap[4304]: => Stop reason is: host shutdown
Aug 11 11:46:48 server snap[4304]: => Stopping LXD (with container shutdown)
Aug 11 11:46:48 server snap[4304]: lxd: error while loading shared 
libraries: liblxc.so.1: cannot open shared object file: No such file or 
directory
Aug 11 11:46:48 server snap[4304]: => Stopping LXCFS
Aug 11 11:46:49 server systemd[1]: Stopped Service for snap application 
lxd.daemon.

But LXD is still running:

root at server:~# ps -ef | grep lx
root      2452     1  4 11:42 ?        00:00:13 lxd --logfile 
/var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root      2453     1  1 11:42 ?        00:00:03 lxd waitready
root      2454     1  0 11:42 ?        00:00:00 /bin/sh 
/snap/lxd/8011/commands/daemon.start
lxd       2938     1  0 11:42 ?        00:00:00 dnsmasq --strict-order 
--bind-interfaces 
--pid-file=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.pid 
--except-interface=lo --interface=lxdnet1 --quiet-dhcp --quiet-dhcp6 
--quiet-ra --listen-address=10.1.99.1 --dhcp-no-override 
--dhcp-authoritative 
--dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.leases 
--dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.hosts 
--dhcp-range 10.1.99.2,10.1.99.254,1h 
--listen-address=fd42:6727:ccbe:877f::1 --enable-ra --dhcp-range 
::,constructor:lxdnet1,ra-stateless,ra-names -s lxd -S /lxd/ 
--conf-file=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.raw -u lxd
lxd       3207     1  0 11:42 ?        00:00:00 dnsmasq --strict-order 
--bind-interfaces 
--pid-file=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.pid 
--except-interface=lo --interface=lxdnet0 --quiet-dhcp --quiet-dhcp6 
--quiet-ra --listen-address=10.1.100.1 --dhcp-no-override 
--dhcp-authoritative 
--dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.leases 
--dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.hosts 
--dhcp-range 10.1.100.2,10.1.100.254,1h 
--listen-address=fd42:5dd8:266d:cfea::1 --enable-ra --dhcp-range 
::,constructor:lxdnet0,ra-stateless,ra-names -s lxd -S /lxd/ 
--conf-file=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.raw -u lxd
root      4454  4040  0 11:47 pts/1    00:00:00 grep lx
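
In the listing above the lxd processes (PIDs 2452 and 2453) now have PID 1 
as their parent, so they survived the service stop. A minimal sketch for 
spotting such leftovers, assuming the usual ps -ef column order (UID PID 
PPID C STIME TTY TIME CMD); orphaned_lxd_pids is a made-up helper name:

```shell
# Print PIDs of lxd processes whose parent is PID 1.
# Reads `ps -ef` style output on stdin; column 2 is PID, column 3 is PPID,
# and column 8 is the first word of the command line.
# orphaned_lxd_pids is a hypothetical helper name.
orphaned_lxd_pids() {
    awk '$3 == 1 && $8 == "lxd" { print $2 }'
}

# Usage (not run here): ps -ef | orphaned_lxd_pids
```

This matches both the daemon itself and the "lxd waitready" helper, and 
skips dnsmasq and the daemon.start shell because their command is not lxd.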

I can list my containers, and with debug I can verify that communication 
over the socket works.  But if I attempt to manually start a container, 
the command blocks and nothing happens.

root at server:~# lxc ls
+-------------+---------+------+------+------------+-----------+
|    NAME     |  STATE  | IPV4 | IPV6 |    TYPE    | SNAPSHOTS |
+-------------+---------+------+------+------------+-----------+
| container1  | STOPPED |      |      | PERSISTENT | 0         |
+-------------+---------+------+------+------------+-----------+
| container2  | STOPPED |      |      | PERSISTENT | 0         |
+-------------+---------+------+------+------------+-----------+
| container3  | STOPPED |      |      | PERSISTENT | 0         |
+-------------+---------+------+------+------------+-----------+
| container4  | STOPPED |      |      | PERSISTENT | 0         |
+-------------+---------+------+------+------------+-----------+
| container5  | STOPPED |      |      | PERSISTENT | 0         |
+-------------+---------+------+------+------------+-----------+
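
Because the start command blocks indefinitely, it can help to bound it when 
reproducing the problem. A minimal sketch using coreutils timeout, which 
exits with status 124 when the limit is reached (run_bounded is a made-up 
helper name, and the 30 second limit in the usage line is an arbitrary 
choice):

```shell
# Run a command but give up after a number of seconds.
# Relies on coreutils `timeout`, which exits with status 124 on expiry.
# run_bounded is a hypothetical helper: run_bounded SECONDS CMD [ARGS...]
run_bounded() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Usage (not run here): run_bounded 30 lxc start container1
```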

If I manually kill the LXD process and restart the service with 
systemctl, my containers automatically start in turn and I am good to go:

root at server:~# kill 2452
root at server:~# ps -ef | grep lx
lxd       2938     1  0 11:42 ?        00:00:00 dnsmasq --strict-order 
--bind-interfaces 
--pid-file=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.pid 
--except-interface=lo --interface=lxdnet1 --quiet-dhcp --quiet-dhcp6 
--quiet-ra --listen-address=10.1.99.1 --dhcp-no-override 
--dhcp-authoritative 
--dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.leases 
--dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.hosts 
--dhcp-range 10.1.99.2,10.1.99.254,1h 
--listen-address=fd42:6727:ccbe:877f::1 --enable-ra --dhcp-range 
::,constructor:lxdnet1,ra-stateless,ra-names -s lxd -S /lxd/ 
--conf-file=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.raw -u lxd
lxd       3207     1  0 11:42 ?        00:00:00 dnsmasq --strict-order 
--bind-interfaces 
--pid-file=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.pid 
--except-interface=lo --interface=lxdnet0 --quiet-dhcp --quiet-dhcp6 
--quiet-ra --listen-address=10.1.100.1 --dhcp-no-override 
--dhcp-authoritative 
--dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.leases 
--dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.hosts 
--dhcp-range 10.1.100.2,10.1.100.254,1h 
--listen-address=fd42:5dd8:266d:cfea::1 --enable-ra --dhcp-range 
::,constructor:lxdnet0,ra-stateless,ra-names -s lxd -S /lxd/ 
--conf-file=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.raw -u lxd
root      4468  4040  0 11:47 pts/1    00:00:00 grep lx
root at server:~# lxc ls
Error: Get http://unix.socket/1.0: dial unix 
/var/snap/lxd/common/lxd/unix.socket: connect: connection refused
root at server:~# systemctl start lxd
root at server:~# lxc ls
Error: Get http://unix.socket/1.0: dial unix 
/var/snap/lxd/common/lxd/unix.socket: connect: connection refused
root at server:~# lxc ls
Error: Get http://unix.socket/1.0: dial unix 
/var/snap/lxd/common/lxd/unix.socket: connect: connection refused
root at server:~# lxc ls
Error: Get http://unix.socket/1.0: dial unix 
/var/snap/lxd/common/lxd/unix.socket: connect: connection refused
root at server:~# lxc ls
Error: Get http://unix.socket/1.0: dial unix 
/var/snap/lxd/common/lxd/unix.socket: connect: connection refused
root at server:~# systemctl start snap.lxd.daemon
root at server:~# systemctl list-units | grep lx
sys-devices-virtual-net-lxdnet0.device loaded active plugged 
/sys/devices/virtual/net/lxdnet0
sys-devices-virtual-net-lxdnet1.device loaded active plugged 
/sys/devices/virtual/net/lxdnet1
sys-subsystem-net-devices-lxdnet0.device loaded active plugged 
/sys/subsystem/net/devices/lxdnet0
sys-subsystem-net-devices-lxdnet1.device loaded active plugged 
/sys/subsystem/net/devices/lxdnet1
run-snapd-ns-lxd.mnt.mount loaded active mounted /run/snapd/ns/lxd.mnt
snap-lxd-7651.mount loaded active mounted   Mount unit for lxd
snap-lxd-7792.mount loaded active mounted   Mount unit for lxd
snap-lxd-8011.mount loaded active mounted   Mount unit for lxd
lxd.service loaded active exited    LSB: Container hypervisor based on LXC
snap.lxd.daemon.service loaded active running   Service for snap 
application lxd.daemon

root at server:~# lxc ls
+-------------+---------+-----------------------+-----------------------------------------------+------------+-----------+
|    NAME     |  STATE  |         IPV4          |                     IPV6                      |    TYPE    | SNAPSHOTS |
+-------------+---------+-----------------------+-----------------------------------------------+------------+-----------+
| container1  | RUNNING | 10.1.100.49 (eth0)    | fd42:5dd8:266d:cfea:216:3eff:fe17:904c (eth0) | PERSISTENT | 0         |
+-------------+---------+-----------------------+-----------------------------------------------+------------+-----------+
| container2  | RUNNING | 10.1.100.182 (eth0)   | fd42:5dd8:266d:cfea:216:3eff:fe1c:91d0 (eth0) | PERSISTENT | 0         |
+-------------+---------+-----------------------+-----------------------------------------------+------------+-----------+
| container3  | RUNNING | 10.1.100.56 (eth0)    | fd42:5dd8:266d:cfea:216:3eff:fec6:6816 (eth0) | PERSISTENT | 0         |
+-------------+---------+-----------------------+-----------------------------------------------+------------+-----------+
| container4  | RUNNING | 10.1.100.209 (eth0)   | fd42:5dd8:266d:cfea:216:3eff:fe27:6f8f (eth0) | PERSISTENT | 0         |
+-------------+---------+-----------------------+-----------------------------------------------+------------+-----------+
| container5  | RUNNING | 10.1.100.43 (eth0)    | fd42:5dd8:266d:cfea:216:3eff:fea4:a034 (eth0) | PERSISTENT | 0         |
+-------------+---------+-----------------------+-----------------------------------------------+------------+-----------+
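
The repeated "lxc ls" calls in the transcript above are a manual poll for 
the daemon socket to come back. A minimal sketch of the same poll as a 
retry loop (wait_until is a made-up helper name, and the attempt count and 
delay in the usage line are arbitrary):

```shell
# Retry a command until it succeeds or the attempt budget runs out.
# Returns 0 on the first success, 1 if every attempt failed.
# wait_until is a hypothetical helper: wait_until ATTEMPTS DELAY_SECONDS CMD [ARGS...]
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Discard the command's output; only its exit status matters here.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage (not run here):
#   systemctl start snap.lxd.daemon && wait_until 30 1 lxc ls
```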





