[lxc-users] Containers won't start under stretch-backport kernel reboot
Tony Lewis
tony at lewistribe.com
Sat Aug 11 06:23:54 UTC 2018
I have been running LXD/LXC on a stock Debian Stretch
kernel (4.9.0-7-amd64), and it's working fine. But it's beneficial for
me to go to the Stretch Backports kernel (4.17.0-0.bpo.1-amd64). When I do, my
containers won't start until I forcefully kill the LXD daemon and
restart the service.
Details are below, and I'd appreciate some help figuring out what is
going wrong.
Details...
I was originally running LXD from packages then later migrated to snap.
It didn't go smoothly so there's a chance there is some package-related
cruft remaining. It seems there are only snap-related lxd binaries on
my system:
root at server:~# find /usr/ /lib /snap /var /bin /sbin -name lxd -type f -print
/snap/lxd/7651/bin/lxd
/snap/lxd/7651/commands/lxd
/snap/lxd/7792/bin/lxd
/snap/lxd/7792/commands/lxd
/snap/lxd/8011/bin/lxd
/snap/lxd/8011/commands/lxd
root at server:~# lxd --version
3.3
root at server:~# lxc --version
3.3
root at server:~# dpkg -l | grep lx
ii  libgl1-mesa-glx:amd64  13.0.6-1+b2                        amd64  free implementation of the OpenGL API -- GLX runtime
ii  libxcb-glx0:amd64      1.12-1                             amd64  X C Binding, glx extension
rc  lxc-common             2.1.0-0ubuntu1~ubuntu17.04.1~ppa1  amd64  Linux Containers userspace tools (common tools)
rc  lxc1                   2.1.0-0ubuntu1~ubuntu17.04.1~ppa1  amd64  Linux Containers userspace tools
rc  lxcfs                  2.0.7-1                            amd64  FUSE based filesystem for LXC
rc  lxd                    2.18-0ubuntu5~ubuntu17.04.1~ppa1   amd64  Container hypervisor based on LXC - daemon
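In dpkg's status column, "rc" means the package was removed but its configuration files are still on disk, so the old lxc/lxd packages may be where the package-related cruft comes from. A minimal sketch for pulling just those package names out of `dpkg -l` output; the function name list_rc_packages is made up for illustration:

```shell
# List packages that dpkg reports in state "rc": removed, but with
# configuration files still present. Reads `dpkg -l` output on stdin
# and prints one package name per line.
list_rc_packages() {
    awk '$1 == "rc" { print $2 }'
}

# Example usage on a live system (not executed here):
#   dpkg -l | list_rc_packages
```

This only lists the leftovers. Whether purging them is safe on a system that has since migrated to the snap is a separate question and is not attempted here.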
Right after a reboot, LXD is running:
root at server:~# ps -ef | grep lx
root 1823 1 0 11:42 ? 00:00:00 /bin/sh
/snap/lxd/8011/commands/daemon.start
root 2436 1 0 11:42 ? 00:00:00 lxcfs
/var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
root 2452 1823 5 11:42 ? 00:00:08 lxd --logfile
/var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root 2453 1823 1 11:42 ? 00:00:02 lxd waitready
root 2454 1823 0 11:42 ? 00:00:00 /bin/sh
/snap/lxd/8011/commands/daemon.start
lxd 2938 1 0 11:42 ? 00:00:00 dnsmasq --strict-order
--bind-interfaces
--pid-file=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.pid
--except-interface=lo --interface=lxdnet1 --quiet-dhcp --quiet-dhcp6
--quiet-ra --listen-address=10.1.99.1 --dhcp-no-override
--dhcp-authoritative
--dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.leases
--dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.hosts
--dhcp-range 10.1.99.2,10.1.99.254,1h
--listen-address=fd42:6727:ccbe:877f::1 --enable-ra --dhcp-range
::,constructor:lxdnet1,ra-stateless,ra-names -s lxd -S /lxd/
--conf-file=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.raw -u lxd
lxd 3207 1 0 11:42 ? 00:00:00 dnsmasq --strict-order
--bind-interfaces
--pid-file=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.pid
--except-interface=lo --interface=lxdnet0 --quiet-dhcp --quiet-dhcp6
--quiet-ra --listen-address=10.1.100.1 --dhcp-no-override
--dhcp-authoritative
--dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.leases
--dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.hosts
--dhcp-range 10.1.100.2,10.1.100.254,1h
--listen-address=fd42:5dd8:266d:cfea::1 --enable-ra --dhcp-range
::,constructor:lxdnet0,ra-stateless,ra-names -s lxd -S /lxd/
--conf-file=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.raw -u lxd
root 4163 4040 0 11:45 pts/1 00:00:00 grep lx
There are two LXD-ish looking services:
root at server:~# systemctl list-units | grep lx
sys-devices-virtual-net-lxdnet0.device loaded active plugged
/sys/devices/virtual/net/lxdnet0
sys-devices-virtual-net-lxdnet1.device loaded active plugged
/sys/devices/virtual/net/lxdnet1
sys-subsystem-net-devices-lxdnet0.device loaded active plugged
/sys/subsystem/net/devices/lxdnet0
sys-subsystem-net-devices-lxdnet1.device loaded active plugged
/sys/subsystem/net/devices/lxdnet1
run-snapd-ns-lxd.mnt.mount loaded active mounted /run/snapd/ns/lxd.mnt
snap-lxd-7651.mount loaded active mounted Mount unit for lxd
snap-lxd-7792.mount loaded active mounted Mount unit for lxd
snap-lxd-8011.mount loaded active mounted Mount unit for lxd
lxd.service loaded active exited LSB: Container hypervisor based on LXC
snap.lxd.daemon.service loaded active running Service for snap
application lxd.daemon
lxd.service is not running, but snap.lxd.daemon.service is:
root at server:~# systemctl status snap.lxd.daemon.service
● snap.lxd.daemon.service - Service for snap application lxd.daemon
Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service;
enabled; vendor preset: enabled)
Active: active (running) since Sat 2018-08-11 11:42:31 AEST; 3min
22s ago
Main PID: 1823 (daemon.start)
Tasks: 0 (limit: 4915)
CGroup: /system.slice/snap.lxd.daemon.service
‣ 1823 /bin/sh /snap/lxd/8011/commands/daemon.start
Aug 11 11:42:34 server snap[1823]: 2: fd: 8: perf_event
Aug 11 11:42:34 server snap[1823]: 3: fd: 9: blkio
Aug 11 11:42:34 server snap[1823]: 4: fd: 10: freezer
Aug 11 11:42:34 server snap[1823]: 5: fd: 11: devices
Aug 11 11:42:34 server snap[1823]: 6: fd: 12: cpu,cpuacct
Aug 11 11:42:34 server snap[1823]: 7: fd: 13: net_cls,net_prio
Aug 11 11:42:34 server snap[1823]: 8: fd: 14: memory
Aug 11 11:42:34 server snap[1823]: 9: fd: 15: name=systemd
Aug 11 11:42:35 server snap[1823]: lvl=warn msg="CGroup memory swap
accounting is disabled, swap limits will be ignored."
t=2018-08-11T01:42:35+0000
Aug 11 11:42:38 server snap[1823]: lvl=warn msg="Unable to update
backup.yaml at this time" name=backuptests t=2018-08-11T01:42:38+0000
root at server:~# systemctl status lxd.service
● lxd.service - LSB: Container hypervisor based on LXC
Loaded: loaded (/etc/init.d/lxd; generated; vendor preset: enabled)
Active: active (exited) since Sat 2018-08-11 11:42:24 AEST; 3min 49s ago
Docs: man:systemd-sysv-generator(8)
Process: 1412 ExecStart=/etc/init.d/lxd start (code=exited,
status=0/SUCCESS)
Tasks: 0 (limit: 4915)
CGroup: /system.slice/lxd.service
Aug 11 11:42:24 server systemd[1]: Starting LSB: Container hypervisor
based on LXC...
Aug 11 11:42:24 server systemd[1]: Started LSB: Container hypervisor
based on LXC.
I can try stopping both services (though lxd.service already reports as
exited), including the snap LXD service:
root at server:~# systemctl stop snap.lxd.daemon.service
root at server:~# systemctl stop lxd
root at server:~# systemctl stop snap.lxd.daemon.service
root at server:~# systemctl list-units | grep lx
sys-devices-virtual-net-lxdnet0.device loaded active plugged
/sys/devices/virtual/net/lxdnet0
sys-devices-virtual-net-lxdnet1.device loaded active plugged
/sys/devices/virtual/net/lxdnet1
sys-subsystem-net-devices-lxdnet0.device loaded active plugged
/sys/subsystem/net/devices/lxdnet0
sys-subsystem-net-devices-lxdnet1.device loaded active plugged
/sys/subsystem/net/devices/lxdnet1
run-snapd-ns-lxd.mnt.mount loaded active mounted /run/snapd/ns/lxd.mnt
snap-lxd-7651.mount loaded active mounted Mount unit for lxd
snap-lxd-7792.mount loaded active mounted Mount unit for lxd
snap-lxd-8011.mount loaded active mounted Mount unit for lxd
root at server:~# systemctl status snap.lxd.daemon.service
● snap.lxd.daemon.service - Service for snap application lxd.daemon
Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service;
enabled; vendor preset: enabled)
Active: inactive (dead) since Sat 2018-08-11 11:46:49 AEST; 25s ago
Process: 4304 ExecStop=/usr/bin/snap run --command=stop lxd.daemon
(code=exited, status=0/SUCCESS)
Process: 1823 ExecStart=/usr/bin/snap run lxd.daemon (code=killed,
signal=TERM)
Main PID: 1823 (code=killed, signal=TERM)
Aug 11 11:42:38 server snap[1823]: lvl=warn msg="Unable to update
backup.yaml at this time" name=backuptests t=2018-08-11T01:42:38+0000
Aug 11 11:46:48 server systemd[1]: Stopping Service for snap application
lxd.daemon...
Aug 11 11:46:48 server /usr/bin/snap[4304]: cmd.go:105: DEBUG:
restarting into "/snap/core/current/usr/bin/snap"
Aug 11 11:46:48 server snap[4320]: cmd.go:105: DEBUG: restarting into
"/snap/core/current/usr/bin/snap"
Aug 11 11:46:48 server snap[4304]: error: no changes found
Aug 11 11:46:48 server snap[4304]: => Stop reason is: host shutdown
Aug 11 11:46:48 server snap[4304]: => Stopping LXD (with container shutdown)
Aug 11 11:46:48 server snap[4304]: lxd: error while loading shared
libraries: liblxc.so.1: cannot open shared object file: No such file or
directory
Aug 11 11:46:48 server snap[4304]: => Stopping LXCFS
Aug 11 11:46:49 server systemd[1]: Stopped Service for snap application
lxd.daemon.
But LXD is still running:
root at server:~# ps -ef | grep lx
root 2452 1 4 11:42 ? 00:00:13 lxd --logfile
/var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root 2453 1 1 11:42 ? 00:00:03 lxd waitready
root 2454 1 0 11:42 ? 00:00:00 /bin/sh
/snap/lxd/8011/commands/daemon.start
lxd 2938 1 0 11:42 ? 00:00:00 dnsmasq --strict-order
--bind-interfaces
--pid-file=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.pid
--except-interface=lo --interface=lxdnet1 --quiet-dhcp --quiet-dhcp6
--quiet-ra --listen-address=10.1.99.1 --dhcp-no-override
--dhcp-authoritative
--dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.leases
--dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.hosts
--dhcp-range 10.1.99.2,10.1.99.254,1h
--listen-address=fd42:6727:ccbe:877f::1 --enable-ra --dhcp-range
::,constructor:lxdnet1,ra-stateless,ra-names -s lxd -S /lxd/
--conf-file=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.raw -u lxd
lxd 3207 1 0 11:42 ? 00:00:00 dnsmasq --strict-order
--bind-interfaces
--pid-file=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.pid
--except-interface=lo --interface=lxdnet0 --quiet-dhcp --quiet-dhcp6
--quiet-ra --listen-address=10.1.100.1 --dhcp-no-override
--dhcp-authoritative
--dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.leases
--dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.hosts
--dhcp-range 10.1.100.2,10.1.100.254,1h
--listen-address=fd42:5dd8:266d:cfea::1 --enable-ra --dhcp-range
::,constructor:lxdnet0,ra-stateless,ra-names -s lxd -S /lxd/
--conf-file=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.raw -u lxd
root 4454 4040 0 11:47 pts/1 00:00:00 grep lx
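The listing above shows the lxd processes reparented to PID 1 once daemon.start (PID 1823) was killed. A rough sketch for picking such leftovers out of `ps -ef` output, assuming the usual column order (UID PID PPID C STIME TTY TIME CMD); the function name find_orphaned_lxd is hypothetical:

```shell
# Print the PIDs of processes whose command is "lxd" and whose parent is
# PID 1, i.e. daemons still running after their service wrapper went away.
# Reads `ps -ef` output on stdin. The dnsmasq processes are skipped because
# their command (field 8) is "dnsmasq", even though they run as user lxd.
find_orphaned_lxd() {
    awk '$3 == 1 && $8 == "lxd" { print $2 }'
}

# Example usage on a live system (not executed here):
#   ps -ef | find_orphaned_lxd
```

Against the output above this would report 2452 and 2453.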
I can list my containers, and with debug I can verify that comms with
the socket works. But if I attempt to manually start a container, that
command blocks and nothing happens.
root at server:~# lxc ls
+-------------+---------+------+------+------------+-----------+
| NAME        | STATE   | IPV4 | IPV6 | TYPE       | SNAPSHOTS |
+-------------+---------+------+------+------------+-----------+
| container1  | STOPPED |      |      | PERSISTENT | 0         |
+-------------+---------+------+------+------------+-----------+
| container2  | STOPPED |      |      | PERSISTENT | 0         |
+-------------+---------+------+------+------------+-----------+
| container3  | STOPPED |      |      | PERSISTENT | 0         |
+-------------+---------+------+------+------------+-----------+
| container4  | STOPPED |      |      | PERSISTENT | 0         |
+-------------+---------+------+------+------------+-----------+
| container5  | STOPPED |      |      | PERSISTENT | 0         |
+-------------+---------+------+------+------------+-----------+
If I manually kill the LXD process and restart the service with
systemctl, my containers automatically start in turn and I am good to go:
root at server:~# kill 2452
root at server:~# ps -ef | grep lx
lxd 2938 1 0 11:42 ? 00:00:00 dnsmasq --strict-order
--bind-interfaces
--pid-file=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.pid
--except-interface=lo --interface=lxdnet1 --quiet-dhcp --quiet-dhcp6
--quiet-ra --listen-address=10.1.99.1 --dhcp-no-override
--dhcp-authoritative
--dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.leases
--dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.hosts
--dhcp-range 10.1.99.2,10.1.99.254,1h
--listen-address=fd42:6727:ccbe:877f::1 --enable-ra --dhcp-range
::,constructor:lxdnet1,ra-stateless,ra-names -s lxd -S /lxd/
--conf-file=/var/snap/lxd/common/lxd/networks/lxdnet1/dnsmasq.raw -u lxd
lxd 3207 1 0 11:42 ? 00:00:00 dnsmasq --strict-order
--bind-interfaces
--pid-file=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.pid
--except-interface=lo --interface=lxdnet0 --quiet-dhcp --quiet-dhcp6
--quiet-ra --listen-address=10.1.100.1 --dhcp-no-override
--dhcp-authoritative
--dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.leases
--dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.hosts
--dhcp-range 10.1.100.2,10.1.100.254,1h
--listen-address=fd42:5dd8:266d:cfea::1 --enable-ra --dhcp-range
::,constructor:lxdnet0,ra-stateless,ra-names -s lxd -S /lxd/
--conf-file=/var/snap/lxd/common/lxd/networks/lxdnet0/dnsmasq.raw -u lxd
root 4468 4040 0 11:47 pts/1 00:00:00 grep lx
root at server:~# lxc ls
Error: Get http://unix.socket/1.0: dial unix
/var/snap/lxd/common/lxd/unix.socket: connect: connection refused
root at server:~# systemctl start lxd
root at server:~# lxc ls
Error: Get http://unix.socket/1.0: dial unix
/var/snap/lxd/common/lxd/unix.socket: connect: connection refused
root at server:~# lxc ls
Error: Get http://unix.socket/1.0: dial unix
/var/snap/lxd/common/lxd/unix.socket: connect: connection refused
root at server:~# lxc ls
Error: Get http://unix.socket/1.0: dial unix
/var/snap/lxd/common/lxd/unix.socket: connect: connection refused
root at server:~# lxc ls
Error: Get http://unix.socket/1.0: dial unix
/var/snap/lxd/common/lxd/unix.socket: connect: connection refused
root at server:~# systemctl start snap.lxd.daemon
root at server:~# systemctl list-units | grep lx
sys-devices-virtual-net-lxdnet0.device loaded active plugged
/sys/devices/virtual/net/lxdnet0
sys-devices-virtual-net-lxdnet1.device loaded active plugged
/sys/devices/virtual/net/lxdnet1
sys-subsystem-net-devices-lxdnet0.device loaded active plugged
/sys/subsystem/net/devices/lxdnet0
sys-subsystem-net-devices-lxdnet1.device loaded active plugged
/sys/subsystem/net/devices/lxdnet1
run-snapd-ns-lxd.mnt.mount loaded active mounted /run/snapd/ns/lxd.mnt
snap-lxd-7651.mount loaded active mounted Mount unit for lxd
snap-lxd-7792.mount loaded active mounted Mount unit for lxd
snap-lxd-8011.mount loaded active mounted Mount unit for lxd
lxd.service loaded active exited LSB: Container hypervisor based on LXC
snap.lxd.daemon.service loaded active running Service for snap
application lxd.daemon
root at server:~# lxc ls
+-------------+---------+-----------------------+-----------------------------------------------+------------+-----------+
| NAME        | STATE   | IPV4                  | IPV6                                          | TYPE       | SNAPSHOTS |
+-------------+---------+-----------------------+-----------------------------------------------+------------+-----------+
| container1  | RUNNING | 10.1.100.49 (eth0)    | fd42:5dd8:266d:cfea:216:3eff:fe17:904c (eth0) | PERSISTENT | 0         |
+-------------+---------+-----------------------+-----------------------------------------------+------------+-----------+
| container2  | RUNNING | 10.1.100.182 (eth0)   | fd42:5dd8:266d:cfea:216:3eff:fe1c:91d0 (eth0) | PERSISTENT | 0         |
+-------------+---------+-----------------------+-----------------------------------------------+------------+-----------+
| container3  | RUNNING | 10.1.100.56 (eth0)    | fd42:5dd8:266d:cfea:216:3eff:fec6:6816 (eth0) | PERSISTENT | 0         |
+-------------+---------+-----------------------+-----------------------------------------------+------------+-----------+
| container4  | RUNNING | 10.1.100.209 (eth0)   | fd42:5dd8:266d:cfea:216:3eff:fe27:6f8f (eth0) | PERSISTENT | 0         |
+-------------+---------+-----------------------+-----------------------------------------------+------------+-----------+
| container5  | RUNNING | 10.1.100.43 (eth0)    | fd42:5dd8:266d:cfea:216:3eff:fea4:a034 (eth0) | PERSISTENT | 0         |
+-------------+---------+-----------------------+-----------------------------------------------+------------+-----------+
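The manual workaround above (stop the snap service, kill the leftover LXD daemon, start the service again) could be wrapped up roughly as follows. This is only a sketch of the workaround, not a fix for the underlying problem. The helper names run and restart_lxd and the DRY_RUN variable are made up for illustration, and the PIDs to kill have to be passed in by hand:

```shell
# Print the command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop the snap service, send TERM to each leftover LXD daemon PID given
# as an argument, then start the snap service again.
restart_lxd() {
    run systemctl stop snap.lxd.daemon.service
    for pid in "$@"; do
        run kill "$pid"
    done
    run systemctl start snap.lxd.daemon.service
}

# Example usage (dry run, only prints the commands):
#   restart_lxd 2452
# To actually execute them as root:
#   DRY_RUN=0 restart_lxd 2452
```

The dry-run default is deliberate, so the commands can be reviewed before anything is killed.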