[lxc-users] Containers won't start under stretch-backport kernel reboot

Tony Lewis tony at lewistribe.com
Wed Aug 15 02:52:12 UTC 2018


Thank you Fajar.

I have made some progress, but would still value help (fresh detail at 
the bottom).


On 14/08/18 19:32, Fajar A. Nugraha wrote:
> On Tue, Aug 14, 2018 at 1:54 PM, Tony Lewis <tony at lewistribe.com> wrote:
>
>     Apologies in advance for the bump, but does anyone have any
>     insights on this?
>
>
> Did you install lxd before using source instead of snap?

It turns out there were some residual config files left over from the 
package-based install.  No binaries, just various files in /etc.  I 
cleaned them up.

>
> What does /var/snap/lxd/common/lxd/logs/lxd.log say? Does it have any 
> error?

Not much of interest that I can see.  Here it is from a reboot today:

lvl=info msg="LXD 3.3 is starting in normal mode" path=/var/snap/lxd/common/lxd t=2018-08-15T01:25:20+0000
lvl=info msg="Kernel uid/gid map:" t=2018-08-15T01:25:20+0000
lvl=info msg=" - u 0 0 4294967295" t=2018-08-15T01:25:20+0000
lvl=info msg=" - g 0 0 4294967295" t=2018-08-15T01:25:20+0000
lvl=info msg="Configured LXD uid/gid map:" t=2018-08-15T01:25:20+0000
lvl=info msg=" - u 0 1000000 1000000000" t=2018-08-15T01:25:20+0000
lvl=info msg=" - g 0 1000000 1000000000" t=2018-08-15T01:25:20+0000
lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored." t=2018-08-15T01:25:20+0000
lvl=info msg="Initializing local database" t=2018-08-15T01:25:20+0000
lvl=info msg="Initializing database gateway" t=2018-08-15T01:25:20+0000
address= id=1 lvl=info msg="Start database node" t=2018-08-15T01:25:20+0000
lvl=info msg="Raft: Restored from snapshot 1-23922-1534296032171" t=2018-08-15T01:25:20+0000
lvl=info msg="Raft: Initial configuration (index=1): [{Suffrage:Voter ID:1 Address:0}]" t=2018-08-15T01:25:20+0000
lvl=info msg="Raft: Node at 0 [Leader] entering Leader state" t=2018-08-15T01:25:20+0000
lvl=info msg="LXD isn't socket activated" t=2018-08-15T01:25:20+0000
lvl=info msg="Starting /dev/lxd handler:" t=2018-08-15T01:25:20+0000
lvl=info msg=" - binding devlxd socket" socket=/var/snap/lxd/common/lxd/devlxd/sock t=2018-08-15T01:25:20+0000
lvl=info msg="REST API daemon:" t=2018-08-15T01:25:20+0000
lvl=info msg=" - binding Unix socket" socket=/var/snap/lxd/common/lxd/unix.socket t=2018-08-15T01:25:20+0000
lvl=info msg="Initializing global database" t=2018-08-15T01:25:20+0000
lvl=info msg="Initializing storage pools" t=2018-08-15T01:25:21+0000
lvl=info msg="Initializing networks" t=2018-08-15T01:25:21+0000
lvl=info msg="Loading configuration" t=2018-08-15T01:25:22+0000
lvl=info msg="Connected to MAAS controller" t=2018-08-15T01:25:22+0000
lvl=info msg="Pruning expired images" t=2018-08-15T01:25:22+0000
lvl=info msg="Done pruning expired images" t=2018-08-15T01:25:22+0000
lvl=info msg="Updating instance types" t=2018-08-15T01:25:22+0000
lvl=info msg="Expiring log files" t=2018-08-15T01:25:22+0000
lvl=info msg="Done expiring log files" t=2018-08-15T01:25:22+0000
lvl=info msg="Updating images" t=2018-08-15T01:25:22+0000
lvl=info msg="Done updating images" t=2018-08-15T01:25:22+0000
lvl=warn msg="Unable to update backup.yaml at this time" name=backuptests t=2018-08-15T01:25:23+0000
lvl=info msg="Done updating instance types" t=2018-08-15T01:25:35+0000
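
Since almost every entry in this log is at info level, filtering those out makes the remaining entries easier to spot. The following is a minimal sketch; the function name lxd_log_non_info is made up for illustration, and the default path is simply the one quoted earlier in this message.

```shell
# Print every log line that is not at info level.
# The first argument is the log file; it defaults to the snap's lxd.log
# path mentioned in this thread (an assumption for other installs).
lxd_log_non_info() {
    log="${1:-/var/snap/lxd/common/lxd/logs/lxd.log}"
    # -v inverts the match, so only lines without "lvl=info" are kept.
    grep -v 'lvl=info' "$log"
}
```

Run against the log above (with unwrapped lines), this should leave only the two warn entries.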



>
> My GUESS is that you have /usr/bin/lxd and /snap/bin/lxd, which 
> interfere with each other. If that's not it, then my next guess is 
> that there's probably some group issue, like 
> https://github.com/lxc/lxd/issues/1861#issuecomment-206507631 . In any 
> case lxd.log might have more info.

Thank you.  There is no lxd binary anywhere other than in the three snap 
revisions, and only one of those is running.  The old service was still 
present (I have now removed it), but it was only showing a failure 
because it had no binary to start.

Progress:

I know that lxd is starting, but my containers still don't start. When I 
try to stop the service I see the following in systemctl:

# systemctl stop snap.lxd.daemon
# systemctl status snap.lxd.daemon
● snap.lxd.daemon.service - Service for snap application lxd.daemon
    Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service; enabled; vendor preset: enabled)
    Active: inactive (dead) since Wed 2018-08-15 11:40:51 AEST; 2s ago
   Process: 6761 ExecStop=/usr/bin/snap run --command=stop lxd.daemon (code=exited, status=0/SUCCESS)
   Process: 5432 ExecStart=/usr/bin/snap run lxd.daemon (code=killed, signal=TERM)
  Main PID: 5432 (code=killed, signal=TERM)

Aug 15 11:40:47 server systemd[1]: Stopping Service for snap application lxd.daemon...
Aug 15 11:40:47 server /usr/bin/snap[6761]: cmd.go:105: DEBUG: restarting into "/snap/core/current/usr/bin/snap"
Aug 15 11:40:47 server snap[6777]: cmd.go:105: DEBUG: restarting into "/snap/core/current/usr/bin/snap"
Aug 15 11:40:47 server snap[6761]: error: no changes found
Aug 15 11:40:50 server snap[6761]: => Stop reason is: host shutdown
Aug 15 11:40:50 server snap[6761]: => Stopping LXD (with container shutdown)
Aug 15 11:40:50 server snap[6761]: lxd: error while loading shared libraries: liblxc.so.1: cannot open shared object file: No such file or directory
Aug 15 11:40:50 server snap[6761]: => Stopping LXCFS
Aug 15 11:40:51 server snap[5432]: => LXD is ready
Aug 15 11:40:51 server systemd[1]: Stopped Service for snap application lxd.daemon.

A key line is: lxd: error while loading shared libraries: liblxc.so.1: 
cannot open shared object file: No such file or directory

The library is present in what looks to be the right places in the snap 
directories, but not anywhere else:

# find /snap -name liblxc.so.1 -print
/snap/lxd/7651/lib/liblxc.so.1
/snap/lxd/7792/lib/liblxc.so.1
/snap/lxd/8011/lib/liblxc.so.1
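
To cross-check the find output against the revision that is actually active, something like the sketch below could be used. It assumes the snap directory holds a "current" symlink pointing at the active revision; the function name list_lib_by_revision and its two parameters are made up for illustration.

```shell
# For each revision directory under a snap, report whether it ships the
# given library, and mark the revision that "current" points to.
#   $1: snap directory, e.g. /snap/lxd
#   $2: library path relative to a revision, e.g. lib/liblxc.so.1
list_lib_by_revision() {
    snap_dir="$1"
    lib="$2"
    # Assumption: "current" is a symlink whose target names the revision.
    current="$(basename "$(readlink "$snap_dir/current" 2>/dev/null)")"
    for rev in "$snap_dir"/*/; do
        rev="${rev%/}"
        [ -d "$rev" ] || continue
        name="${rev##*/}"
        # Skip the symlink itself; it is reported via the marker instead.
        [ "$name" = "current" ] && continue
        if [ -e "$rev/$lib" ]; then status="present"; else status="missing"; fi
        if [ "$name" = "$current" ]; then mark=" (current)"; else mark=""; fi
        printf '%s %s%s\n' "$name" "$status" "$mark"
    done
}
```

For the layout shown above it would be called as: list_lib_by_revision /snap/lxd lib/liblxc.so.1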

But when the stop command launches the lxd binary, the loader does not 
look in the snap directories at all; it only tries the standard system 
library paths:

# strace -f -F -etrace=file /usr/bin/snap run --command=stop lxd.daemon 2>&1 | grep liblxc
[pid  4964] open("/lib/x86_64-linux-gnu/tls/x86_64/liblxc.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid  4964] open("/lib/x86_64-linux-gnu/tls/liblxc.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid  4964] open("/lib/x86_64-linux-gnu/x86_64/liblxc.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid  4964] open("/lib/x86_64-linux-gnu/liblxc.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid  4964] open("/usr/lib/x86_64-linux-gnu/tls/x86_64/liblxc.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid  4964] open("/usr/lib/x86_64-linux-gnu/tls/liblxc.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid  4964] open("/usr/lib/x86_64-linux-gnu/x86_64/liblxc.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid  4964] open("/usr/lib/x86_64-linux-gnu/liblxc.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid  4964] open("/lib/tls/x86_64/liblxc.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid  4964] open("/lib/tls/liblxc.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid  4964] open("/lib/x86_64/liblxc.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid  4964] open("/lib/liblxc.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid  4964] open("/usr/lib/tls/x86_64/liblxc.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid  4964] open("/usr/lib/tls/liblxc.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid  4964] open("/usr/lib/x86_64/liblxc.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid  4964] open("/usr/lib/liblxc.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
lxd: error while loading shared libraries: liblxc.so.1: cannot open shared object file: No such file or directory
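
The search order is easier to read once the trace is reduced to the directories that were tried. The following is a rough sketch, assuming the strace output has one call per line with the path in double quotes as above; the function name searched_dirs is made up for illustration.

```shell
# Read strace output on stdin and print, in order, each directory in
# which the given library file name was looked up.
#   $1: library file name, e.g. liblxc.so.1
# Note: the name is used inside a regular expression, so "." matches any
# character; that is loose but adequate for this purpose.
searched_dirs() {
    lib="$1"
    # Keep only the quoted absolute path ending in the library name,
    # then strip the leading quote and the trailing "/name" part.
    grep -o "\"/[^\"]*/$lib\"" | sed -e 's|^"||' -e 's|/[^/]*"$||'
}
```

It can be placed at the end of the same pipeline in place of grep:
strace -f -F -etrace=file /usr/bin/snap run --command=stop lxd.daemon 2>&1 | searched_dirs liblxc.so.1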

Strangely, even if I copy /snap/lxd/8011/lib/liblxc.so.1 into /lib, the 
file is still not found (strace still reports no such file or 
directory).  I can't explain this, though I have checked it repeatedly.
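
One possible explanation, offered here as an assumption rather than something confirmed in this thread, is that commands started through "snap run" execute in their own mount namespace, in which /lib is not the host's /lib. A minimal sketch for comparing the mount namespaces of two processes on Linux follows; the function names mount_ns_of and same_mount_ns are made up for illustration.

```shell
# Print the mount-namespace identifier of a process (Linux only).
# Reading another user's process usually requires root.
mount_ns_of() {
    readlink "/proc/$1/ns/mnt"
}

# Succeed (exit 0) when both PIDs share one mount namespace, fail with 1
# when they differ, and with 2 when either namespace cannot be read.
same_mount_ns() {
    a="$(mount_ns_of "$1")" || return 2
    b="$(mount_ns_of "$2")" || return 2
    [ "$a" = "$b" ]
}
```

Called with the shell's own PID ($$) and the daemon's main PID as reported by systemctl status, a non-zero result from same_mount_ns would suggest the daemon sees a different filesystem layout from the shell in which the copy was made.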

If I kill the daemon itself, I can restart it using systemctl and my 
containers will start.  However, I cannot gracefully stop containers 
(lxc stop <container> just hangs), nor can I gracefully stop lxd (same 
missing library error).

Any thoughts?

Tony
