[lxc-users] LXD cluster is unresponsive: all lxc-related commands hang

Stéphane Graber stgraber at ubuntu.com
Fri Aug 17 16:07:21 UTC 2018


That's somewhat unlikely, though it's hard to tell post-reboot.

It could be that something upset the kernel (zfs has been known to do
that sometimes) and LXD couldn't be killed anymore because it was stuck
in the kernel.

If this happens to you again, try recording the following before rebooting:
 - journalctl -u snap.lxd.daemon -n300
 - dmesg
 - ps fauxww

This will usually be enough to determine what was at fault.
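
A minimal sketch of how the three commands above could be saved to files
before rebooting. The OUTDIR location, the file names and the capture
helper are illustrative assumptions, not part of LXD or the snap; only
the three diagnostic commands come from the message above.

```shell
#!/bin/sh
# Sketch: save the suggested diagnostics to files before rebooting.
# OUTDIR and the output file names are assumptions chosen for illustration.
OUTDIR="${OUTDIR:-/tmp/lxd-hang-$(date +%Y%m%d-%H%M%S)}"
mkdir -p "$OUTDIR"

# Run a command and store its stdout and stderr in $OUTDIR/<name>.txt.
# A failing or missing command is noted in the file instead of aborting,
# so the remaining diagnostics are still collected.
capture() {
    name="$1"
    shift
    "$@" >"$OUTDIR/$name.txt" 2>&1 ||
        echo "[capture] '$*' exited with status $?" >>"$OUTDIR/$name.txt"
}

capture journalctl journalctl -u snap.lxd.daemon -n300
capture dmesg dmesg
capture ps ps fauxww

echo "Diagnostics written to $OUTDIR"
```

Running it as root is probably needed for dmesg and the full journal on
most systems. The resulting files can then be attached to a follow-up
message.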

Stéphane

On Fri, Aug 17, 2018 at 11:58:35PM +0800, ronkaluta wrote:
> I tried snap refresh alone, but it did not fix the problem.
> 
> Could it be something to do with the kernel update to
> linux-image-4.15.0-32?
> 
> 
> On Friday, August 17, 2018 11:56 PM, Stéphane Graber wrote:
> > Apt upgrades shouldn't really be needed, though certainly good to make
> > sure the rest of the system stays up to date :)
> > 
> > The most important part is ensuring that all cluster nodes run the same
> > version of LXD (3.4 in this case). Once they all do, the cluster should
> > allow queries.
> > 
> > This upgrade procedure isn't so great and we're well aware of it.
> > I'll open a GitHub issue to track some improvements we should be making
> > to make such upgrades much more seamless.
> > 
> > On Fri, Aug 17, 2018 at 11:26:08PM +0800, ronkaluta wrote:
> > > I just had roughly the same problem.
> > > 
> > > The way I cured it was to run snap refresh, then sudo apt update,
> > > and then sudo apt upgrade.
> > > 
> > > Current lxd snap: 3.4
> > > Current kernel: linux-image-4.15.0-32-generic
> > > 
> > > I then rebooted (same procedure on all machines).
> > > 
> > > 
> > > On Friday, August 17, 2018 10:46 PM, Stéphane Graber wrote:
> > > > On Fri, Aug 17, 2018 at 01:20:42PM +0300, Andriy Tovstik wrote:
> > > > > Hi, all!
> > > > > 
> > > > > Some time ago I installed a dual-node LXD cluster. Today I logged in to the
> > > > > node and tried to execute
> > > > > lxc exec container -- bash
> > > > > but the command hung.
> > > > > Also, all lxc commands are unresponsive: I'm not able to interact with my
> > > > > cluster or my containers.
> > > > > I tried to restart snap.lxd.daemon but it didn't help. The output of
> > > > > journalctl -u snap.lxd.daemon is attached.
> > > > > 
> > > > > Any suggestions?
> > > > Are both nodes running the same snap revision according to `snap list`?
> > > > 
> > > > LXD cluster nodes must all run the exact same version, otherwise they
> > > > effectively wait until this becomes the case before they start replying
> > > > to API queries.
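
A rough sketch of how the version check described above could be
scripted. Both function names are hypothetical, and the parsing assumes
the usual `snap list` layout (a header line, then Name, Version and Rev
as the first three columns). How the per-node lines get collected (ssh,
copy and paste, etc.) is left open.

```shell
#!/bin/sh
# Sketch: check that every cluster node reports the same LXD snap version.

# Print "<hostname> <version> <revision>" for the local node.
# Assumes `snap list lxd` prints a header line followed by one line whose
# first three columns are Name, Version and Rev.
local_lxd_version() {
    snap list lxd | awk -v host="$(hostname)" 'NR == 2 {print host, $2, $3}'
}

# Read one "<node> <version> <revision>" line per node on stdin, print the
# distinct version/revision pairs, and return non-zero unless exactly one
# distinct pair was seen.
same_lxd_version() {
    distinct=$(awk '{print $2, $3}' | sort -u)
    count=$(printf '%s\n' "$distinct" | grep -c .)
    printf '%s\n' "$distinct"
    [ "$count" -eq 1 ]
}
```

One way to use it: run local_lxd_version on each node, gather the output
lines into a single file, then feed that file to same_lxd_version. If
more than one pair is printed, the nodes disagree, and running snap
refresh on the nodes that are behind should bring them in line.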
> > 
> > 
> > 
> > _______________________________________________
> > lxc-users mailing list
> > lxc-users at lists.linuxcontainers.org
> > http://lists.linuxcontainers.org/listinfo/lxc-users
> 



-- 
Stéphane Graber
Ubuntu developer
http://www.ubuntu.com