[lxc-users] How to recover from ERROR state

Christian Brauner christian at brauner.io
Mon Sep 24 13:30:17 UTC 2018


On Mon, Sep 24, 2018 at 02:11:38PM +0200, Christian Brauner wrote:
> On Mon, Sep 24, 2018, 14:03 Kees Bakker <keesb at ghs.com> wrote:
> 
> > Same question again: what is the best approach to recover
> > from a container in an ERROR state?

So another thing I would like to see is the current stack of the hung
monitor process. Could you please paste (or send privately) the output
of:

cat /proc/<pid-of-hung-monitor-process>/stack

Also, in what state is the monitor hung? Is it in D state again?
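
In case it helps to collect both answers in one go, here is a minimal sketch
that lists every process in D state (uninterruptible sleep) and prints its
kernel stack. The function names d_state_pids and dump_d_state_stacks are
made up for this example and are not part of LXC or LXD. It assumes procps
ps and that /proc/<pid>/stack is readable, which normally requires root.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs whose state starts with "D"
# (uninterruptible sleep). Expects "PID STAT COMMAND" lines on stdin,
# for example from: ps -eo pid=,stat=,args=
d_state_pids() {
    awk '$2 ~ /^D/ { print $1 }'
}

# Hypothetical helper: dump the kernel stack of every D-state process.
# Reading /proc/<pid>/stack normally requires root.
dump_d_state_stacks() {
    ps -eo pid=,stat=,args= | d_state_pids | while read -r pid; do
        echo "== PID $pid =="
        cat "/proc/$pid/stack" 2>/dev/null || echo "(stack not readable)"
    done
}
```

After sourcing the snippet, run dump_d_state_stacks as root while the
container is still hung.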

Christian

> >
> 
> Please show me the dmesg output. If it is a kernel bug you're hitting
> there's nothing that LXD can do to help you.
> 
> 
> > This time it happened with Ubuntu 18.04 and LVM storage.
> >
> > The steps leading to this were as follows. It's just an FYI; I don't think
> > it really matters, except for the stop and start.
> >
> >   lvextend -L 20G local/containers_xyz
> >   resize2fs /dev/local/containers_xyz
> >   lxc stop xyz
> >   e2fsck -f /dev/local/containers_
> >   lxc start xyz
> >
> > ... the start command hung.
> >
> > Some output of ps auxfwww:
> >
> > root      6224  0.0  0.0  22912  4096 pts/1    S    sep06   0:00  |               \_ -bash
> > root     20900  0.0  0.0 1136140 12092 pts/1   Sl+  12:19   0:00  |                   \_ lxc start xyz
> > --
> > root     18157  3.5  4.2 5581444 1398904 ?     Ssl  sep12 611:36 /usr/lib/lxd/lxd --group lxd --logfile=/var/log/lxd/lxd.log
> > root     20918  0.0  0.0 521720 19780 ?        Sl   12:19   0:00  \_ /usr/lib/lxd/lxd forkstart xyz /var/lib/lxd/containers /var/log/lxd/xyz/lxc.conf
> > root     20925  0.0  0.0      0     0 ?        Z    12:19   0:00      \_ [lxd] <defunct>
> > --
> > root     20926  0.0  0.0 530432  7280 ?        Ss   12:19   0:00 [lxc monitor] /var/lib/lxd/containers xyz
> > root     20943  0.0  0.0 530432  3484 ?        D    12:19   0:00  \_ [lxc monitor] /var/lib/lxd/containers xyz
> >
> >
> >
> > On 11-09-18 15:13, Kees Bakker wrote:
> > > Hey,
> > >
> > > Every now and then we have one or more containers in state ERROR.
> > > Is there a clever method to recover from that, other than
> > > rebooting the LXD server?
> > >
> > > Killing the monitor and the forkstart does help. And also a kworker
> > > process (kworker/u16:0) is eating up one of the CPUs with 100% load.
> > > lxc info gives "error: Monitor is hung"
> > >
> > > I'm running Ubuntu 16.04 with BTRFS. The kernel is 4.15.0-33-generic
> >

