[lxc-users] How to recover from ERROR state
Kees Bakker
keesb at ghs.com
Tue Sep 11 13:54:30 UTC 2018
On 11-09-18 15:40, Christian Brauner wrote:
>> Kees Bakker <keesb at ghs.com> wrote on 11 September 2018 at 15:13:
>>
>>
>> Hey,
>>
>> Every now and then we have one or more containers in state ERROR.
>> Is there a clever method to recover from that, other than
>> rebooting the LXD server?
>>
>> Killing the monitor and the forkstart does help. And also a kworker
>> process (kworker/u16:0) is eating up one of the CPUs with 100% load.
>> lxc info gives "error: Monitor is hung"
> If I'm not mistaken this is usually caused by a hanging lxc-monitord
> process which older LXC versions still use and which is removed in
> newer LXC versions.
> Can you check whether you see an lxc-monitord process when such a hang
> happens? If so, kill it. Afterwards things should work fine again.
Killing lxc-monitord did not help.
I had to kill a "[lxc monitor]" process as well. Then the container
got back to state "STOPPED".
But after trying to start the container again, the state went back
to "ERROR".
Meanwhile the kworker/u16:0 process continued at 100% load.
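
For anyone wanting to find the same processes before killing anything, here is a minimal sketch that lists candidate PIDs. The match strings come from the process names mentioned in this thread (lxc-monitord, "[lxc monitor]", forkstart); the helper names find_candidates and list_processes are made up for illustration, and the exact command lines on your host may differ. It only prints the candidates and does not kill anything.

```python
import subprocess

# Substrings taken from the process names mentioned in this thread.
# Assumption: they appear literally in the `ps` command-line column.
PATTERNS = ("lxc-monitord", "[lxc monitor]", "forkstart")


def find_candidates(ps_output, patterns=PATTERNS):
    """Return (pid, command line) pairs whose command line contains a pattern.

    Expects text in the shape produced by `ps -eo pid,args`: one process
    per line, PID first, then the command line.
    """
    matches = []
    for line in ps_output.splitlines():
        fields = line.strip().split(None, 1)
        # Skip the header line and anything that does not start with a PID.
        if len(fields) != 2 or not fields[0].isdigit():
            continue
        pid, args = int(fields[0]), fields[1]
        if any(pattern in args for pattern in patterns):
            matches.append((pid, args))
    return matches


def list_processes():
    """Run `ps -eo pid,args` and return its output as text."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Print the candidates only; inspect them before sending any signal.
    for pid, args in find_candidates(list_processes()):
        print(pid, args)
```

Run it as root on the LXD host so that all processes are visible, then check each listed PID by hand before deciding what to kill.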
>> I'm running Ubuntu 16.04 with BTRFS. The kernel is 4.15.0-33-generic
>
> Cc stgraber since I don't have in mind what LXC version is used
> and if it is one that has already gotten rid of lxc-monitord.
ii  lxc-common  2.0.8-0ubuntu1~16.04.2   amd64  Linux Containers userspace tools (common tools)
ii  lxcfs       2.0.8-0ubuntu1~16.04.2   amd64  FUSE based filesystem for LXC
ii  lxd         2.0.11-0ubuntu1~16.04.4  amd64  Container hypervisor based on LXC - daemon
ii  lxd-client  2.0.11-0ubuntu1~16.04.4  amd64  Container hypervisor based on LXC - client
--
Kees Bakker