[lxc-users] How to recover from ERROR state

Kees Bakker keesb at ghs.com
Tue Sep 11 13:54:30 UTC 2018


On 11-09-18 15:40, Christian Brauner wrote:
>> On 11 September 2018 at 15:13, Kees Bakker <keesb at ghs.com> wrote:
>>
>>
>> Hey,
>>
>> Every now and then we have one or more containers in state ERROR.
>> Is there a clever method to recover from that, other than
>> rebooting the LXD server?
>>
>> Killing the monitor and the forkstart does help. And also a kworker
>> process (kworker/u16:0) is eating up one of the CPUs with 100% load.
>> lxc info gives "error: Monitor is hung"
> If I'm not mistaken this is usually caused by a hanging lxc-monitord
> process which older LXC versions still use and which is removed in 
> newer LXC versions.
> Can you check whether you see an lxc-monitord process when such a hang
> happens? If so, kill it. Afterwards things should work fine again.

Killing lxc-monitord did not help.
I had to kill a "[lxc monitor]" process as well. Then the container
got back to state "STOPPED".
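
For reference, a minimal sketch of how the two kinds of monitor process
mentioned above could be located before killing them. It assumes the
processes appear in the process list as "lxc-monitord" and
"[lxc monitor]"; the helper name filter_lxc_monitors is hypothetical and
not part of LXC or LXD. It only prints PIDs, so they can be reviewed
before anything is killed.

```shell
#!/bin/sh
# Hypothetical helper: reads "PID ARGS..." lines on stdin (for example
# from `ps -eo pid=,args=`) and prints the PID of every line that looks
# like an LXC monitor: either a process whose title contains
# "[lxc monitor]" or a binary whose name is lxc-monitord.
filter_lxc_monitors() {
    awk '$0 ~ /\[lxc monitor\]/ || $2 ~ /(^|\/)lxc-monitord$/ { print $1 }'
}

# Usage (nothing is killed automatically):
#   ps -eo pid=,args= | filter_lxc_monitors
#   kill <pid>          # for each PID you have checked by hand
```

Afterwards "lxc info <container>" shows whether the container has left
the ERROR state.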

But after trying to start the container again, the state went back
to "ERROR".

Meanwhile the kworker/u16:0 process continued at 100% load.

>> I'm running Ubuntu 16.04 with BTRFS. The kernel is 4.15.0-33-generic
>
> Cc stgraber since I don't remember which LXC version is used
> and whether it is one that has already gotten rid of lxc-monitord.

ii  lxc-common     2.0.8-0ubuntu1~16.04.2  amd64        Linux Containers userspace tools (common tools)
ii  lxcfs          2.0.8-0ubuntu1~16.04.2  amd64        FUSE based filesystem for LXC
ii  lxd            2.0.11-0ubuntu1~16.04.4 amd64        Container hypervisor based on LXC - daemon
ii  lxd-client     2.0.11-0ubuntu1~16.04.4 amd64        Container hypervisor based on LXC - client

-- 
Kees Bakker

