[lxc-users] lxc_monitor exiting, but not cleaning monitor-fifo?

Florian Klink flokli at flokli.de
Mon Mar 31 21:18:13 UTC 2014


Am 31.03.2014 21:13, schrieb Dwight Engen:
> On Mon, 31 Mar 2014 20:34:15 +0200
> Florian Klink <flokli at flokli.de> wrote:
> 
>> Am 31.03.2014 20:10, schrieb Dwight Engen:
>>> On Sat, 29 Mar 2014 23:39:33 +0100
>>> Florian Klink <flokli at flokli.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> when running multiple lxc actions in a row using the command line
>>>> tools, I sometimes observe the following state:
>>>>
>>>>
>>>> - lxc-monitord is not running anymore
>>>> - /run/lxc/var/lib/lxc/monitor-fifo still exists, but refuses
>>>> connections
>>>>
>>>> In the logs, I then see the following:
>>>>
>>>>
>>>> lxc-start 1395671045.703 ERROR    lxc_monitor - connect : backing off 10
>>>> lxc-start 1395671045.713 ERROR    lxc_monitor - connect : backing off 50
>>>> lxc-start 1395671045.763 ERROR    lxc_monitor - connect : backing off 100
>>>> lxc-start 1395671045.864 ERROR    lxc_monitor - connect : Connection refused
>>>>
>>>>
>>>> ... and the command fails.
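The log shows three refused attempts roughly 10, 50 and 100 ms apart, followed by the final "Connection refused". Below is a minimal Python sketch of that retry pattern, assuming the monitor endpoint is a Unix stream socket at a filesystem path. `connect_with_backoff` and `BACKOFF_MS` are hypothetical names for illustration, not lxc's own code or API.

```python
import errno
import socket
import time

# Delays in milliseconds, taken from the "backing off" values in the log.
BACKOFF_MS = (10, 50, 100)


def connect_with_backoff(path, backoff_ms=BACKOFF_MS):
    """Connect to the Unix stream socket at `path` (hypothetical helper).

    Each ECONNREFUSED is logged and followed by a sleep from `backoff_ms`;
    once the delays are used up, the last error is re-raised. Any other
    error (e.g. ENOENT when the file is missing) is raised immediately.
    """
    last_err = None
    for delay in backoff_ms:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return sock
        except OSError as err:
            sock.close()
            if err.errno != errno.ECONNREFUSED:
                raise
            last_err = err
            print(f"connect : backing off {delay}")
            time.sleep(delay / 1000)
    if last_err is None:
        raise ValueError("backoff_ms must contain at least one delay")
    raise last_err
```

Against a socket file left behind by a listener that died without unlinking it, every attempt fails with ECONNREFUSED, which matches the log above. A live listener is reached on the first try, so the backoff only delays the inevitable failure when the file is stale.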
>>>  
>>> The only time I've seen this happen is if lxc-monitord is hard
>>> killed so it doesn't have a chance to clean up and remove the
>>> socket.
>>
>> Here, it's happening quite frequently. However, the script never kills
>> lxc-monitord on its own; it only tries to detect and fix this state by
>> removing the socket file...
> 
> Right, removing the socket file allows another lxc-monitord to
> start, but the question is why the first one exits without
> cleaning up. Can you reproduce it reliably? If so, maybe you
> could attach strace to lxc-monitord and see why it is exiting.

So far I haven't managed to reproduce the bug while strace was
running. :-( But I'll keep trying!
> 
>>>
>>>>
>>>> A possible workaround would be to check, before running the next
>>>> lxc command, whether lxc-monitord is not running but monitor-fifo
>>>> still exists, and to remove the fifo in that case, but that's
>>>> ugly ;-)
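A minimal Python sketch of this workaround, assuming Linux `/proc` and that the daemon's process name (as shown in `/proc/<pid>/comm`) is `lxc-monitord`. `monitord_running` and `remove_stale_fifo` are hypothetical helpers, not part of lxc. The check is racy: a new lxc-monitord could start between the scan and the unlink.

```python
import os


def monitord_running(proc="/proc", name="lxc-monitord"):
    """Return True if any process's comm name equals `name` (hypothetical helper)."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "comm")) as f:
                if f.read().strip() == name:
                    return True
        except OSError:
            # The process exited during the scan, or its entry is unreadable.
            continue
    return False


def remove_stale_fifo(path, is_running=monitord_running):
    """Unlink `path` if no lxc-monitord is found; return True if removed.

    Racy by design: nothing stops a monitord from starting right after
    `is_running()` returns False.
    """
    if is_running():
        return False
    try:
        os.unlink(path)
    except FileNotFoundError:
        return False
    return True
```

A wrapper script would call `remove_stale_fifo("/run/lxc/var/lib/lxc/monitor-fifo")` before each lxc command. Passing `is_running` as a parameter keeps the process check swappable, for example for a pid-file based test.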
>>>
>>> Is there a good non-racy way to do this? I guess monitord could
>>> write its pid to a file in $LXCPATH and we could check it with kill(pid, 0).
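A sketch of the pid-file idea in Python, where `os.kill(pid, 0)` wraps the same kill(2) call: signal 0 performs only the existence and permission checks and delivers nothing. The pid-file name (e.g. something like `$LXCPATH/monitord.pid`) and the helper names are hypothetical. Because pids are reused, a stale file could still name an unrelated live process, so this narrows the race rather than eliminating it.

```python
import os


def write_pidfile(path):
    """Record the current pid, as a daemon might do at startup (hypothetical)."""
    with open(path, "w") as f:
        f.write(f"{os.getpid()}\n")


def pid_alive(pid):
    """Probe `pid` with signal 0: nothing is delivered, only checked."""
    if pid <= 0:
        # 0 and negative pids address process groups (or all processes for -1).
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:   # ESRCH: no such process
        return False
    except PermissionError:      # EPERM: it exists but belongs to another user
        return True
    return True


def monitord_alive(pidfile):
    """True if the pid recorded in `pidfile` names a live process."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return False
    return pid_alive(pid)
```

A client would then unlink the leftover socket only when `monitord_alive(...)` returns False, instead of guessing from the socket file alone.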

I also think that lxc should be able to recover from this problem
automatically.

>>>  
>>>> Is this behaviour known? Is some cleanup code missing in
>>>> lxc(_monitord), or why else is it failing like this?
>>>  
>>> Currently it catches SIGILL, SIGSEGV, SIGBUS, and SIGTERM and cleans
>>> up. Other than a hard kill (SIGKILL, which cannot be caught), I'm not
>>> sure what else might cause it to exit without cleaning up.
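The difference between a caught signal and a hard kill can be illustrated with a small Python sketch (POSIX only, uses `os.fork`): a child creates a file standing in for the socket, installs a SIGTERM handler that unlinks it, and is then sent either SIGTERM or SIGKILL. The helper names are hypothetical. Only SIGTERM is handled here because Python-level handlers are not a safe place to recover from synchronous faults such as SIGSEGV; lxc-monitord handles those in C.

```python
import os
import signal


def install_cleanup(path, signals=(signal.SIGTERM,)):
    """Unlink `path` and exit immediately when one of `signals` arrives."""
    def handler(signum, frame):
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass
        # _exit skips further Python cleanup; the file is already gone.
        os._exit(128 + signum)

    for sig in signals:
        signal.signal(sig, handler)


def leaves_file_behind(path, sig):
    """Fork a stand-in daemon that owns `path`, send it `sig`,
    and report whether `path` still exists afterwards."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: the stand-in daemon
        try:
            os.close(r)
            open(path, "w").close()   # the "socket" this daemon owns
            install_cleanup(path)
            os.write(w, b"x")         # tell the parent the handler is in place
            while True:
                signal.pause()
        finally:
            os._exit(1)               # never fall back into the parent's code
    os.close(w)
    os.read(r, 1)                     # block until the child is ready
    os.close(r)
    os.kill(pid, sig)
    os.waitpid(pid, 0)
    return os.path.exists(path)
```

Here `leaves_file_behind(p, signal.SIGTERM)` returns False, while `leaves_file_behind(p, signal.SIGKILL)` returns True: SIGKILL never reaches a handler, so the file stays behind, just like the stale monitor socket.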
>>
>> I shut down containers with `lxc-stop -n container-name`
>> (lxc.stopsignal=30 (SIGPWR)); however, this signal should never go to
>> lxc_monitord, right?
> 
> Right, that goes to the init process of the container. 
>  
>>>
>>>> Florian
>>>>
>>>> _______________________________________________
>>>> lxc-users mailing list
>>>> lxc-users at lists.linuxcontainers.org
>>>> http://lists.linuxcontainers.org/listinfo/lxc-users
>>>
> 
