[lxc-users] lxc_monitor exiting, but not cleaning monitor-fifo?
Florian Klink
flokli at flokli.de
Tue Apr 1 20:15:25 UTC 2014
On 01.04.2014 01:49, Dwight Engen wrote:
> On Mon, 31 Mar 2014 23:18:13 +0200
> Florian Klink <flokli at flokli.de> wrote:
>
>> On 31.03.2014 21:13, Dwight Engen wrote:
>>> On Mon, 31 Mar 2014 20:34:15 +0200
>>> Florian Klink <flokli at flokli.de> wrote:
>>>
>>>> On 31.03.2014 20:10, Dwight Engen wrote:
>>>>> On Sat, 29 Mar 2014 23:39:33 +0100
>>>>> Florian Klink <flokli at flokli.de> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> When running multiple lxc actions in a row using the command line
>>>>>> tools, I sometimes observe the following state:
>>>>>>
>>>>>>
>>>>>> - lxc-monitord is not running anymore
>>>>>> - /run/lxc/var/lib/lxc/monitor-fifo still exists, but is
>>>>>> "refusing connection"
>>>>>>
>>>>>> In the logs, I then see the following:
>>>>>>
>>>>>>
>>>>>> lxc-start 1395671045.703 ERROR lxc_monitor - connect : backing off 10
>>>>>> lxc-start 1395671045.713 ERROR lxc_monitor - connect : backing off 50
>>>>>> lxc-start 1395671045.763 ERROR lxc_monitor - connect : backing off 100
>>>>>> lxc-start 1395671045.864 ERROR lxc_monitor - connect : Connection refused
>>>>>>
>>>>>>
>>>>>> ... and the command fails.
>>>>>
>>>>> The only time I've seen this happen is if lxc-monitord is hard
>>>>> killed so it doesn't have a chance to clean up and remove the
>>>>> socket.
>>>>
>>>> Here, it's happening quite frequently. However, the script never
>>>> kills lxc-monitord on its own, it just tries to detect and fix
>>>> this state by removing the socket file...
>>>
>>> Right, removing the socket file makes it so another lxc-monitord
>>> will start, but the question is why is the first one exiting without
>>> cleaning up? Can you reliably reproduce it at will? If so then maybe
>>> you could attach an strace to lxc-monitord and see why it is
>>> exiting.
>>
>> I was so far not successful in reproducing the bug while having an
>> strace running. :-( But I'll continue to try!
Success :-) I managed to get an strace while trying to reproduce the
bug. I gzipped it and attached it to this mail.
It's the output of strace -f -s 200 /usr/lib/lxc/lxc-monitord
/var/lib/lxc /run/lxc/var/lib/lxc/monitor-fifo &> strace_output.txt
I fired a bunch of lxc-starts and lxc-stops in a row, then stopped my
script and waited for lxc-monitord (and with it strace) to exit.
Then I started my script again and was back in the "leftover
monitor-fifo" state.
>>>
>>>>>
>>>>>>
>>>>>> A possible workaround would be to check whether lxc-monitord is
>>>>>> not running while the monitor-fifo file still exists, and to
>>>>>> remove the fifo before running the next lxc command, but
>>>>>> that's ugly ;-)
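A minimal sketch of that workaround, to make the idea concrete. It
assumes a Linux /proc layout and that the daemon's comm name is
"lxc-monitord"; the helper names monitord_running and remove_stale_fifo
are made up for illustration and are not part of lxc. It is racy by
design: a new lxc-monitord could start between the check and the unlink.

```python
import os


def monitord_running(proc_root="/proc"):
    """Return True if any process under proc_root has the comm name lxc-monitord."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == "lxc-monitord":
                    return True
        except OSError:
            continue  # process went away while we were scanning
    return False


def remove_stale_fifo(fifo_path, proc_root="/proc"):
    """Unlink fifo_path if it exists and no lxc-monitord is running.

    Returns True if the file was removed, False otherwise.
    """
    if monitord_running(proc_root):
        return False
    try:
        os.unlink(fifo_path)
    except FileNotFoundError:
        return False
    return True
```

In my case this would be called as
remove_stale_fifo("/run/lxc/var/lib/lxc/monitor-fifo") before each lxc
command.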
>>>>>
>>>>> Is there a good non-racy way to do this? I guess monitord could
>>>>> write its pid in $LXCPATH and we could kill(pid, 0) it.
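A minimal sketch of the kill(pid, 0) check, assuming monitord wrote its
pid to a file somewhere under $LXCPATH. The pid file and the function
names are hypothetical; nothing in this thread says lxc writes such a
file today. Signal 0 delivers nothing and only performs the existence
and permission checks. Pid reuse can still give a false "alive", so
this narrows the race rather than removing it.

```python
import os


def pid_alive(pid):
    """Return True if a process with this pid currently exists."""
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is sent
    except ProcessLookupError:
        return False  # ESRCH: no such process
    except PermissionError:
        return True  # EPERM: process exists but belongs to someone else
    return True


def monitord_alive(pidfile):
    """Read a pid from pidfile (hypothetical) and check whether it is alive."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable, or malformed pid file
    # Guard against 0 and negative values, which kill() treats as
    # process groups rather than a single process.
    return pid > 0 and pid_alive(pid)
```

If monitord_alive() returns False while the fifo still exists, the fifo
could be treated as stale and removed before spawning a new monitord.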
>>
>> I also think that lxc should be able to recover from this problem
>> automatically.
>
> I agree, though I would like to understand the root cause. Can you try
> out the attached patch? I think it will cure your issues.
>
Thanks for the patch! Just tell me if you need more information for the
strace above. If not, I'll happily apply the patch :-)
>>>>>
>>>>>> Is this behaviour known? Is there some missing "cleanup code" in
>>>>>> lxc(_monitord) or why is it failing like this?
>>>>>
>>>>> Currently it catches SIGILL, SIGSEGV, SIGBUS, and SIGTERM and
>>>>> cleans up. Other than hard kill I'm not sure what else might
>>>>> cause it to exit without cleaning up.
>>>>
>>>> I shut down containers with `lxc-stop -n container-name`
>>>> (lxc.stopsignal=30 (SIGPWR)); however, this signal should never go
>>>> to lxc-monitord, right?
>>>
>>> Right, that goes to the init process of the container.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: strace_output.txt.gz
Type: application/gzip
Size: 2863 bytes
Desc: not available
URL: <http://lists.linuxcontainers.org/pipermail/lxc-users/attachments/20140401/2cedaee3/attachment.bin>