[lxc-users] lxc_monitor exiting, but not cleaning monitor-fifo?

Dwight Engen dwight.engen at oracle.com
Mon Mar 31 23:49:00 UTC 2014


On Mon, 31 Mar 2014 23:18:13 +0200
Florian Klink <flokli at flokli.de> wrote:

> On 31.03.2014 21:13, Dwight Engen wrote:
> > On Mon, 31 Mar 2014 20:34:15 +0200
> > Florian Klink <flokli at flokli.de> wrote:
> > 
> >> On 31.03.2014 20:10, Dwight Engen wrote:
> >>> On Sat, 29 Mar 2014 23:39:33 +0100
> >>> Florian Klink <flokli at flokli.de> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> When running multiple lxc actions in a row using the command line
> >>>> tools, I sometimes observe the following state:
> >>>>
> >>>>
> >>>> - lxc-monitord is not running anymore
> >>>> - /run/lxc/var/lib/lxc/monitor-fifo still exists, but is
> >>>> "refusing connection"
> >>>>
> >>>> In the logs, I then see the following:
> >>>>
> >>>>
> >>>> lxc-start 1395671045.703 ERROR    lxc_monitor - connect : backing off 10
> >>>> lxc-start 1395671045.713 ERROR    lxc_monitor - connect : backing off 50
> >>>> lxc-start 1395671045.763 ERROR    lxc_monitor - connect : backing off 100
> >>>> lxc-start 1395671045.864 ERROR    lxc_monitor - connect : Connection refused
> >>>>
> >>>>
> >>>> ... and the command fails.
> >>>  
> >>> The only time I've seen this happen is if lxc-monitord is hard
> >>> killed so it doesn't have a chance to clean up and remove the
> >>> socket.
> >>
> >> Here, it's happening quite frequently. However, the script never
> >> kills lxc-monitord on its own, it just tries to detect and fix
> >> this state by removing the socket file...
> > 
> > Right, removing the socket file makes it so another lxc-monitord
> > will start, but the question is why is the first one exiting without
> > cleaning up? Can you reliably reproduce it at will? If so then maybe
> > you could attach an strace to lxc-monitord and see why it is
> > exiting.
> 
> So far I have not managed to reproduce the bug while an strace was
> attached. :-( But I'll keep trying!
> > 
> >>>
> >>>>
> >>>> A possible workaround would be to check, before running the next
> >>>> lxc command, whether lxc-monitord is no longer running while the
> >>>> monitor-fifo file still exists, and to remove the fifo in that
> >>>> case, but that's ugly ;-)
> >>>
> >>> Is there a good non-racy way to do this? I guess monitord could
> >>> write its pid in $LXCPATH and we could kill(pid, 0) it. 
> 
> I also think that lxc should be able to recover from this problem
> automatically.

I agree, though I would like to understand the root cause. Can you try
out the attached patch? I think it will cure your issues.
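
For reference, the pid-file idea quoted above (monitord writes its pid
somewhere under $LXCPATH, clients probe it with kill(pid, 0)) could look
roughly like the sketch below. It is in Python only for brevity, since lxc
itself is C. The function names and the pid-file location are made up for
illustration, and this is not necessarily what the attached patch does.

```python
import os


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this pid currently exists.

    Signal 0 delivers nothing; the kernel only performs the existence
    and permission checks, which is all we need here.
    """
    if pid <= 0:
        # 0 and negative values address process groups, not one process.
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        # ESRCH: no such process, so the pid file is stale.
        return False
    except PermissionError:
        # EPERM: the process exists but belongs to another user.
        return True
    return True


def monitord_running(pidfile: str) -> bool:
    """Return True if the (hypothetical) pid file names a live process."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable, or garbled pid file: treat as not running.
        return False
    return pid_is_alive(pid)
```

A caller would remove the leftover fifo and spawn a new monitord only when
monitord_running() returns False. Note that this narrows the race rather
than removing it: the pid could be reused by an unrelated process between
monitord dying and the check.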

> >>>  
> >>>> Is this behaviour known? Is there some missing "cleanup code" in
> >>>> lxc(_monitord) or why is it failing like this?
> >>>  
> >>> Currently it catches SIGILL, SIGSEGV, SIGBUS, and SIGTERM and
> >>> cleans up. Other than hard kill I'm not sure what else might
> >>> cause it to exit without cleaning up.
> >>
> >> I shut down containers with `lxc-stop -n container-name`
> >> (lxc.stopsignal=30 (SIGPWR)), however this signal should never go
> >> to lxc_monitord, right?
> > 
> > Right, that goes to the init process of the container. 
> >  
> >>>
> >>>> Florian
> >>>>
> >>>> _______________________________________________
> >>>> lxc-users mailing list
> >>>> lxc-users at lists.linuxcontainers.org
> >>>> http://lists.linuxcontainers.org/listinfo/lxc-users
> >>>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-make-monitor-monitord-more-resilient-to-unexpected-t.patch
Type: text/x-patch
Size: 8722 bytes
Desc: not available
URL: <http://lists.linuxcontainers.org/pipermail/lxc-users/attachments/20140331/91621633/attachment.bin>

