[lxc-users] lxc_monitor exiting, but not cleaning monitor-fifo?

Dwight Engen dwight.engen at oracle.com
Mon Mar 31 19:13:44 UTC 2014


On Mon, 31 Mar 2014 20:34:15 +0200
Florian Klink <flokli at flokli.de> wrote:

> On 31.03.2014 20:10, Dwight Engen wrote:
> > On Sat, 29 Mar 2014 23:39:33 +0100
> > Florian Klink <flokli at flokli.de> wrote:
> > 
> >> Hi,
> >>
> >> when running multiple lxc actions in a row using the command line
> >> tools, I sometimes observe the following state:
> >>
> >>
> >> - lxc-monitord is not running anymore
> >> - /run/lxc/var/lib/lxc/monitor-fifo still exists, but is "refusing
> >> connection"
> >>
> >> In the logs, I then see the following:
> >>
> >>
> >> lxc-start 1395671045.703 ERROR    lxc_monitor - connect : backing off 10
> >> lxc-start 1395671045.713 ERROR    lxc_monitor - connect : backing off 50
> >> lxc-start 1395671045.763 ERROR    lxc_monitor - connect : backing off 100
> >> lxc-start 1395671045.864 ERROR    lxc_monitor - connect : Connection refused
> >>
> >>
> >> ... and the command fails.
> >  
> > The only time I've seen this happen is if lxc-monitord is hard
> > killed so it doesn't have a chance to clean up and remove the
> > socket.
> 
> Here, it's happening quite frequently. However, the script never kills
> lxc-monitord on its own; it just tries to detect and fix this state by
> removing the socket file...

Right, removing the socket file lets another lxc-monitord start, but
the question is why the first one is exiting without cleaning up. Can
you reproduce it reliably? If so, maybe you could attach strace to
lxc-monitord and see why it is exiting.
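
A rough sketch of how a wrapper script might detect this state before
removing the file, assuming the leftover path behaves like a Unix
domain socket with no listener behind it (which is what "Connection
refused" suggests). The function name is illustrative and not part of
lxc, and check-then-remove is still racy:

```python
import os
import socket


def is_stale_unix_socket(path: str, timeout: float = 1.0) -> bool:
    """Return True if `path` exists but connecting to it is refused.

    Hypothetical helper, not an lxc API. It only reports "stale" on
    ECONNREFUSED; any other error is treated as "unknown" (False).
    """
    if not os.path.exists(path):
        return False
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(path)
    except ConnectionRefusedError:
        # The file is there, but nothing is accepting connections.
        return True
    except OSError:
        # Timeout, permission problem, wrong socket type, etc.
        return False
    finally:
        client.close()
    # Connect succeeded, so something is still listening.
    return False
```

A script could call this on the path from the original report
(/run/lxc/var/lib/lxc/monitor-fifo) and only unlink the file when it
returns True. Another process may start a new monitor between the
check and the unlink, so this narrows the window rather than closing
it.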

> > 
> >>
> >> A possible workaround would be checking for non-running
> >> lxc-monitord process but existing monitor-fifo file then removing
> >> the fifo if it exists before running the next lxc command, but
> >> that's ugly ;-)
> > 
> > Is there a good non-racy way to do this? I guess monitord could
> > write its pid in $LXCPATH and we could kill(pid, 0) it. 
> >  
> >> Is this behaviour known? Is there some missing "cleanup code" in
> >> lxc(_monitord) or why is it failing like this?
> >  
> > Currently it catches SIGILL, SIGSEGV, SIGBUS, and SIGTERM and cleans
> > up. Other than a hard kill I'm not sure what else might cause it to
> > exit without cleaning up.
> 
> I shut down containers with `lxc-stop -n container-name`
> (lxc.stopsignal=30 (SIGPWR)), however this signal should never go to
> lxc_monitord, right?

Right, that goes to the init process of the container. 
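
On the pid-file idea quoted above (monitord writing its pid under
$LXCPATH and callers probing it with kill(pid, 0)), a minimal sketch
of the probing side. The pid file is hypothetical: monitord does not
write one today, and both function names are made up for illustration.

```python
import os


def pid_is_alive(pid: int) -> bool:
    """Probe a pid with signal 0, which checks existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        # ESRCH: no such process.
        return False
    except PermissionError:
        # EPERM: the process exists but belongs to another user.
        return True
    return True


def monitord_running(pidfile: str) -> bool:
    """Return True if `pidfile` (hypothetical) names a live process."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable, or malformed pid file.
        return False
    # Reject pid <= 0: kill(0, ...) and kill(-1, ...) address groups.
    return pid > 0 and pid_is_alive(pid)
```

This is still not fully race-free: the pid could be reused by an
unrelated process after monitord exits, so a caller would probably
want to combine it with some other check.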
 
> > 
> >> Florian
> >>
> >> _______________________________________________
> >> lxc-users mailing list
> >> lxc-users at lists.linuxcontainers.org
> >> http://lists.linuxcontainers.org/listinfo/lxc-users
> > 


