[lxc-users] zombie process blocks stopping of container

Michael H. Warfield mhw at WittsEnd.com
Tue Jun 3 15:57:06 UTC 2014


On Tue, 2014-06-03 at 15:35 +0000, Serge Hallyn wrote:
> Quoting Stéphane Graber (stgraber at ubuntu.com):
> > On Tue, Jun 03, 2014 at 04:56:03PM +0200, Tamas Papp wrote:
> > > 
> > > On 06/03/2014 04:50 PM, Stéphane Graber wrote:
> > > >lxc-stop will send SIGPWR (or the equivalent signal) to the container,
> > > >wait 30s then SIGKILL init. lxc-stop -k will skip the SIGPWR step,
> > > >lxc-stop --nokill will skip the SIGKILL step.
> > > >
> > > >It's pretty odd that init after a kill -9 is still marked running... I'd
> > > >have expected it to either go away or get stuck in D state if
> > > >something's really wrong...
> > > >
> > > >Do you see anything relevant in the kernel log?
> > > 
> > > Nothing. I was in hurry, so I restarted the whole machine, I cannot
> > > collect more information.
> > > Unfortunately I'm pretty sure it will be back soon, since this was
> > > not the first time.
> > > What do you suggest, what should I check, when I face it again?
> > 
> > So my hope would be for the kernel to report the task as hung which
> > causes a stacktrace to be dumped in dmesg. If not, then it's going to be
> > a bit harder to figure it out...

> Both the container init and its lxc monitor were in S state.  I assume
> a SIGCONT sent to one or both of those would have fixed it.

I interpreted that "S" to indicate the "interruptible sleep" state.
Maybe we're using a different ps, but this is what my man page
says:

-- 
PROCESS STATE CODES
       Here are the different values that the s, stat and state output
       specifiers (header "STAT" or "S") will display to describe the state of
       a process:

               D    uninterruptible sleep (usually IO)
               R    running or runnable (on run queue)
               S    interruptible sleep (waiting for an event to complete)
               T    stopped by job control signal
               t    stopped by debugger during the tracing
               W    paging (not valid since the 2.6.xx kernel)
               X    dead (should never be seen)
               Z    defunct ("zombie") process, terminated but not reaped by
                    its parent
-- 

State T is "stopped by job control signal", and that is the state a
SIGCONT would clear.  S is just an ordinary sleep.

In this case, though, it seems like the kernel is treating this init
process as if it were in state D, uninterruptible sleep, which even a
SIGKILL will not interrupt.
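
For the next time this shows up, here is a rough sketch of how one might
record the state of the container init and the lxc monitor straight from
/proc instead of relying on a particular ps build.  It assumes a Linux
/proc filesystem.  The function names (parse_stat_state, read_state,
read_wchan, describe) are made up for illustration and are not part of
LXC.  The state table just repeats the man page excerpt above.

```python
# Sketch only: helper names are hypothetical, not an LXC API.
# Assumes a Linux /proc filesystem.

# State codes copied from the ps man page excerpt.
STATE_NAMES = {
    "D": "uninterruptible sleep (usually IO)",
    "R": "running or runnable (on run queue)",
    "S": "interruptible sleep (waiting for an event to complete)",
    "T": "stopped by job control signal",
    "t": "stopped by debugger during the tracing",
    "X": "dead (should never be seen)",
    "Z": 'defunct ("zombie") process',
}


def parse_stat_state(stat_line):
    """Return the one-letter state from the text of /proc/<pid>/stat.

    The second field (the command name) is wrapped in parentheses and may
    itself contain spaces or ')', so split after the LAST ')'.
    """
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    return rest[0]


def read_state(pid):
    """Read the current state letter of a live process."""
    with open("/proc/%d/stat" % pid) as f:
        return parse_stat_state(f.read())


def read_wchan(pid):
    """Read the kernel symbol the task is sleeping in, if the kernel exposes it."""
    with open("/proc/%d/wchan" % pid) as f:
        return f.read().strip()


def describe(pid):
    """Format pid, state letter and its man page description on one line."""
    state = read_state(pid)
    return "%d: %s (%s)" % (pid, state, STATE_NAMES.get(state, "unknown"))
```

Calling describe() on the PID of the container init and on the PID of its
monitor, along with read_wchan() for each, should show whether they are
really in S or T and where in the kernel they are waiting.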

> Until that
> happens, any other tasks in the container will indeed be zombies waiting
> for the container init to reap them.
> 
> The question is why container init and monitor were stopped.

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 978-7061 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!
