[lxc-users] zombie process blocks stopping of container

Michael H. Warfield mhw at WittsEnd.com
Tue Jun 3 15:52:11 UTC 2014


On Tue, 2014-06-03 at 11:08 -0400, Stéphane Graber wrote:
> On Tue, Jun 03, 2014 at 04:56:03PM +0200, Tamas Papp wrote:
> > 
> > On 06/03/2014 04:50 PM, Stéphane Graber wrote:
> > >lxc-stop will send SIGPWR (or the equivalent signal) to the container,
> > >wait 30s then SIGKILL init. lxc-stop -k will skip the SIGPWR step,
> > >lxc-stop --nokill will skip the SIGKILL step.
> > >
> > >It's pretty odd that init after a kill -9 is still marked running... I'd
> > >have expected it to either go away or get stuck in D state if
> > >something's really wrong...
> > >
> > >Do you see anything relevant in the kernel log?
> > 
> > Nothing. I was in hurry, so I restarted the whole machine, I cannot
> > collect more information.
> > Unfortunately I'm pretty sure it will be back soon, since this was
> > not the first time.
> > What do you suggest, what should I check, when I face it again?

> So my hope would be for the kernel to report the task as hung which
> causes a stacktrace to be dumped in dmesg. If not, then it's going to be
> a bit harder to figure it out...

Having done device drivers in the past, I've seen this sort of thing more
often than I care for, though mostly in connection with hardware
devices.

Writing from distant memory...

Looks to me as if the init process is in an uninterruptible state, even
though ps reports it as being in an interruptible sleep.  That would
explain why the Java process was not reaped and became a zombie: init
was not able to respond to the SIGCHLD.

If you can't SIGKILL it, then it's sitting on something deep in the
kernel that won't let it go.  For me, that was most often a lost
interrupt or spinlock in the kernel.  I'd be curious to know what wait
channel it's sitting on and if anything changes after a subsequent -9.
A -9 should break even a resource deadlock (a deadly embrace).  Still,
any change in that WCHAN would be interesting.
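
A minimal sketch of how one might capture the state and wait channel
for comparison before and after a kill -9, assuming a Linux /proc
filesystem with /proc/<pid>/stat and /proc/<pid>/wchan available.  The
helper names (parse_stat, read_wchan, snapshot) are illustrative only
and are not part of LXC or any existing tool.

```python
def parse_stat(stat_text):
    """Return (pid, comm, state, ppid) from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    pid = int(stat_text[:lparen])
    comm = stat_text[lparen + 1:rparen]
    rest = stat_text[rparen + 1:].split()
    # rest[0] is the state letter (R, S, D, Z, ...), rest[1] the parent PID.
    return pid, comm, rest[0], int(rest[1])


def read_wchan(pid):
    """Return the wait channel symbol for pid, or None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/wchan") as f:
            return f.read().strip() or None
    except OSError:
        # Process is gone, or we lack permission to read the file.
        return None


def snapshot(pid):
    """Collect state and wait channel for one process (pid is the host PID)."""
    with open(f"/proc/{pid}/stat") as f:
        _, comm, state, ppid = parse_stat(f.read())
    return {"pid": pid, "comm": comm, "state": state,
            "ppid": ppid, "wchan": read_wchan(pid)}
```

Taking snapshot() of the container's init (by its PID on the host) once
before and once after sending the signal, then comparing the "state"
and "wchan" fields, would show whether anything moved.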

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 978-7061 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!
