[lxc-users] zombie process blocks stopping of container
Tamas Papp
tompos at martos.bme.hu
Tue Jun 3 15:34:42 UTC 2014
On 06/03/2014 05:08 PM, Stéphane Graber wrote:
> On Tue, Jun 03, 2014 at 04:56:03PM +0200, Tamas Papp wrote:
>> On 06/03/2014 04:50 PM, Stéphane Graber wrote:
>>> lxc-stop will send SIGPWR (or the equivalent signal) to the container,
>>> wait 30s then SIGKILL init. lxc-stop -k will skip the SIGPWR step,
>>> lxc-stop --nokill will skip the SIGKILL step.
>>>
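The sequence described above (graceful signal, wait, then SIGKILL) can be sketched in Python as follows. This only illustrates the escalation logic and is not how lxc-stop is implemented. The function name, its parameters, and the two callbacks `send_signal` and `is_running` are assumptions made up for this example.

```python
import signal
import time
from typing import Callable

# SIGPWR is Linux-specific; fall back to SIGTERM elsewhere so the sketch still runs.
GRACEFUL_SIGNAL = getattr(signal, "SIGPWR", signal.SIGTERM)


def stop_with_escalation(
    send_signal: Callable[[int], None],   # hypothetical: delivers a signal to init
    is_running: Callable[[], bool],       # hypothetical: True while init is alive
    kill_only: bool = False,              # like `lxc-stop -k`: skip the graceful step
    nokill: bool = False,                 # like `lxc-stop --nokill`: never SIGKILL
    timeout: float = 30.0,                # the 30s wait mentioned above
    kill_wait: float = 5.0,               # assumed grace period after SIGKILL
    poll_interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the target stopped, False if it is still running."""
    if kill_only and nokill:
        raise ValueError("kill_only and nokill contradict each other")

    def wait_until_stopped(limit: float) -> bool:
        # Poll until the target is gone or the time limit is used up.
        waited = 0.0
        while waited < limit:
            if not is_running():
                return True
            sleep(poll_interval)
            waited += poll_interval
        return not is_running()

    if not kill_only:
        send_signal(GRACEFUL_SIGNAL)
        if wait_until_stopped(timeout):
            return True
        if nokill:
            return False

    send_signal(signal.SIGKILL)
    return wait_until_stopped(kill_wait)
```

With `os.kill` and a liveness check for a real PID plugged in as the two callbacks, a False return after the SIGKILL step corresponds to the situation in this thread: init was killed but still shows as running.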
>>> It's pretty odd that init after a kill -9 is still marked running... I'd
>>> have expected it to either go away or get stuck in D state if
>>> something's really wrong...
>>>
>>> Do you see anything relevant in the kernel log?
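To tell these cases apart the next time it happens, the state letter in /proc/<pid>/stat can be read directly (R is running, D is uninterruptible sleep, Z is zombie). A minimal sketch, assuming a Linux /proc filesystem; the function names are made up for this example.

```python
from pathlib import Path

# A few of the state letters documented for /proc/<pid>/stat.
STATE_NAMES = {
    "R": "running",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep",
    "Z": "zombie",
    "T": "stopped",
}


def parse_stat_state(stat_line: str) -> str:
    """Extract the state letter from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' rather than on whitespace.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def process_state(pid: int) -> str:
    """Read and describe the current state of a process (Linux only)."""
    letter = parse_stat_state(Path(f"/proc/{pid}/stat").read_text())
    return f"{letter} ({STATE_NAMES.get(letter, 'other')})"
```

Calling `process_state(pid)` with the PID of the container's init, as seen from the host, shows whether it is really in R, or in D or Z.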
>> Nothing. I was in a hurry, so I restarted the whole machine and cannot
>> collect more information.
>> Unfortunately, I'm pretty sure it will be back soon, since this was
>> not the first time.
>> What do you suggest I check when I face it again?
> So my hope would be for the kernel to report the task as hung which
> causes a stacktrace to be dumped in dmesg. If not, then it's going to be
> a bit harder to figure it out...
>
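One way to look for such a report is to filter the kernel log for hung-task and BUG lines. A minimal sketch, assuming the usual "[seconds.micros] message" dmesg layout; the marker list is an assumption and not exhaustive, and the function name is made up for this example.

```python
import re

# "[514047.469086] message" -> timestamp and message text.
LINE_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")

# Substrings that usually mark a hung task or a kernel BUG/oops (assumed list).
MARKERS = ("blocked for more than", "kernel BUG at", "Oops:")


def find_kernel_reports(log_text: str) -> list[tuple[float, str]]:
    """Return (timestamp, message) for every log line containing a marker."""
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match and any(marker in match.group("msg") for marker in MARKERS):
            hits.append((float(match.group("ts")), match.group("msg")))
    return hits


# Usage on a live system (not run here):
#   import subprocess
#   text = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
#   for ts, msg in find_kernel_reports(text):
#       print(ts, msg)
```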
Is this useful?
[514047.425278] ---[ end trace 3d2c1319330f8514 ]---
[514047.469086] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
[514047.490011] invalid opcode: 0000 [#10] SMP
[514047.510486] Modules linked in: joydev hid_generic usbhid hid
binfmt_misc veth vhost_net vhost macvtap macvlan xt_conntrack ipt_REJECT
ip6table_filter ip6_tables xt_CHECKSUM iptable_mangle ipt_MASQUERADE
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc
gpio_ich x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64
lrw gf128mul glue_helper ablk_helper cryptd serio_raw sb_edac edac_core
lpc_ich hpwdt hpilo ioatdma ipmi_si mac_hid acpi_power_meter lp parport
zfs(POF) zunicode(POF) zavl(POF) zcommon(POF) znvpair(POF) spl(OF)
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
async_tx xor raid6_pq raid0 multipath linear igb i2c_algo_bit dca ahci
ptp raid1 psmouse libahci pps_core hpsa
[514047.750609] CPU: 2 PID: 25087 Comm: java Tainted: PF D O
3.13.0-27-generic #50-Ubuntu
[514047.795328] Hardware name: HP ProLiant SL210t Gen8/, BIOS P83 12/20/2013
[514047.818632] task: ffff88175a5fc7d0 ti: ffff88176180e000 task.ti:
ffff88176180e000
[514047.865496] RIP: 0010:[<ffffffff811793d1>] [<ffffffff811793d1>]
handle_mm_fault+0xe61/0xf10
[514047.914325] RSP: 0018:ffff88176180fd98 EFLAGS: 00010246
[514047.939354] RAX: 0000000000000100 RBX: 00000007ff41a730 RCX:
ffff88176180fb10
[514047.989807] RDX: ffff88175a5fc7d0 RSI: 0000000000000000 RDI:
8000000cf2a009e6
[514048.040666] RBP: ffff88176180fe20 R08: 0000000000000000 R09:
00000000000000a9
[514048.092282] R10: 0000000000000001 R11: 0000000000000000 R12:
ffff881765ea5fd0
[514048.145498] R13: ffff88176ac77080 R14: ffff8802711ee200 R15:
0000000000000080
[514048.199709] FS: 00007f83947f7700(0000) GS:ffff88103fc40000(0000)
knlGS:0000000000000000
[514048.254465] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[514048.281837] CR2: 00007f6fdd6cec58 CR3: 0000000275364000 CR4:
00000000001427e0
[514048.335499] Stack:
[514048.361406] ffff88175a5fc7d0 0000000000000000 00007f839802f000
ffff8802711ee200
[514048.413528] 0000000000000f54 0000000000000000 0000000000000000
ffffea001e3c7000
[514048.465624] 800000078f1c0867 ffffea0043985670 ffffea00000000a9
ffff88176180fe00
[514048.517663] Call Trace:
[514048.542981] [<ffffffff81725924>] __do_page_fault+0x184/0x560
[514048.568584] [<ffffffff811112ec>] ? acct_account_cputime+0x1c/0x20
[514048.593936] [<ffffffff8109d76b>] ? account_user_time+0x8b/0xa0
[514048.618792] [<ffffffff8109dd84>] ? vtime_account_user+0x54/0x60
[514048.643188] [<ffffffff81725d1a>] do_page_fault+0x1a/0x70
[514048.667032] [<ffffffff81722188>] page_fault+0x28/0x30
[514048.690535] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8
e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3
ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 f8 39 a6 81 44 89 4d c8 e8 98 e3
[514048.763043] RIP [<ffffffff811793d1>] handle_mm_fault+0xe61/0xf10
[514048.786554] RSP <ffff88176180fd98>
[514048.809155] ------------[ cut here ]------------
[514048.809343] ---[ end trace 3d2c1319330f8515 ]---
[514048.856623] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
[514048.879824] invalid opcode: 0000 [#11] SMP
[514048.902099] Modules linked in: joydev hid_generic usbhid hid
binfmt_misc veth vhost_net vhost macvtap macvlan xt_conntrack ipt_REJECT
ip6table_filter ip6_tables xt_CHECKSUM iptable_mangle ipt_MASQUERADE
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc
gpio_ich x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64
lrw gf128mul glue_helper ablk_helper cryptd serio_raw sb_edac edac_core
lpc_ich hpwdt hpilo ioatdma ipmi_si mac_hid acpi_power_meter lp parport
zfs(POF) zunicode(POF) zavl(POF) zcommon(POF) znvpair(POF) spl(OF)
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
async_tx xor raid6_pq raid0 multipath linear igb i2c_algo_bit dca ahci
ptp raid1 psmouse libahci pps_core hpsa
[514049.154915] CPU: 16 PID: 25089 Comm: java Tainted: PF D O
3.13.0-27-generic #50-Ubuntu
[514049.200777] Hardware name: HP ProLiant SL210t Gen8/, BIOS P83 12/20/2013
[514049.224243] task: ffff88175a5fafe0 ti: ffff88175a724000 task.ti:
ffff88175a724000
[514049.271305] RIP: 0010:[<ffffffff811793d1>] [<ffffffff811793d1>]
handle_mm_fault+0xe61/0xf10
[514049.320379] RSP: 0000:ffff88175a725d98 EFLAGS: 00010246
[514049.345393] RAX: 0000000000000100 RBX: 00000007ff412730 RCX:
ffff88175a725b10
[514049.396064] RDX: ffff88175a5fafe0 RSI: 0000000000000000 RDI:
8000000cf2a009e6
[514049.446897] RBP: ffff88175a725e20 R08: 0000000000000000 R09:
00000000000000a9
[514049.498535] R10: 0000000000000001 R11: 0000000000000000 R12:
ffff881765ea5fd0
[514049.551984] R13: ffff88176ac77080 R14: ffff8802711ee200 R15:
0000000000000080
[514049.606194] FS: 00007f83945f5700(0000) GS:ffff88103fd40000(0000)
knlGS:0000000000000000
[514049.661165] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[514049.688513] CR2: 00000007ff412730 CR3: 0000000275364000 CR4:
00000000001427e0
[514049.742387] Stack:
[514049.768268] 0000000000000001 ffff88175a725db0 ffffffff8109a780
ffff88175a725dd0
[514049.820366] ffffffff810d7ad6 0000000000000001 ffffffff81f1f810
ffffea0025998400
[514049.872439] 8000000966610867 ffffea0043985670 ffffea00000000a9
00000001ffffffff
[514049.924437] Call Trace:
[514049.949730] [<ffffffff8109a780>] ? wake_up_state+0x10/0x20
[514049.975243] [<ffffffff810d7ad6>] ? wake_futex+0x66/0x90
[514050.000224] [<ffffffff81725924>] __do_page_fault+0x184/0x560
[514050.024857] [<ffffffff811112ec>] ? acct_account_cputime+0x1c/0x20
[514050.049169] [<ffffffff8109d76b>] ? account_user_time+0x8b/0xa0
[514050.072946] [<ffffffff8109dd84>] ? vtime_account_user+0x54/0x60
[514050.096475] [<ffffffff81725d1a>] do_page_fault+0x1a/0x70
[514050.119657] [<ffffffff81722188>] page_fault+0x28/0x30
[514050.142278] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8
e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3
ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 f8 39 a6 81 44 89 4d c8 e8 98 e3
[514050.211451] RIP [<ffffffff811793d1>] handle_mm_fault+0xe61/0xf10
[514050.233676] RSP <ffff88175a725d98>
[514050.255620] ---[ end trace 3d2c1319330f8516 ]---
[516197.062287] init: lxc-instance (fisheye1) main process (4489) killed by KILL signal
However, these messages are older ...
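To put a number on "older": the bracketed values are seconds since boot, so the gap between the last trace and the KILL message can be computed from the two lines above. A small sketch; the helper name is made up for this example.

```python
import re

TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")


def log_timestamp(line: str) -> float:
    """Return the leading [seconds.micros] timestamp of a kernel log line."""
    match = TS_RE.match(line)
    if match is None:
        raise ValueError(f"no timestamp in line: {line!r}")
    return float(match.group(1))


last_trace = "[514050.255620] ---[ end trace 3d2c1319330f8516 ]---"
kill_msg = ("[516197.062287] init: lxc-instance (fisheye1) main process "
            "(4489) killed by KILL signal")

gap = log_timestamp(kill_msg) - log_timestamp(last_trace)
print(f"{gap:.0f} s, about {gap / 60:.0f} minutes")
# prints "2147 s, about 36 minutes"
```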
tamas