[lxc-users] zombie process blocks stopping of container

Tamas Papp tompos at martos.bme.hu
Tue Jun 3 15:34:42 UTC 2014


On 06/03/2014 05:08 PM, Stéphane Graber wrote:
> On Tue, Jun 03, 2014 at 04:56:03PM +0200, Tamas Papp wrote:
>> On 06/03/2014 04:50 PM, Stéphane Graber wrote:
>>> lxc-stop will send SIGPWR (or the equivalent signal) to the container,
>>> wait 30s then SIGKILL init. lxc-stop -k will skip the SIGPWR step,
>>> lxc-stop --nokill will skip the SIGKILL step.
>>>
>>> It's pretty odd that init after a kill -9 is still marked running... I'd
>>> have expected it to either go away or get stuck in D state if
>>> something's really wrong...
>>>
>>> Do you see anything relevant in the kernel log?
>> Nothing. I was in a hurry, so I restarted the whole machine and cannot
>> collect any more information.
>> Unfortunately I'm pretty sure it will be back soon, since this was
>> not the first time.
>> What do you suggest I check when I run into it again?
> So my hope would be for the kernel to report the task as hung which
> causes a stacktrace to be dumped in dmesg. If not, then it's going to be
> a bit harder to figure it out...
>
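
For next time, one thing worth recording before rebooting is the state
the kernel reports for the container's init (R, D, Z, ...), since that
was the open question above. A minimal sketch, assuming Linux /proc and
that the host PID of init is known; the helper names are made up for
illustration:

```python
def parse_stat_state(stat_text):
    """Return the one-letter task state from the contents of /proc/<pid>/stat."""
    # The second field (comm) is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the LAST ')' instead of on whitespace.
    after_comm = stat_text[stat_text.rindex(")") + 1:]
    return after_comm.split()[0]


def task_state(pid):
    """Read the state of a live task, e.g. 'R', 'S', 'D' or 'Z'."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat_state(f.read())


if __name__ == "__main__":
    import os
    # Demo on our own PID; on the host, pass the PID of the container's init.
    print(task_state(os.getpid()))
```

If the state is D, /proc/<pid>/stack (readable as root on kernels built
with stack trace support) shows where the task is sitting in the kernel.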

Is this useful?

[514047.425278] ---[ end trace 3d2c1319330f8514 ]---
[514047.469086] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
[514047.490011] invalid opcode: 0000 [#10] SMP
[514047.510486] Modules linked in: joydev hid_generic usbhid hid 
binfmt_misc veth vhost_net vhost macvtap macvlan xt_conntrack ipt_REJECT 
ip6table_filter ip6_tables xt_CHECKSUM iptable_mangle ipt_MASQUERADE 
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc 
gpio_ich x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 
lrw gf128mul glue_helper ablk_helper cryptd serio_raw sb_edac edac_core 
lpc_ich hpwdt hpilo ioatdma ipmi_si mac_hid acpi_power_meter lp parport 
zfs(POF) zunicode(POF) zavl(POF) zcommon(POF) znvpair(POF) spl(OF) 
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor 
async_tx xor raid6_pq raid0 multipath linear igb i2c_algo_bit dca ahci 
ptp raid1 psmouse libahci pps_core hpsa
[514047.750609] CPU: 2 PID: 25087 Comm: java Tainted: PF     D    O 3.13.0-27-generic #50-Ubuntu
[514047.795328] Hardware name: HP ProLiant SL210t Gen8/, BIOS P83 12/20/2013
[514047.818632] task: ffff88175a5fc7d0 ti: ffff88176180e000 task.ti: ffff88176180e000
[514047.865496] RIP: 0010:[<ffffffff811793d1>] [<ffffffff811793d1>] handle_mm_fault+0xe61/0xf10
[514047.914325] RSP: 0018:ffff88176180fd98  EFLAGS: 00010246
[514047.939354] RAX: 0000000000000100 RBX: 00000007ff41a730 RCX: ffff88176180fb10
[514047.989807] RDX: ffff88175a5fc7d0 RSI: 0000000000000000 RDI: 8000000cf2a009e6
[514048.040666] RBP: ffff88176180fe20 R08: 0000000000000000 R09: 00000000000000a9
[514048.092282] R10: 0000000000000001 R11: 0000000000000000 R12: ffff881765ea5fd0
[514048.145498] R13: ffff88176ac77080 R14: ffff8802711ee200 R15: 0000000000000080
[514048.199709] FS:  00007f83947f7700(0000) GS:ffff88103fc40000(0000) knlGS:0000000000000000
[514048.254465] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[514048.281837] CR2: 00007f6fdd6cec58 CR3: 0000000275364000 CR4: 00000000001427e0
[514048.335499] Stack:
[514048.361406]  ffff88175a5fc7d0 0000000000000000 00007f839802f000 ffff8802711ee200
[514048.413528]  0000000000000f54 0000000000000000 0000000000000000 ffffea001e3c7000
[514048.465624]  800000078f1c0867 ffffea0043985670 ffffea00000000a9 ffff88176180fe00
[514048.517663] Call Trace:
[514048.542981]  [<ffffffff81725924>] __do_page_fault+0x184/0x560
[514048.568584]  [<ffffffff811112ec>] ? acct_account_cputime+0x1c/0x20
[514048.593936]  [<ffffffff8109d76b>] ? account_user_time+0x8b/0xa0
[514048.618792]  [<ffffffff8109dd84>] ? vtime_account_user+0x54/0x60
[514048.643188]  [<ffffffff81725d1a>] do_page_fault+0x1a/0x70
[514048.667032]  [<ffffffff81722188>] page_fault+0x28/0x30
[514048.690535] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8 
e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3 
ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 f8 39 a6 81 44 89 4d c8 e8 98 e3
[514048.763043] RIP  [<ffffffff811793d1>] handle_mm_fault+0xe61/0xf10
[514048.786554]  RSP <ffff88176180fd98>
[514048.809155] ------------[ cut here ]------------
[514048.809343] ---[ end trace 3d2c1319330f8515 ]---
[514048.856623] kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756!
[514048.879824] invalid opcode: 0000 [#11] SMP
[514048.902099] Modules linked in: joydev hid_generic usbhid hid 
binfmt_misc veth vhost_net vhost macvtap macvlan xt_conntrack ipt_REJECT 
ip6table_filter ip6_tables xt_CHECKSUM iptable_mangle ipt_MASQUERADE 
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc 
gpio_ich x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 
lrw gf128mul glue_helper ablk_helper cryptd serio_raw sb_edac edac_core 
lpc_ich hpwdt hpilo ioatdma ipmi_si mac_hid acpi_power_meter lp parport 
zfs(POF) zunicode(POF) zavl(POF) zcommon(POF) znvpair(POF) spl(OF) 
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor 
async_tx xor raid6_pq raid0 multipath linear igb i2c_algo_bit dca ahci 
ptp raid1 psmouse libahci pps_core hpsa
[514049.154915] CPU: 16 PID: 25089 Comm: java Tainted: PF     D    O 3.13.0-27-generic #50-Ubuntu
[514049.200777] Hardware name: HP ProLiant SL210t Gen8/, BIOS P83 12/20/2013
[514049.224243] task: ffff88175a5fafe0 ti: ffff88175a724000 task.ti: ffff88175a724000
[514049.271305] RIP: 0010:[<ffffffff811793d1>] [<ffffffff811793d1>] handle_mm_fault+0xe61/0xf10
[514049.320379] RSP: 0000:ffff88175a725d98  EFLAGS: 00010246
[514049.345393] RAX: 0000000000000100 RBX: 00000007ff412730 RCX: ffff88175a725b10
[514049.396064] RDX: ffff88175a5fafe0 RSI: 0000000000000000 RDI: 8000000cf2a009e6
[514049.446897] RBP: ffff88175a725e20 R08: 0000000000000000 R09: 00000000000000a9
[514049.498535] R10: 0000000000000001 R11: 0000000000000000 R12: ffff881765ea5fd0
[514049.551984] R13: ffff88176ac77080 R14: ffff8802711ee200 R15: 0000000000000080
[514049.606194] FS:  00007f83945f5700(0000) GS:ffff88103fd40000(0000) knlGS:0000000000000000
[514049.661165] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[514049.688513] CR2: 00000007ff412730 CR3: 0000000275364000 CR4: 00000000001427e0
[514049.742387] Stack:
[514049.768268]  0000000000000001 ffff88175a725db0 ffffffff8109a780 ffff88175a725dd0
[514049.820366]  ffffffff810d7ad6 0000000000000001 ffffffff81f1f810 ffffea0025998400
[514049.872439]  8000000966610867 ffffea0043985670 ffffea00000000a9 00000001ffffffff
[514049.924437] Call Trace:
[514049.949730]  [<ffffffff8109a780>] ? wake_up_state+0x10/0x20
[514049.975243]  [<ffffffff810d7ad6>] ? wake_futex+0x66/0x90
[514050.000224]  [<ffffffff81725924>] __do_page_fault+0x184/0x560
[514050.024857]  [<ffffffff811112ec>] ? acct_account_cputime+0x1c/0x20
[514050.049169]  [<ffffffff8109d76b>] ? account_user_time+0x8b/0xa0
[514050.072946]  [<ffffffff8109dd84>] ? vtime_account_user+0x54/0x60
[514050.096475]  [<ffffffff81725d1a>] do_page_fault+0x1a/0x70
[514050.119657]  [<ffffffff81722188>] page_fault+0x28/0x30
[514050.142278] Code: ff 48 89 d9 4c 89 e2 4c 89 ee 4c 89 f7 44 89 4d c8 
e8 34 c1 ff ff 85 c0 0f 85 94 f5 ff ff 49 8b 3c 24 44 8b 4d c8 e9 68 f3 
ff ff <0f> 0b be 8e 00 00 00 48 c7 c7 f8 39 a6 81 44 89 4d c8 e8 98 e3
[514050.211451] RIP  [<ffffffff811793d1>] handle_mm_fault+0xe61/0xf10
[514050.233676]  RSP <ffff88175a725d98>
[514050.255620] ---[ end trace 3d2c1319330f8516 ]---
[516197.062287] init: lxc-instance (fisheye1) main process (4489) killed by KILL signal



However, these messages are older: the last BUG trace was logged roughly
36 minutes before the KILL line above.
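
The gap between the traces and the kill can be read off the dmesg
timestamps. A minimal sketch, assuming the usual "[seconds.micros]"
prefix on each log line; the function names and the marker strings are
only illustrative:

```python
import re

# Matches a dmesg line such as "[514048.856623] kernel BUG at ...".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def marker_times(log_text, marker):
    """Return the timestamps (in seconds) of all log lines containing marker."""
    times = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m and marker in m.group(2):
            times.append(float(m.group(1)))
    return times


def gap_seconds(log_text, earlier_marker, later_marker):
    """Seconds from the last earlier_marker line to the first later_marker line.

    Returns None if either marker does not occur in the log.
    """
    earlier = marker_times(log_text, earlier_marker)
    later = marker_times(log_text, later_marker)
    if not earlier or not later:
        return None
    return later[0] - earlier[-1]
```

Calling gap_seconds(log, "kernel BUG at", "lxc-instance") on saved dmesg
output gives the delay between the last BUG and the kill of the
container's main process.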

tamas

