[lxc-users] Kernel lockups when running lxc-start

Serge Hallyn serge.hallyn at ubuntu.com
Wed Mar 12 15:08:12 UTC 2014


Quoting Dao Quang Minh (dqminh89 at gmail.com):
> We built the kernel based on the Ubuntu tree with some patches backported
> from 3.14 (https://github.com/nitrous-io/linux/commits/stable-trusty).
> I can try to run another stress test to see if we can replicate the bug
> with the debug kernel.
> 
> Hmm, moving to overlayfs wasn't actually considered (we actually moved from
> overlayfs to aufs because we had some weird bugs with overlayfs, but I
> forgot what they were). However, we can try to remove the bind mounts and

Rsync failures due to the inode-copy-up design flaw?  That would be my guess.

> see if that helps.
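
For the bind-mount test: if those mounts come from the container config,
they are just lxc.mount.entry lines, so commenting them out and restarting
the container is a quick way to rule them in or out.  A minimal sketch
(the paths below are made up):

  # hypothetical excerpt from the container's config -- example paths only
  #lxc.mount.entry = /srv/shared srv/shared none bind 0 0
  #lxc.mount.entry = /var/cache/app var/cache/app none bind 0 0
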
> 
> Daniel.
> 
> 
> On Wed, Mar 12, 2014 at 10:46 PM, Serge Hallyn <serge.hallyn at ubuntu.com> wrote:
> 
> > Quoting Dao Quang Minh (dqminh89 at gmail.com):
> > > Hi all,
> > >
> > > We encountered a bug today when one of our systems entered a soft lockup
> > while we were trying to start a container. Unfortunately, at that point we
> > had to power-cycle the machine because we couldn't access the system
> > anymore. Here is the kernel.log:
> > >
> > > [14164995.081770] BUG: soft lockup - CPU#3 stuck for 22s!
> > [lxc-start:20066]
> > > [14164995.081784] Modules linked in: overlayfs(F) veth(F) xt_CHECKSUM(F)
> > quota_v2(F) quota_tree(F) bridge(F) stp(F) llc(F) ipt_MASQUERADE(F)
> > xt_nat(F) xt_tcpudp(F) iptable_nat(F) nf_conntrack_ipv4(F)
> > nf_defrag_ipv4(F) nf_nat_ipv4(F) nf_nat(F) nf_conntrack(F) xt_LOG(F)
> > iptable_filter(F) iptable_mangle(F) ip_tables(F) x_tables(F) intel_rapl(F)
> > crct10dif_pclmul(F) crc32_pclmul(F) ghash_clmulni_intel(F) aesni_intel(F)
> > ablk_helper(F) cryptd(F) lrw(F) gf128mul(F) glue_helper(F) aes_x86_64(F)
> > microcode(F) isofs(F) xfs(F) libcrc32c(F) raid10(F) raid456(F) async_pq(F)
> > async_xor(F) xor(F) async_memcpy(F) async_raid6_recov(F) raid6_pq(F)
> > async_tx(F) raid1(F) raid0(F) multipath(F) linear(F)
> > > [14164995.081820] CPU: 3 PID: 20066 Comm: lxc-start Tainted: GF   B
> >    3.13.4 #1
> > > [14164995.081823] task: ffff880107da9810 ti: ffff8800f494e000 task.ti:
> > ffff8800f494e000
> > > [14164995.081825] RIP: e030:[<ffffffff811e266b>]  [<ffffffff811e266b>]
> > __lookup_mnt+0x5b/0x80
> > > [14164995.081835] RSP: e02b:ffff8800f494fcd8  EFLAGS: 00000296
> > > [14164995.081837] RAX: ffffffff81c6b7e0 RBX: 00000000011e7ab2 RCX:
> > ffff8810a36890b0
> > > [14164995.081838] RDX: 0000000000000997 RSI: ffff881005054f00 RDI:
> > ffff881017f2fba0
> > > [14164995.081840] RBP: ffff8800f494fce8 R08: 0035313638363436 R09:
> > ffff881005054f00
> > > [14164995.081841] R10: 0001010000000000 R11: ffffc90000000000 R12:
> > ffff8810a29a3000
> > > [14164995.081842] R13: ffff8800f494ff28 R14: ffff8800f494fdb8 R15:
> > 0000000000000000
> > > [14164995.081848] FS:  00007fabd0fec800(0000) GS:ffff88110e4c0000(0000)
> > knlGS:0000000000000000
> > > [14164995.081850] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > [14164995.081851] CR2: 0000000001dce000 CR3: 00000000f515f000 CR4:
> > 0000000000002660
> > > [14164995.081853] Stack:
> > > [14164995.081854]  ffff8800f494fd18 ffffffff81c6b7e0 ffff8800f494fd18
> > ffffffff811e2740
> > > [14164995.081857]  ffff8800f494fdb8 ffff8800f494ff28 ffff8810a29a3000
> > ffff8800f494ff28
> > > [14164995.081860]  ffff8800f494fd38 ffffffff811cd17e ffff8800f494fda8
> > ffff8810a29a3000
> > > [14164995.081862] Call Trace:
> > > [14164995.081868]  [<ffffffff811e2740>] lookup_mnt+0x30/0x70
> > > [14164995.081872]  [<ffffffff811cd17e>] follow_mount+0x5e/0x70
> > > [14164995.081875]  [<ffffffff811cffd2>] mountpoint_last+0xc2/0x1e0
> > > [14164995.081877]  [<ffffffff811d01c7>] path_mountpoint+0xd7/0x450
> > > [14164995.081883]  [<ffffffff817639e3>] ?
> > _raw_spin_unlock_irqrestore+0x23/0x50
> > > [14164995.081888]  [<ffffffff811a80a3>] ? kmem_cache_alloc+0x1d3/0x1f0
> > > [14164995.081891]  [<ffffffff811d225a>] ? getname_flags+0x5a/0x190
> > > [14164995.081893]  [<ffffffff811d225a>] ? getname_flags+0x5a/0x190
> > > [14164995.081896]  [<ffffffff811d0574>] filename_mountpoint+0x34/0xc0
> > > [14164995.081899]  [<ffffffff811d2f9a>] user_path_mountpoint_at+0x4a/0x70
> > > [14164995.081902]  [<ffffffff811e317f>] SyS_umount+0x7f/0x3b0
> > > [14164995.081907]  [<ffffffff8102253d>] ? syscall_trace_leave+0xdd/0x150
> > > [14164995.081912]  [<ffffffff8176c87f>] tracesys+0xe1/0xe6
> > > [14164995.081913] Code: 03 0d a2 56 b3 00 48 8b 01 48 89 45 f8 48 8b 55
> > f8 31 c0 48 39 ca 74 2b 48 89 d0 eb 13 0f 1f 00 48 8b 00 48 89 45 f8 48 8b
> > 45 f8 <48> 39 c8 74 18 48 8b 50 10 48 83 c2 20 48 39 d7 75 e3 48 39 70
> > >
> > > After this point it seems that every lxc-start fails, but the system
> > kept running until we power-cycled it.
> > >
> > > When I inspected some of the containers that were started during that
> > time, I saw that one of them has a leftover lxc_putold directory (which
> > should be removed once the container finishes starting up, right?).
> > However, I'm not sure if that is related to the lockup above.
> > >
> > > The host is a 12.04 EC2 server, running lxc 1.0.0 and kernel
> > 3.13.0-12.32.
> >
> > Hi,
> >
> > Where did you get your kernel?  Is there an updated version you can
> > fetch or build?
> >
> > You might want to grab or build the debug symbols and see if you
> > can track down what's actually happening in the kernel.  The stack
> > trace doesn't really make sense to me - I see where getname_flags
> > calls __getname() which is kmem_cache_alloc(), but I don't see
> > how that gets us to path_mountpoint().  An interrupt, maybe?
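
One more note on the trace: __lookup_mnt() is essentially a walk over one
bucket of the mount hash table, so a chain that has somehow been corrupted
into a cycle would spin there forever and trip the soft-lockup watchdog,
which is what the "stuck for 22s" report looks like.  With the debug
vmlinux loaded in gdb, something like "list *(__lookup_mnt+0x5b)" should
show the exact source line.  Very roughly, and with made-up names rather
than the real fs/namespace.c code:

  #include <stddef.h>

  /* Illustrative sketch only -- not the actual kernel code. */
  struct mnt_node {
          struct mnt_node *next;     /* hash-chain link */
          void *parent_mnt;          /* vfsmount this mount is attached to */
          void *mountpoint;          /* dentry it is mounted on */
  };

  /*
   * Search one hash bucket for a mount of 'dentry' on 'mnt'.  If the
   * chain is cyclic, this loop never exits, the CPU never reschedules,
   * and the soft-lockup watchdog fires.
   */
  static struct mnt_node *lookup_in_bucket(struct mnt_node *head,
                                           void *mnt, void *dentry)
  {
          struct mnt_node *p;

          for (p = head; p != NULL; p = p->next)
                  if (p->parent_mnt == mnt && p->mountpoint == dentry)
                          return p;
          return NULL;
  }
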
> >
> > Anyway, my *guess* is this is a bug in aufs, which unfortunately is
> > not upstream.  If you could try replacing aufs with overlayfs, and
> > see if that causes the same problem, that would be a helpful datapoint.
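
If the aufs layer is an LXC-managed snapshot, switching a test container
over may just be a matter of changing the lxc.rootfs line in its config;
overlayfs takes the same lower:upper layout.  A hypothetical example (the
paths are made up):

  # before (aufs snapshot):
  #lxc.rootfs = aufs:/var/lib/lxc/base/rootfs:/var/lib/lxc/test1/delta0
  # after (overlayfs, same lower and upper directories):
  lxc.rootfs = overlayfs:/var/lib/lxc/base/rootfs:/var/lib/lxc/test1/delta0
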
> >
> > -serge