[Lxc-users] kernel bug?

Thu Mar 14 03:31:34 UTC 2013

Quoting Gary Ballantyne (gary.ballantyne at haulashore.com):
> Hi All
> 
> I have an intermittent, but crippling, problem on a raring EC2 instance 
> (also on quantal). It's a (raring) lvm-backed container --- I use cgroups 
> directly (via /sys/fs) and iptables in the instance (not sure if that's 
> relevant at all).
> 
> Occasionally, when stopping or starting the container (there is just 
> one), the instance becomes unreachable. Rebooting doesn't help, but 
> starting/stopping the instance, typically at least twice, fixes things 
> (the instance is reachable, and the container auto-starts).
> 
> There doesn't appear to be anything sinister in /var/log/dmesg (upon 
> restart), but the AWS system log is pasted below. I *think* the first 
> part corresponds to before the crash, and the interesting bit is:
> 
> [3587596.471053] ------------[ cut here ]------------
> [3587596.471071] Kernel BUG at ffffffff816c7c2c [verbose debug info
> ...
> [3587596.472282] ---[ end trace dc5c4320e1320f1d ]---
> [3587596.472292] Fixing recursive fault but reboot is needed!

Looks to me like the problem is a conflict between memory cgroup and
xen:

[3587596.472052]  [<ffffffff8100508c>] ? xen_mc_extend_args+0xfc/0x120
[3587596.472061]  [<ffffffff816c827b>] do_page_fault+0x2b/0x50
[3587596.472068]  [<ffffffff816c4818>] page_fault+0x28/0x30
[3587596.472076]  [<ffffffff81187c24>] ?  mem_cgroup_charge_statistics.isra.20+0x14/0x50
[3587596.472085]  [<ffffffff81189cd0>] __mem_cgroup_uncharge_common+0xd0/0x2d0
[3587596.472093]  [<ffffffff8118d21a>] mem_cgroup_uncharge_page+0x2a/0x30

If you have your System.map file, or better yet, if you run objdump -d
on your uncompressed vmlinux, you should be able to figure out more
about what is going on at those addresses.
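As a rough illustration of the System.map lookup, here is a minimal Python sketch. It assumes the usual "address type name" line layout of System.map. The sample entries are not taken from a real map file: each start address is derived from the trace above by subtracting the printed offset from the printed address, and the "T" type letter is a placeholder. The helper names load_system_map and resolve are made up for this example.

```python
import bisect

# Sample entries only: start addresses are derived from the trace
# (address minus offset); the "T" type letter is an assumption.
SAMPLE_MAP = """\
ffffffff81004f90 T xen_mc_extend_args
ffffffff81187c10 T mem_cgroup_charge_statistics.isra.20
ffffffff81189c00 T __mem_cgroup_uncharge_common
ffffffff8118d1f0 T mem_cgroup_uncharge_page
ffffffff816c47f0 T page_fault
ffffffff816c8250 T do_page_fault
"""


def load_system_map(lines):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return 'name+0xoffset' for the nearest symbol at or below address."""
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None  # address lies below the first known symbol
    start, name = symbols[index]
    return "%s+0x%x" % (name, address - start)


symbols = load_system_map(SAMPLE_MAP.splitlines())
print(resolve(symbols, 0xffffffff81189cd0))
```

With the real file, you would pass open("System.map") to load_system_map and resolve the address from the "Kernel BUG at" line (ffffffff816c7c2c). The six sample entries are too sparse to give a meaningful answer for that address. Once you know the enclosing function, GNU objdump's --start-address and --stop-address options can limit the objdump -d output to that range.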

You don't say what distro/kernel version you have, but you might also
search the web for these function names, or check the kernel git logs
for recent changes/fixes.

-serge