[Lxc-users] Host crash after oom killer in a container...

Gordon Henderson gordon at drogon.net
Tue Dec 6 18:08:16 UTC 2011


I guess no-one has ever experienced this, or if they have, they can't work 
out a solution either. However, it's just happened again to one of my 
servers: hundreds of processes stuck in a D state, load average up in the 
400 range, and the oom killer active doing its thing, but then lots and 
lots of processes get stuck in a D state, which surely isn't right... and 
it's also not the behaviour of a native system with no containers.
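
For anyone wanting to count the D-state processes on an affected host, 
here is a minimal sketch that reads the state field from /proc/<pid>/stat 
directly. The function names and the proc_root parameter are illustrative 
assumptions, not part of any existing tool. It reads only the stat file, 
and whether that avoids the stall seen with "ps ax" is untested.

```python
import os


def parse_stat(text):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so locate it via the first "(" and the last ")".
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def processes_in_state(wanted="D", proc_root="/proc"):
    """Return a sorted list of (pid, comm) for processes in the given state."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or the file was unreadable
        if state == wanted:
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    # "D" is uninterruptible sleep, the state described above.
    for pid, comm in processes_in_state("D"):
        print(pid, comm)
```

Run as root on the host, it prints one line per process in uninterruptible 
sleep.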

Bother.

I have removed the memsw.limit_in_bytes parameter and will let it swap the 
next time this happens, but I'd really like to know if there are fixes in 
kernels later than what I'm using.
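
For reference, the arithmetic behind the two limits quoted below can be 
sketched as follows. Since memsw.limit_in_bytes caps RAM plus swap, the 
swap headroom once RAM is at its cap is roughly the difference between the 
two values. The helper names are illustrative assumptions, and the 2048M 
figure is a hypothetical value for comparison, not a setting from my 
servers.

```python
# Binary multipliers for the K/M/G suffixes used in the limit values.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value):
    """Convert a size such as '1024M' (or a plain byte count) to bytes."""
    value = value.strip()
    suffix = value[-1:].upper()
    if suffix in UNITS:
        return int(value[:-1]) * UNITS[suffix]
    return int(value)


def swap_headroom(limit, memsw_limit):
    """Bytes of swap left over once RAM usage has reached its limit."""
    return parse_size(memsw_limit) - parse_size(limit)


if __name__ == "__main__":
    # Both limits equal, as in the quoted config: no swap headroom.
    print(swap_headroom("1024M", "1024M"))  # 0
    # Hypothetical larger memsw limit: 1024M of swap headroom.
    print(swap_headroom("1024M", "2048M"))  # 1073741824
```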

Anyone?

Gordon


On Thu, 24 Nov 2011, Gordon Henderson wrote:

>
> I've noticed a few oddities recently which have resulted in me needing to
> reboot (and in one case power cycle) a server, which isn't good...
>
> I've recently started to set the memory limits - e.g.
>
> lxc.cgroup.memory.limit_in_bytes       = 1024M
> lxc.cgroup.memory.memsw.limit_in_bytes = 1024M
>
> That, as I understand it, will limit a container to 1024M of RAM and 1024M
> of RAM+SWAP - i.e. it should prevent using swap at all...
>
> The issues come when a container exceeds that and the oom killer comes in
> - which seems to do what it's supposed to do, but after that some real
> oddities start to happen. The process table expands and lots and lots of
> processes sit in a D state waiting on something. Load average gets to over
> 300. What's worse is that some operations on the host server seem to
> stall too - e.g. "ps ax" - it listed a few hundred processes then stalled!
>
> LXC is 0.7.2 (Debian Stable/Squeeze)
> Kernel is 2.6.35.13
>
> I've noticed it in 2 different servers now - one a dual-core Intel, the
> other a quad core AMD - both running Debian Squeeze and the same 2.6.35.13
> kernel custom compiled for the underlying hardware.
>
> I have the kernel log files available if anyone wants them, but I'm really
> interested to know if I'm missing anything obvious - wrong parameters, or
> just expecting too much - known issues here - should I use a different
> kernel and so on...
>
> Thanks,
>
> Gordon
>
>



