[lxc-devel] OOM

China china at email.it
Wed Aug 29 14:28:55 UTC 2012


Hi,
I have a CentOS 6 server running a custom-compiled 3.3.6 kernel (config
taken from the CentOS one) that hosts about 40 LXC containers.

I compiled the custom kernel while trying (unsuccessfully) to resolve an OOM issue. With:
lxc.cgroup.memory.limit_in_bytes = 500M
lxc.cgroup.memory.memsw.limit_in_bytes = 500M
lxc.cgroup.memory.oom_control = 0
when memory usage rises above the limit, the OOM killer sometimes (often)
kills processes outside the container that triggered the limit.

To work around that issue, I have configured the containers as follows:
lxc.utsname = test_oom
lxc.tty = 1
lxc.pts = 1024
lxc.rootfs = /lxc/containers/test_oom
lxc.mount  = /conf/lxc/test_oom/fstab
#networking
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = br0
lxc.network.name = eth0
lxc.network.mtu = 1500
lxc.network.ipv4 = X.X.X.X/27
lxc.network.hwaddr = xx:xx:xx:xx:xx:xx
lxc.network.veth.pair = veth-xxx
#cgroups
lxc.cgroup.devices.deny = a
# /dev/null and zero
lxc.cgroup.devices.allow = c 1:3 rwm
lxc.cgroup.devices.allow = c 1:5 rwm
# consoles
lxc.cgroup.devices.allow = c 5:1 rwm
# tty
lxc.cgroup.devices.allow = c 5:0 rwm
lxc.cgroup.devices.allow = c 4:0 rwm
lxc.cgroup.devices.allow = c 4:1 rwm
# /dev/{,u}random
lxc.cgroup.devices.allow = c 1:9 rwm
lxc.cgroup.devices.allow = c 1:8 rwm
lxc.cgroup.devices.allow = c 136:* rwm
lxc.cgroup.devices.allow = c 5:2 rwm
# rtc
lxc.cgroup.devices.allow = c 254:0 rwm
# cpu
lxc.cgroup.cpuset.cpus = 3
#mem
lxc.cgroup.memory.limit_in_bytes = 500M
lxc.cgroup.memory.memsw.limit_in_bytes = 500M
lxc.cgroup.memory.oom_control = 1
#capabilities
lxc.cap.drop = sys_module mac_override mac_admin
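
To confirm that the oom_control setting actually lands in the cgroup, something like the following can be used. This is only a minimal sketch: the /cgroup/memory/lxc/test_oom path is an assumption about the mount point and group name, not taken from my setup, so adjust it to the real layout. It just dumps memory.oom_control, where "oom_kill_disable 1" should appear.

#include <stdio.h>

int main(void)
{
    /* assumed cgroup path; replace with the real mount point / group name */
    const char *path = "/cgroup/memory/lxc/test_oom/memory.oom_control";
    char line[128];
    FILE *f = fopen(path, "r");

    if (!f) {
        perror(path);
        return 1;
    }
    /* expected fields: "oom_kill_disable <0|1>" and "under_oom <0|1>" */
    while (fgets(line, sizeof(line), f))
        fputs(line, stdout);
    fclose(f);
    return 0;
}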

On top of that configuration, I have written my own OOM killer using eventfd,
cgroup.event_control, cgroup.procs, memory.oom_control, and so on.
That program works great and kills the right processes. I have also set
this sysctl parameter (
http://www.linuxinsight.com/proc_sys_vm_vfs_cache_pressure.html) to its
maximum possible value to work around a SLAB cache problem.
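
For reference, the notification side of a tool like that is wired up roughly as below. This is a minimal sketch rather than my actual program: the /cgroup/memory/lxc/test_oom path is an assumption, and the victim-selection logic that walks cgroup.procs is omitted. It registers an eventfd against memory.oom_control via cgroup.event_control and blocks until the kernel signals an OOM event in the group.

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdint.h>
#include <sys/eventfd.h>

int main(void)
{
    const char *cg = "/cgroup/memory/lxc/test_oom";   /* assumed path */
    char buf[256];
    int efd, oom_fd, ctl_fd;

    /* eventfd the kernel will signal on each OOM event in the group */
    efd = eventfd(0, 0);
    if (efd < 0) { perror("eventfd"); return 1; }

    snprintf(buf, sizeof(buf), "%s/memory.oom_control", cg);
    oom_fd = open(buf, O_RDONLY);
    if (oom_fd < 0) { perror("open memory.oom_control"); return 1; }

    snprintf(buf, sizeof(buf), "%s/cgroup.event_control", cg);
    ctl_fd = open(buf, O_WRONLY);
    if (ctl_fd < 0) { perror("open cgroup.event_control"); return 1; }

    /* register the pair: "<eventfd> <fd of memory.oom_control>" */
    snprintf(buf, sizeof(buf), "%d %d", efd, oom_fd);
    if (write(ctl_fd, buf, strlen(buf)) < 0) {
        perror("write cgroup.event_control");
        return 1;
    }

    for (;;) {
        uint64_t cnt;
        /* blocks until an OOM event fires in the cgroup */
        if (read(efd, &cnt, sizeof(cnt)) != sizeof(cnt))
            break;
        /* a real tool would now read cgroup.procs, pick a victim
         * inside the group and kill it */
        printf("OOM event in %s (count %llu)\n", cg, (unsigned long long)cnt);
    }
    return 0;
}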

But the OOM issue continues. For example, a few minutes ago the OOM killer
ran and wrote this to syslog:

kernel: Out of memory: Kill process 19981 (httpd) score 12 or
sacrifice child
kernel: Killed process 20859 (httpd) total-vm:1022216kB,
anon-rss:416736kB, file-rss:124kB
kernel: httpd invoked oom-killer: gfp_mask=0x0, order=0,
oom_adj=0, oom_score_adj=0
kernel: httpd cpuset=<container> mems_allowed=0
kernel: Pid: 19987, comm: httpd Not tainted 3.3.6 #4
kernel: Call Trace:
kernel: [<ffffffff8110f07b>] dump_header+0x8b/0x1e0
kernel: [<ffffffff8110eb4f>] ? find_lock_task_mm+0x2f/0x80
kernel: [<ffffffff811f93c5>] ? security_capable_noaudit+0x15/0x20
kernel: [<ffffffff8110f8b5>] oom_kill_process+0x85/0x170
kernel: [<ffffffff8110fa9f>] out_of_memory+0xff/0x210
kernel: [<ffffffff8110fc75>] pagefault_out_of_memory+0xc5/0x110
kernel: [<ffffffff81041cfc>] mm_fault_error+0xbc/0x1b0
kernel: [<ffffffff81506873>] do_page_fault+0x3c3/0x460
kernel: [<ffffffff81171253>] ? sys_newfstat+0x33/0x40
kernel: [<ffffffff81503075>] page_fault+0x25/0x30
...

It went on to kill a process outside the container that invoked it. But I have
disabled the OOM killer completely for container processes!
I think this is a bug in the kernel code.

Each container has its own filesystem on an LV, like this:
/dev/mapper/lxcbox--01.mmfg.it--vg-test_oom on /lxc/containers/test_oom type ext3 (rw)
and each one is a development server running init, rsyslog, mingetty, apache, ssh
and crontab.

Can someone help me understand where the problem with the OOM killer is?

Thank you!

-- 

Davide Belloni