[lxc-users] LXD (lxc 2.0.6 + lxd 2.0.8) - OOM problem

Tue Jan 17 15:06:22 UTC 2017

Follow-up: this seems to be a bug in the kernel (4.4.0-59).  Heads-up to everyone:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1655842

On Jan 17, 2017, at 7:15 AM, Ron Kelley <rkelleyrtp at gmail.com> wrote:

Greetings all,

We are running Ubuntu 16.04 with 5GB RAM, 20GB swap, and LXD (LXC 2.0.6 and LXD 2.0.8).  We recently ran a system update on our LXD servers and started getting a large number of OOM messages from the containers.  Something like this:

----------------------------------------------------------------------
Jan 17 06:20:54 LXD_Server_01 kernel: [259185.075154] mysqld invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jan 17 06:20:54 LXD_Server_01 kernel: [259185.075158] mysqld cpuset=DB-Server3 mems_allowed=0
Jan 17 06:20:54 LXD_Server_01 kernel: [259185.075166] CPU: 0 PID: 27649 Comm: mysqld Not tainted 4.4.0-59-generic #80-Ubuntu
Jan 17 06:20:54 LXD_Server_01 kernel: [259185.075167] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/14/2014
----------------------------------------------------------------------
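
For anyone comparing their own logs, here is a minimal sketch that pulls the allocation order out of the first log line. The regex is a hypothetical pattern written for this one line format (other kernel versions print extra fields), and the 4096-byte page size is an assumption that is typical for x86_64. An order-N request asks for 2^N physically contiguous pages.

```python
import re

# First line of the log excerpt quoted in this message.
LINE = ("Jan 17 06:20:54 LXD_Server_01 kernel: [259185.075154] mysqld invoked "
        "oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0")

# Hypothetical pattern for this specific line format only.
OOM_RE = re.compile(
    r"(?P<comm>\S+) invoked oom-killer: "
    r"gfp_mask=(?P<gfp>0x[0-9a-fA-F]+), order=(?P<order>\d+), "
    r"oom_score_adj=(?P<adj>-?\d+)"
)


def parse_oom_header(line, page_size=4096):
    """Return the fields of an 'invoked oom-killer' line, or None if no match.

    page_size is an assumption (4 KiB pages); adjust for other platforms.
    """
    m = OOM_RE.search(line)
    if m is None:
        return None
    order = int(m.group("order"))
    pages = 1 << order  # an order-N allocation needs 2**N contiguous pages
    return {
        "comm": m.group("comm"),
        "gfp_mask": int(m.group("gfp"), 16),
        "order": order,
        "oom_score_adj": int(m.group("adj")),
        "contiguous_pages": pages,
        "contiguous_bytes": pages * page_size,
    }


info = parse_oom_header(LINE)
print(info)
```

Under those assumptions, order=2 here is a request for 4 contiguous pages (16 KiB), which is a small allocation compared with the limits in the profile.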

The container (www-somesitename-com) is using a custom profile like this:
----------------------------------------------------------------------
name: Dual_Network_MySQL
config:
 limits.cpu: "2"
 limits.memory: 512MB
 limits.memory.swap: "true"
 raw.lxc: lxc.cgroup.memory.memsw.limit_in_bytes = 1300M
description: ""
devices:
 eth0:
   name: eth0
   nictype: macvlan
   parent: eth1.2005
   type: nic
 eth1:
   name: eth1
   nictype: macvlan
   parent: eth1.2006
   type: nic
----------------------------------------------------------------------

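The above profile should cap the container at 512MB of RAM and 1300M of RAM+swap combined (memory.memsw.limit_in_bytes counts RAM and swap together, so the total is 1.3GB rather than 512MB + 1.3GB = 1.8GB).  If I look at the container stats, I don’t see where either limit was exceeded: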
----------------------------------------------------------------------
Name: DB-Server3
Remote: unix:/var/lib/lxd/unix.socket
Architecture: x86_64
Created: 2016/10/17 06:47 UTC
Status: Running
Type: persistent
Profiles: Dual_Network_MySQL
Pid: 2215
Ips:
 eth0:	inet	1.2.3.4
 eth0:	inet6	XXXXX
 eth1:	inet	1.2.3.4
 eth1:	inet6	YYYY
 lo:	inet	127.0.0.1
 lo:	inet6	::1
Resources:
 Processes: 19
 Memory usage:
   Memory (current): 112.85MB
   Memory (peak): 271.26MB
   Swap (current): 12.23MB
   Swap (peak): 5.39MB
 Network usage:
   eth0:
     Bytes received: 4.17GB
     Bytes sent: 69.48GB
     Packets received: 25587831
     Packets sent: 31668639
   eth1:
     Bytes received: 1.53GB
     Bytes sent: 36.13GB
     Packets received: 9743914
     Packets sent: 14022159
   lo:
     Bytes received: 0 bytes
     Bytes sent: 0 bytes
     Packets received: 0
     Packets sent: 0
----------------------------------------------------------------------
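
To make the limits concrete, here is a small arithmetic sketch using the values from the profile and the stats above. In cgroup v1, memory.memsw.limit_in_bytes is the limit on memory plus swap together, so the swap allowance is the difference between the two limits. The to_bytes helper is hypothetical, and treating both "MB" and "M" as 1024*1024 bytes is an assumption about how these versions parse units.

```python
MIB = 1024 * 1024


def to_bytes(value):
    """Convert strings such as '512MB' or '1300M' to bytes.

    Assumption: both suffixes mean 1024*1024 bytes.
    """
    value = value.strip()
    for suffix in ("MB", "M"):
        if value.endswith(suffix):
            return int(float(value[: -len(suffix)]) * MIB)
    raise ValueError("unsupported size string: %r" % value)


# Limits from the Dual_Network_MySQL profile.
ram_limit = to_bytes("512MB")      # limits.memory
memsw_limit = to_bytes("1300M")    # lxc.cgroup.memory.memsw.limit_in_bytes

# memsw is RAM + swap combined, so swap headroom is the difference.
swap_headroom = memsw_limit - ram_limit

# Usage figures from the container stats.
current_total = to_bytes("112.85MB") + to_bytes("12.23MB")
peak_ram = to_bytes("271.26MB")

print("RAM limit:        %d MiB" % (ram_limit // MIB))
print("RAM+swap limit:   %d MiB" % (memsw_limit // MIB))
print("Swap headroom:    %d MiB" % (swap_headroom // MIB))
print("Current RAM+swap: %.2f MiB" % (current_total / MIB))
print("Peak RAM under RAM limit:       %s" % (peak_ram < ram_limit))
print("Current total under memsw limit: %s" % (current_total < memsw_limit))
```

Either way the numbers are read, the reported usage is well below both limits, which is consistent with the limits themselves not being what triggered the OOM killer.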

This happens on a variety of LXD servers (we have 5 running right now) and a variety of containers.  Running “free -m” on the container server shows plenty of RAM and swap available.  The only common factor is the OS running in the container (Ubuntu 16.04); our CentOS 7 containers don’t seem to have this issue.
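
One extra check that may be useful here: "free -m" reports totals, but the log shows an order=2 request, which needs 4 contiguous pages. /proc/buddyinfo lists, per zone, how many free blocks exist at each order (the first count column is order 0). The sketch below sums the free blocks at or above a given order. The function name and the SAMPLE text are made up for illustration and are not output from these servers.

```python
def free_blocks_at_or_above(buddyinfo_text, min_order):
    """Sum free blocks of order >= min_order for each zone in buddyinfo text."""
    totals = {}
    for line in buddyinfo_text.splitlines():
        fields = line.split()
        # Expected shape: Node <n>, zone <name> <count order 0> <count order 1> ...
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        zone = "node%s/%s" % (fields[1].rstrip(","), fields[3])
        counts = [int(c) for c in fields[4:]]
        totals[zone] = sum(counts[min_order:])
    return totals


# Made-up sample in the /proc/buddyinfo layout (not real data from this thread).
SAMPLE = """\
Node 0, zone      DMA      1      1      0      0      2      1      1      0      1      1      3
Node 0, zone    DMA32   4210   1803      0      0      0      0      0      0      0      0      0
Node 0, zone   Normal    912    340      0      0      0      0      0      0      0      0      0
"""

result = free_blocks_at_or_above(SAMPLE, min_order=2)
print(result)
```

On a real host the input would come from reading /proc/buddyinfo. Zones with plenty of order-0 and order-1 blocks but none at order 2 or above would show lots of free memory in "free -m" while still being unable to satisfy an order=2 request without compaction or reclaim.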

Any clues/pointers?

Thanks.
