[lxc-users] LXD 2.14 - Ubuntu 16.04 - kernel 4.4.0-57-generic - SWAP continuing to grow

Tomasz Chmielewski mangoo at wpkg.org
Fri Jul 28 14:23:02 UTC 2017


Most likely your database cache is simply set too large.

I've been experiencing similar issues with MySQL (please read it in detail):

https://stackoverflow.com/questions/43259136/mysqld-out-of-memory-with-plenty-of-memory/43259820

It finally went away after I kept lowering the MySQL cache by a few GBs after each OOM, until the OOMs stopped happening.
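
If it's InnoDB doing the caching, the main knob is innodb_buffer_pool_size in my.cnf. A minimal sketch - the value below is purely illustrative; the point is to step down a few GB at a time and leave headroom for per-connection buffers and the OS:

-------------------------
[mysqld]
# illustrative only - reduce from whatever you currently run,
# then watch whether the OOMs stop before lowering it again
innodb_buffer_pool_size = 60G
-------------------------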

Tomasz Chmielewski
https://lxadm.com

On Saturday, July 15, 2017 18:36 JST, Saint Michael <venefax at gmail.com> wrote: 
 
> I have a lot of memory management issues using pure LXC. In my case, my box
> has only one container. I use LXC to be able to move my app around, not to
> squeeze performance out of hardware. What happens is my database gets
> killed by the OOM killer, although there are gigabytes of RAM used for cache.
> The memory manager kills applications instead of reclaiming memory from
> disk cache. How can this be avoided?
> 
> My config at the host is:
> 
> vm.hugepages_treat_as_movable=0
> vm.hugetlb_shm_group=27
> vm.nr_hugepages=2500
> vm.nr_hugepages_mempolicy=2500
> vm.nr_overcommit_hugepages=0
> vm.overcommit_memory=0
> vm.swappiness=0
> vm.vfs_cache_pressure=150
> vm.dirty_ratio=10
> vm.dirty_background_ratio=5
> 
> This shows the issue:
> [9449866.130270] Node 0 hugepages_total=1250 hugepages_free=1250 hugepages_surp=0 hugepages_size=2048kB
> [9449866.130271] Node 1 hugepages_total=1250 hugepages_free=1248 hugepages_surp=0 hugepages_size=2048kB
> [9449866.130271] 46181 total pagecache pages
> [9449866.130273] 33203 pages in swap cache
> [9449866.130274] Swap cache stats: add 248571542, delete 248538339, find 69031185/100062903
> [9449866.130274] Free swap  = 0kB
> [9449866.130275] Total swap = 8305660kB
> [9449866.130276] 20971279 pages RAM
> [9449866.130276] 0 pages HighMem/MovableOnly
> [9449866.130276] 348570 pages reserved
> [9449866.130277] 0 pages cma reserved
> [9449866.130277] 0 pages hwpoisoned
> [9449866.130278] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
> [9449866.130286] [  618]     0   618    87181      135     168       3        3             0 systemd-journal
> [9449866.130288] [  825]     0   825    11343      130      25       3        0             0 systemd-logind
> [9449866.130289] [  830]     0   830     1642       31       8       3        0             0 mcelog
> [9449866.130290] [  832]   996   832    26859       51      23       3       47             0 chronyd
> [9449866.130292] [  834]     0   834     4905      100      12       3        0             0 irqbalance
> [9449866.130293] [  835]     0   835     6289      177      15       3        0             0 smartd
> [9449866.130295] [  837]    81   837    28499      258      28       3      149          -900 dbus-daemon
> [9449866.130296] [  857]     0   857     1104       16       7       3        0             0 rngd
> [9449866.130298] [  859]     0   859   192463    37114     224       4    40630             0 NetworkManager
> [9449866.130300] [  916]     0   916    25113      229      50       3        0         -1000 sshd
> [9449866.130302] [  924]     0   924     6490       50      17       3        0             0 atd
> [9449866.130303] [  929]     0   929    35327      199      20       3      284             0 agetty
> [9449866.130305] [  955]     0   955    22199     3185      43       3      312             0 dhclient
> [9449866.130307] [ 1167]     0  1167     6125       88      17       3        2             0 lxc-autostart
> [9449866.130309] [ 1176]     0  1176    10818      275      24       3       38             0 systemd
> [9449866.130310] [ 1188]     0  1188    13303     1980      29       3       36             0 systemd-journal
> [9449866.130312] [ 1372]    99  1372     3881        2      12       3       45             0 dnsmasq
> [9449866.130313] [ 1375]    81  1375     6108       77      17       3       39          -900 dbus-daemon
> [9449866.130315] [ 1394]     0  1394     6175       46      15       3      168             0 systemd-logind
> [9449866.130316] [ 1395]     0  1395    78542     1142      69       3        4             0 rsyslogd
> [9449866.130317] [ 1397]     0  1397     1614       32       8       3        0             0 agetty
> [9449866.130319] [ 1398]     0  1398     1614       31       8       3        0             0 agetty
> [9449866.130320] [ 1400]     0  1400     1614       31       8       3        0             0 agetty
> [9449866.130321] [ 1401]     0  1401     1614        2       8       3       30             0 agetty
> [9449866.130322] [ 1402]     0  1402     1614        2       8       3       29             0 agetty
> [9449866.130324] [ 1403]     0  1403     1614       31       8       3        0             0 agetty
> [9449866.130325] [ 1404]     0  1404     1614       32       8       3        0             0 agetty
> [9449866.130327] [ 1405]     0  1405     1614       32       8       3        0             0 agetty
> [9449866.130328] [ 1406]     0  1406     1614        2       8       3       29             0 agetty
> [9449866.130329] [ 1408]     0  1408     1614        2       8       3       30             0 agetty
> [9449866.130330] [ 1409]     0  1409     1614       30       7       3        0             0 agetty
> [9449866.130332] [18224]     0 18224    26456        0      43       4      404             0 VGAuthService
> [9449866.130333] [18225]     0 18225    61032       95      58       3      258             0 vmtoolsd
> [9449866.130335] [28660]     0 28660    26372       44      54       4      202         -1000 sshd
> [9449866.130337] [18992]   998 18992   132859      876      54       3       13             0 polkitd
> [9449866.130339] [23849]     0 23849    10744      370      23       3        0         -1000 systemd-udevd
> [9449866.130340] [ 3484]     0  3484   184082      265     243       4      129             0 rsyslogd
> [9449866.130342] [31175]    32 31175    14328       35      30       3      102             0 rpcbind
> [9449866.130344] [31205]     0 31205   111747        0      65       3      343             0 abrtd
> [9449866.130345] [31248]     0 31248   187819       30     167       4      291             0 abrt-dump-journ
> [9449866.130347] [31303]     0 31303   172125       32     120       4      258             0 abrt-dump-journ
> [9449866.130348] [16252]     0 16252     5659      129      17       3       24             0 crond
> [9449866.130350] [11626]     0 11626    33235       25      15       3      130             0 crond
> [9449866.130351] [11717]     0 11717    13897      109      26       3        3         -1000 auditd
> [9449866.130353] [31764]     0 31764    51162       73      36       3       50             0 gssproxy
> [9449866.130355] [31372]     0 31372    11441      557      21       3        0         -1000 systemd-udevd
> [9449866.130357] [12835]    27 12835 23997106 18960777   43207     119  2018937             0 mysqld
> [9449866.130359] [ 8109]     0  8109    17597      220      39       3        3             0 crond
> [9449866.130361] [ 8185]     0  8185    28282       57      10       3        0             0 safe_asterisk1
> [9449866.130363] [ 8186]     0  8186   901036     8104     344       7        3             0 asterisk
> [9449866.130364] [ 8215]     0  8215    28282       57      10       4        0             0 safe_asterisk2
> [9449866.130366] [ 8216]     0  8216   917544     8133     345       6       19             0 asterisk
> [9449866.130367] [ 8265]     0  8265    28282       57      10       3        0             0 safe_asterisk3
> [9449866.130369] [ 8266]     0  8266   934048     8203     347       7        1             0 asterisk
> [9449866.130370] [ 8351]     0  8351    28282       57      10       3        0             0 safe_asterisk4
> [9449866.130372] [ 8353]     0  8353   950557     8235     349       6       15             0 asterisk
> [9449866.130373] [ 8400]     0  8400    28282       57      10       3        0             0 safe_asterisk5
> [9449866.130375] [ 8401]     0  8401   901033     8122     345       7        1             0 asterisk
> [9449866.130376] [ 8460]     0  8460    28282       57       9       3        0             0 safe_asterisk6
> [9449866.130377] [ 8461]     0  8461  1148653     8136     361       8       47             0 asterisk
> [9449866.130379] [ 8537]     0  8537    28282       57       9       3        0             0 safe_asterisk7
> [9449866.130403] [ 8538]     0  8538   983574     8148     349       6        2             0 asterisk
> [9449866.130405] [ 8594]     0  8594    28282       58      10       3        0             0 safe_asterisk8
> [9449866.130406] [ 8596]     0  8596   950555     8131     350       7       61             0 asterisk
> [9449866.130408] [ 8649]     0  8649    28282       57       9       4        0             0 safe_asterisk9
> [9449866.130409] [ 8651]     0  8651   901033     8139     345       6        0             0 asterisk
> [9449866.130411] [ 8712]     0  8712    28282       58      10       3        0             0 safe_asterisk10
> [9449866.130413] [ 8714]     0  8714   901034     8114     342       6       14             0 asterisk
> [9449866.130415] [14800]     0 14800    31930      117      18       3        0             0 screen
> [9449866.130416] [14801]     0 14801    28284       55      12       3        0             0 audit.sh
> [9449866.130419] [17583]     0 17583    28284       55      10       3        0             0 audit.sh
> [9449866.130421] [17584]     0 17584    40172      279      35       3        0             0 mysql
> [9449866.130422] [17585]     0 17585    28373       22      12       3        0             0 awk
> [9449866.130424] Out of memory: Kill process 12835 (mysqld) score 926 or sacrifice child
> [9449866.130661] Killed process 12835 (mysqld) total-vm:95988424kB, anon-rss:75843108kB, file-rss:0kB, shmem-rss:0kB
> [9449872.448957] oom_reaper: reaped process 12835 (mysqld), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
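
Back-of-the-envelope from the log above: 20971279 pages of RAM is ~80 GB, while mysqld alone shows total_vm of 23997106 pages (~92 GB) with ~72 GB resident. Anonymous memory like a database buffer pool cannot be dropped the way page cache can - it has to be swapped, and with "Free swap = 0kB" there was nowhere left to put it. Note also that there were only 46181 pagecache pages (~180 MB), so there really were no gigabytes of reclaimable disk cache at that moment; and the 2500 x 2 MB hugepages (~5 GB) sit reserved but almost entirely unused (hugepages_free = 2498), which makes the squeeze worse.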
> 
> Philip
> 
> 
> 
> 
> On Sat, Jul 15, 2017 at 5:11 AM, Marat Khalili <mkh at rqc.ru> wrote:
> 
> > I'm using LXC, and I frequently observe some unused containers get swapped
> > out, even though the system has plenty of RAM and no RAM limits are set. The
> > only bad effect I observe is a couple of seconds' delay when you first log
> > into them after some time. I guess it is absolutely normal, since the kernel
> > tries to maximize the amount of memory available for disk caches.
> >
> > If you don't like this behavior, instead of trying to fine-tune kernel
> > parameters, why not disable swap altogether? Many people run it this way;
> > it's mostly a matter of taste these days. (But first check your software
> > for leaks.)
> >
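
For reference, a minimal sketch of what disabling swap looks like, assuming the swap entries live in /etc/fstab (the sed merely comments them out so the change survives a reboot):

swapoff -a                              # turn all swap off now
sed -i '/\sswap\s/ s/^/#/' /etc/fstab   # keep it off after reboot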
> > > For example, our “server-4” machine shows 8G total RAM, 500MB free, 2.5G
> > available, and 5G of buff/cache. Yet, swap is at 5.5GB and has been slowly
> > growing over the past few days. It seems something is preventing the apps
> > from using the RAM.
> >
> > Did you identify what processes all this virtual memory belongs to?
> >
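
One quick way to answer that: the kernel exports per-process swap usage as VmSwap in /proc/<pid>/status, so a rough one-liner is:

grep VmSwap /proc/[0-9]*/status | sort -t: -k3 -n | tail

The last lines are the processes holding the most swap, in kB.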
> > > To be honest, we have been battling lots of memory/swap issues using
> > LXD. We started with no tuning, but the app stack quickly ran out of
> > memory.
> >
> > LXC/LXD is hardly responsible for your app stack memory usage. Either you
> > underestimated it or there's a memory leak somewhere.
> >
> > > Given all the issues we have had with memory and swap using LXD, we are
> > seriously considering moving back to the traditional VM approach until
> > LXC/LXD is better “baked”.
> >
> > Did your VMs use less memory? I don't think so. Limits could be better
> > enforced, but VMs don't magically give you infinite RAM.
> > --
> >
> > With Best Regards,
> > Marat Khalili
> >
> > On July 14, 2017 9:58:57 PM GMT+03:00, Ron Kelley <rkelleyrtp at gmail.com>
> > wrote:
> >>
> >> Wondering if anyone else has similar issues.
> >>
> >> We have 5x LXD 2.12 servers running (U16.04 - kernel 4.4.0-57-generic - 8G RAM, 19G SWAP).  Each server is running about 50 LXD containers - WordPress with Nginx and PHP7.  The servers have been running for about 15 days now, and swap space continues to grow.  In addition, the kswapd0 process starts consuming CPU until we flush the system cache via the "/bin/echo 3 > /proc/sys/vm/drop_caches" command.
> >>
> >> Our LXD profile looks like this:
> >> -------------------------
> >> config:
> >>   limits.cpu: "2"
> >>   limits.memory: 512MB
> >>   limits.memory.swap: "true"
> >>   limits.memory.swap.priority: "1"
> >> -------------------------
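
Side note: if you want to keep those containers out of swap entirely, LXD has a boolean key for exactly that - e.g., assuming the profile above is named "default":

lxc profile set default limits.memory.swap false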
> >>
> >>
> >> We also have added these to /etc/sysctl.conf
> >> -------------------------
> >> vm.swappiness=10
> >> vm.vfs_cache_pressure=50
> >> -------------------------
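
These only take effect once loaded - after a reboot, or explicitly:

sysctl -p /etc/sysctl.conf

Also note vm.swappiness=10 still allows swapping under memory pressure; it only biases reclaim toward dropping page cache first.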
> >>
> >> A quick “top” output shows plenty of available memory and buff/cache.  But, for some reason, the system continues to swap out the apps.  For example, our “server-4” machine shows 8G total RAM, 500MB free, 2.5G available, and 5G of buff/cache.  Yet, swap is at 5.5GB and has been slowly growing over the past few days.  It seems something is preventing the apps from using the RAM.
> >>
> >>
> >> To be honest, we have been battling lots of memory/swap issues using LXD.  We started with no tuning, but the app stack quickly ran out of memory.  After editing the profile to allow 512MB RAM per container (and restarting the containers), the kswapd0 issue started happening.  Given all the issues we have had with memory and swap using LXD, we are seriously considering moving back to the traditional VM approach until LXC/LXD is better “baked”.
> >>
> >>
> >> -Ron
> > _______________________________________________
> > lxc-users mailing list
> > lxc-users at lists.linuxcontainers.org
> > http://lists.linuxcontainers.org/listinfo/lxc-users
> >
 
 
 
-- 
Tomasz Chmielewski

