[lxc-users] sysctl.conf and security/limits.conf tuning for running containers
Andrey Repin
anrdaemon at yandex.ru
Sat Sep 14 07:01:54 UTC 2019
Greetings, Adrian Pepper!
> I'll start this lengthy message with a table-of-contents of sorts.
Next time, please post a new message when you open a new thread on the list.
> === Only a limited number of containers could run usefully ===
> I had had problems on my workstation running more than about 10
> containers; subsequent ones would show as RUNNING, but have no IP
> address. lxc-attach suggested /sbin/init was actually hung, with
> no apparent way to recover them. I used to resort to shutting down
> less-needed containers to allow new ones to run usefully.
>
> Then one day, I decided to try and pursue the problem a little harder.
>
> === github lxc/lxd production-setup.md ===
> Eventually, mostly by checking my mbox archive of this list
> (lxc-users at lists.linuxcontainers.org), I stumbled on...
> https://github.com/lxc/lxd/blob/master/doc/production-setup.md
> It's not clear to me what the context of that document really is.
> Does it end up in the contents of lxd? (I still use lxc).
> But even referenced directly from the git repository, it still
> provides useful information.
> I summarized that production-setup.md for myself...
> /etc/security/limits.conf
> #<domain> <type> <item> <value>
> * soft nofile 1048576 # def:unset
> * hard nofile 1048576 # def:unset
> root soft nofile 1048576 # def:unset
> root hard nofile 1048576 # def:unset
> * soft memlock unlimited # def:unset
> * hard memlock unlimited # def:unset
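A note for anyone applying these: limits.conf is read by pam_limits at
login, so the new values only appear in fresh sessions. A quick check,
as a sketch:

    # In a new login session, verify the nofile limits took effect:
    ulimit -Sn    # soft limit; expect 1048576
    ulimit -Hn    # hard limit; expect 1048576
    # Max locked memory (expect "unlimited" from the memlock lines):
    ulimit -l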
> /etc/sysctl.conf (effective)
> fs.inotify.max_queued_events 1048576 # def:16384
> fs.inotify.max_user_instances 1048576 # def:128
> fs.inotify.max_user_watches 1048576 # def:8192
> vm.max_map_count 262144 # def:65530 max memory map areas per proc
> kernel.dmesg_restrict 1 # def:0
> net.ipv4.neigh.default.gc_thresh3 8192 # def:1024 arp table limit
> net.ipv6.neigh.default.gc_thresh3 8192 # def:1024 arp table limit
> kernel.keys.maxkeys 2000 # def:200 non-root key limit
> # should be > number of containers
> net.core.netdev_max_backlog 182757 # def:1000(!) doc just says "increase"
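A side note for anyone applying these: you don't have to edit
/etc/sysctl.conf itself. A minimal sketch, assuming a distribution that
reads /etc/sysctl.d/ (the file name 99-lxc-tuning.conf is only an example):

    # /etc/sysctl.d/99-lxc-tuning.conf -- example fragment using the
    # values quoted above
    fs.inotify.max_queued_events = 1048576
    fs.inotify.max_user_instances = 1048576
    fs.inotify.max_user_watches = 1048576

    # Load all sysctl.d/ files immediately, without a reboot:
    sudo sysctl --system

    # Or set a single value ad hoc (not persistent across reboots):
    sudo sysctl -w fs.inotify.max_user_watches=1048576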
> During this most recent investigation, I happened to suspect
> fs.inotify.max_user_watches, because a "tail" I ran indicated that it
> could not use inotify and needed to poll instead.
> (Hey, there I sound like a natural kernel geek, but actually I needed
> a few web searches to correlate the tail diagnostic to the setting.)
> production-setup.md also has suggestions about txqueuelen, but I will
> assume for now those apply only to systems wanting to generate or
> receive a lot of real network traffic.
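For completeness, a sketch of the txqueuelen side (eth0 is a placeholder
interface name, and 10000 is only an example value; check
production-setup.md for its actual suggestion):

    # Inspect the current transmit queue length:
    ip link show dev eth0 | grep -o 'qlen [0-9]*'
    # Raise it (takes effect immediately; not persistent across reboots):
    sudo ip link set dev eth0 txqueuelen 10000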
> === Recommended values seem arbitrary, perhaps excessive in some cases ===
> In the suggestions above:
> 1048576 is 1024*1024 and seems very arbitrary.
> Hopefully, this mostly increases the size of edge-pointer tables and
> so doesn't consume much memory unless the resources actually get close
> to the maximum. I used smaller values, a little more in line with the
> proportions of the defaults (shown above).
> cscf-adm@scspc578-1804:~$ grep '^' /proc/sys/fs/inotify/*
> /proc/sys/fs/inotify/max_queued_events:262144
> /proc/sys/fs/inotify/max_user_instances:131072
> /proc/sys/fs/inotify/max_user_watches:262144
> cscf-adm@scspc578-1804:~$
> Searching for more info about netdev_max_backlog turned up
> https://community.mellanox.com/s/article/linux-sysctl-tuning
> which suggests raising net.core.netdev_max_backlog to 250000.
> So I went with that.
> I still haven't figured out the significance of 182757, the apparent
> product of two primes, 3 * 60919. Nor can I see any significance to
> any of its near-adjacent numbers.
> After applying changes similar to the above, I observed very good
> results. Whereas before I seemed to run into problems at around 12
> containers, I am currently running 17, and have run more.
It would be useful if you could discover/describe some direct ways to
investigate limits congestion. That would be much more helpful for tuning
container host systems for specific needs.
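As a starting point, here is a rough sketch for the inotify limits, which
are among the few where current consumption can be counted directly
(needs root to see other users' file descriptors; $PID is a placeholder):

    # Count inotify instances open system-wide, to compare against
    # fs.inotify.max_user_instances (which is enforced per user):
    sudo find /proc/[0-9]*/fd -lname 'anon_inode:inotify' 2>/dev/null | wc -l

    # Count the watches held by one process; each watch shows up as
    # an "inotify wd:..." line in the fdinfo files:
    sudo cat /proc/$PID/fdinfo/* 2>/dev/null | grep -c '^inotify'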
> === /sbin/init sometimes missing on apparently healthy containers? ===
> Also, I previously observed that the number of /sbin/init processes
> running was significantly fewer than the number of apparently properly
> functional containers. The good news is that today there are almost
> as many /sbin/init processes running as containers. The bad news is
> that N(/sbin/init) == N(containers)-1, whereas I would think it should
> equal N(containers)+1 (the extra one being the host's own /sbin/init).
> (That is, by sshing to each container in turn and looking for /init in
> the "ps" output, I confirmed that two containers had no /init running,
> but both seem to be generally working.)
Were they created from custom images?
What do they report as pid 1?
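A quick way to check both from the host, sketched with a placeholder
container name c1:

    # What the container itself sees as PID 1:
    lxc-attach -n c1 -- cat /proc/1/comm
    # The host-side PID of the container's init:
    lxc-info -n c1 -p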
> The total number of processes I run is, according to "ps", nearly
> always less than 1000. (Usually "ps -adelfww").
> I almost wonder if that was a transitory problem in Ubuntu 18.04 which
> gets fixed in the containers as the appropriate dist-upgrade gets done.
> === Using USB disk on container-heavy host used to exceed some queue limit ===
> One of these changes, probably either net.core.netdev_max_backlog or
> fs.inotify.max_queued_events, seems to have had the pleasant side effect
> of letting me write backups to a USB drive without flakiness in my user
> interface. It also got rid of the diagnostics which used to appear in
> that situation, about some queue limit being raised because of observed
> lost events.
More likely fs.inotify.max_queued_events.
> === My previous pty tweaking now raises a distinct question ===
> Another distinct problem caused me to raise
> /proc/sys/kernel/pty/max
> Given the apparent value /proc/sys/kernel/pty/reserve:1024,
> does one need to set kernel/pty/max to (N*1024 plus the total number
> of ptys you expect to allocate), where N is the number of containers
> you expect to run concurrently?
> /proc/sys/kernel/pty/nr
> never seems particularly high now.
> (/proc/sys/kernel/pty/max being another of the apparently few system
> parameters for which you can monitor the current usage).
Now, this is interesting.
I was routinely killing the 3 default login sessions started inside each
container, which seemed to exist for no apparent reason. It looks like I
wasn't far off in doing that.
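Since kernel/pty is one of the spots where usage is directly observable,
a small sketch for anyone tuning it (4096 is an arbitrary example value):

    # Currently allocated ptys vs. the configured maximum:
    cat /proc/sys/kernel/pty/nr /proc/sys/kernel/pty/max
    # Raise the maximum on the fly:
    sudo sysctl -w kernel.pty.max=4096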
> === Trivial observation re: sysctl which helped me when I noted it ===
> "sysctl kernel.pty.max" <=> "cat /proc/sys/kernel/pty/max" sort of.
> I.e. "sysctl A.B.C.D" <=> "cat /proc/sys/A/B/C/D"
Yep. sysctl is a sort of wrapper; you can achieve results similar to
sysctl / sysctl -w with a simple cat/echo to the respective "files" under
/proc.
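Concretely, with kernel.pty.max as the example key:

    # Reading; both print the same value:
    sysctl -n kernel.pty.max
    cat /proc/sys/kernel/pty/max
    # Writing; same effect either way (both need root, and neither
    # persists across a reboot):
    sysctl -w kernel.pty.max=8192
    echo 8192 > /proc/sys/kernel/pty/max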
--
With best regards,
Andrey Repin
Saturday, September 14, 2019 8:55:58
Sorry for my terrible English...