[lxc-devel] [lxd/master] [RFC]: production-setup: add net.core.bpf_jit_limit and kernel.keys.maxbytes

brauner on Github lxc-bot at linuxcontainers.org
Mon Mar 9 11:26:03 UTC 2020


From dd98b789ac950732b74aa1d645eba6a45819fa2a Mon Sep 17 00:00:00 2001
From: Christian Brauner <christian.brauner at ubuntu.com>
Date: Mon, 9 Mar 2020 12:20:54 +0100
Subject: [PATCH] [RFC]: production-setup: add net.core.bpf_jit_limit and
 kernel.keys.maxbytes
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

/* kernel.keys.maxbytes */
When all containers share the same id mapping, they share a keyring. Since each
container gets its own session key, which is appended to that shared keyring,
the keyring can grow quite large. Thus the limit needs to be bumped.

/* net.core.bpf_jit_limit */
When running a kernel that has /proc/sys/net/core/bpf_jit_enable set to a value
other than 0, seccomp will make use of the eBPF JIT compiler, so each
container's seccomp filter will be charged against the eBPF JIT limit. Thus the
limit needs to be bumped significantly. Note that a lot of kernels have
CONFIG_BPF_JIT_ALWAYS_ON=y set as a hardening feature, which means the
bpf_jit_enable value is fixed at 1 and cannot be changed.
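Whether a given host is affected can be checked from procfs (a sketch; note that `bpf_jit_limit` itself is root-readable only on many kernels, so it is merely confirmed to exist here):

```shell
# 1 (or 2) means the eBPF JIT is active and seccomp filters count against
# the JIT limit; with CONFIG_BPF_JIT_ALWAYS_ON=y this is pinned to 1
cat /proc/sys/net/core/bpf_jit_enable

# Reading or raising the limit itself typically requires root
# (e.g. sysctl -w net.core.bpf_jit_limit=<value>), so only confirm it exists
ls /proc/sys/net/core/bpf_jit_limit
```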

Cc: Tobias Schüring <tobias at raidboxes.de>
Signed-off-by: Christian Brauner <christian.brauner at ubuntu.com>
---
 doc/production-setup.md | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/doc/production-setup.md b/doc/production-setup.md
index d785334681..43c620c43f 100644
--- a/doc/production-setup.md
+++ b/doc/production-setup.md
@@ -34,16 +34,18 @@ root    | hard  | nofile  | 1048576   | unset     | maximum number of open files
 
 ### /etc/sysctl.conf
 
-Parameter                           | Value     | Default | Description
-:-----                              | :---      | :---    | :---
-fs.inotify.max\_queued\_events      | 1048576   | 16384   | This specifies an upper limit on the number of events that can be queued to the corresponding inotify instance. [1]
-fs.inotify.max\_user\_instances     | 1048576   | 128     | This specifies an upper limit on the number of inotify instances that can be created per real user ID. [1]
-fs.inotify.max\_user\_watches       | 1048576   | 8192    | This specifies an upper limit on the number of watches that can be created per real user ID. [1]
-vm.max\_map\_count                  | 262144    | 65530   | This file contains the maximum number of memory map areas a process may have. Memory map areas are used as a side-effect of calling malloc, directly by mmap and mprotect, and also when loading shared libraries.
-kernel.dmesg\_restrict              | 1         | 0       | This denies container access to the messages in the kernel ring buffer. Please note that this also will deny access to non-root users on the host system.
-net.ipv4.neigh.default.gc\_thresh3  | 8192      | 1024    | This is the maximum number of entries in ARP table (IPv4). You should increase this if you create over 1024 containers. Otherwise, you will get the error `neighbour: ndisc_cache: neighbor table overflow!` when the ARP table gets full and those containers will not be able to get a network configuration. [2]
-net.ipv6.neigh.default.gc\_thresh3  | 8192      | 1024    | This is the maximum number of entries in ARP table (IPv6). You should increase this if you plan to create over 1024 containers. Otherwise, you will get the error `neighbour: ndisc_cache: neighbor table overflow!` when the ARP table gets full and those containers will not be able to get a network configuration. [2]
-kernel.keys.maxkeys                 | 2000      | 200     | This is the maximum number of keys a non-root user can use, should be higher than the number of containers
+Parameter                           | Value     | Default   | Description
+:-----                              | :---      | :---      | :---
+fs.inotify.max\_queued\_events      | 1048576   | 16384     | This specifies an upper limit on the number of events that can be queued to the corresponding inotify instance. [1]
+fs.inotify.max\_user\_instances     | 1048576   | 128       | This specifies an upper limit on the number of inotify instances that can be created per real user ID. [1]
+fs.inotify.max\_user\_watches       | 1048576   | 8192      | This specifies an upper limit on the number of watches that can be created per real user ID. [1]
+vm.max\_map\_count                  | 262144    | 65530     | This file contains the maximum number of memory map areas a process may have. Memory map areas are used as a side-effect of calling malloc, directly by mmap and mprotect, and also when loading shared libraries.
+kernel.dmesg\_restrict              | 1         | 0         | This denies container access to the messages in the kernel ring buffer. Please note that this also will deny access to non-root users on the host system.
+net.ipv4.neigh.default.gc\_thresh3  | 8192      | 1024      | This is the maximum number of entries in ARP table (IPv4). You should increase this if you create over 1024 containers. Otherwise, you will get the error `neighbour: ndisc_cache: neighbor table overflow!` when the ARP table gets full and those containers will not be able to get a network configuration. [2]
+net.ipv6.neigh.default.gc\_thresh3  | 8192      | 1024      | This is the maximum number of entries in ARP table (IPv6). You should increase this if you plan to create over 1024 containers. Otherwise, you will get the error `neighbour: ndisc_cache: neighbor table overflow!` when the ARP table gets full and those containers will not be able to get a network configuration. [2]
+net.core.bpf\_jit\_limit            | ????????? | 264241152 | This is a limit on the size of eBPF JIT allocations, usually set to PAGE_SIZE * 40000. When `/proc/sys/net/core/bpf_jit_enable` is set to a value other than `0`, `seccomp` will make use of the eBPF JIT compiler, so each container's `seccomp` filter will be charged against the eBPF JIT limit
+kernel.keys.maxkeys                 | 2000      | 200       | This is the maximum number of keys a non-root user can use; it should be higher than the number of containers
+kernel.keys.maxbytes                | ????????? | 20000     | This is the maximum number of bytes a non-root user can hold in keyrings; it should grow with the number of containers
 
 Then, reboot the server.
 

