[lxc-devel] [lxd/master] Add first stab at FAQ
xrobau on Github
lxc-bot at linuxcontainers.org
Tue Mar 5 00:12:47 UTC 2019
From 82882ec8b315ebc31ca30d87d0301fcd7beb4c6d Mon Sep 17 00:00:00 2001
From: Rob Thomas <xrobau at gmail.com>
Date: Mon, 4 Mar 2019 21:08:45 -0300
Subject: [PATCH] Add first stab at FAQ
---
doc/debugging.md | 4 ++
doc/faq.md | 146 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 150 insertions(+)
create mode 100644 doc/faq.md
diff --git a/doc/debugging.md b/doc/debugging.md
index a5bf78f813..2086f25307 100644
--- a/doc/debugging.md
+++ b/doc/debugging.md
@@ -1,5 +1,9 @@
# Debugging
+For information on debugging container issues, see [Frequently Asked Questions](faq.md).
+
+## Debugging `lxc` and `lxd`
+
Here are different ways to help troubleshooting `lxc` and `lxd` code.
#### lxc --debug
diff --git a/doc/faq.md b/doc/faq.md
new file mode 100644
index 0000000000..7976f6d15e
--- /dev/null
+++ b/doc/faq.md
@@ -0,0 +1,146 @@
+# Frequently Asked Questions
+
+## Container Startup Issues
+
+If your container is not starting, or not behaving as you would expect,
+the first thing to do is to look at the console logs generated by the
+container, using the `lxc console --show-log CONTAINERNAME` command.
+
+In this example, we will investigate a RHEL 7 system in which `systemd`
+cannot start.
+
+    # lxc console --show-log systemd
+    Console log:
+
+    Failed to insert module 'autofs4'
+    Failed to insert module 'unix'
+    Failed to mount sysfs at /sys: Operation not permitted
+    Failed to mount proc at /proc: Operation not permitted
+    [!!!!!!] Failed to mount API filesystems, freezing.
+
+The errors here say that `/sys` and `/proc` cannot be mounted, which is
+expected in an unprivileged container. However, LXD does mount these
+filesystems automatically _if it can_.
+
+The [container requirements](container-environment.md) specify that
+every container must come with empty `/dev`, `/proc`, and `/sys`
+directories, and that `/sbin/init` must exist. If those directories
+don't exist, LXD cannot mount onto them, and systemd will then try to
+do so itself. As this is an unprivileged container, systemd is not
+able to do this, and it freezes.
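+
+As a rough check, the loop below reports which of those paths are missing
+from an unpacked image. This is only a sketch: `ROOTFS` is a hypothetical
+variable that you must point at the container's root filesystem on the
+host, and its location depends on your storage backend.
+
+    ROOTFS=/path/to/rootfs    # hypothetical path, adjust for your setup
+    for dir in dev proc sys; do
+        [ -d "$ROOTFS/$dir" ] || echo "missing directory: /$dir"
+    done
+    # /sbin/init is often a symlink, so accept a symlink as well as a file
+    [ -e "$ROOTFS/sbin/init" ] || [ -L "$ROOTFS/sbin/init" ] || echo "missing: /sbin/init"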
+
+To see the environment before anything is changed, you can
+explicitly change the init in a container using the `raw.lxc` config
+parameter. This is equivalent to setting `init=/bin/bash` on the Linux
+kernel command line.
+
+    lxc config set systemd raw.lxc 'lxc.init.cmd = /bin/bash'
+
+Here is what it looks like:
+
+    root@lxc-01:~# lxc config set systemd raw.lxc 'lxc.init.cmd = /bin/bash'
+    root@lxc-01:~# lxc start systemd
+    root@lxc-01:~# lxc console --show-log systemd
+
+    Console log:
+
+    [root@systemd /]#
+    root@lxc-01:~#
+
+Now that the container has started, you can look in it and see that things are
+not running as well as expected.
+
+    root@lxc-01:~# lxc exec systemd bash
+    [root@systemd ~]# ls
+    [root@systemd ~]# mount
+    mount: failed to read mtab: No such file or directory
+    [root@systemd ~]# cd /
+    [root@systemd /]# ls /proc/
+    sys
+    [root@systemd /]# exit
+
+Because LXD tries to auto-heal, it *did* create some of the directories when it was
+starting up. Shutting down and restarting the container will fix the problem, but
+the original cause is still there: the **template does not contain the required
+files**.
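+
+Once you have finished investigating, remember to remove the init override,
+otherwise the container will keep booting into bash. A minimal example,
+assuming the container is still named `systemd` as above and that `raw.lxc`
+holds nothing except the override (unsetting it removes the whole key).
+`--force` is used on the assumption that bash, running as init, will not
+react to a clean shutdown request.
+
+    lxc stop --force systemd
+    lxc config unset systemd raw.lxc
+    lxc start systemd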
+
+## Networking Issues
+
+In a larger [Production Environment](production-setup.md), it is common to have
+multiple VLANs and have LXD clients attached directly to those VLANs. Be aware that
+if you are using netplan and systemd-networkd, you will encounter some bugs that
+could cause catastrophic issues.
+
+### Do not use systemd-networkd with netplan and bridges based on VLANs
+
+At the time of writing (2019-03-05), netplan cannot assign a random MAC address to
+a bridge attached to a VLAN. It always picks the same MAC address, which causes
+layer 2 issues when you have more than one machine on the same network segment.
+It also has difficulty creating multiple bridges. Make sure you use
+`network-manager` instead. An example config is below, with a management
+address of 10.61.0.25, and VLAN 102 being used for client traffic.
+
+    network:
+      version: 2
+      renderer: NetworkManager
+      ethernets:
+        eth0:
+          dhcp4: no
+          accept-ra: no
+          # This is the 'Management Address'
+          addresses: [ 10.61.0.25/24 ]
+          gateway4: 10.61.0.1
+          nameservers:
+            addresses: [ 1.1.1.1, 8.8.8.8 ]
+        eth1:
+          dhcp4: no
+          accept-ra: no
+          # A bogus IP address is required to ensure the link state is up
+          addresses: [ 10.254.254.25/32 ]
+
+      vlans:
+        vlan102:
+          accept-ra: no
+          dhcp4: no
+          id: 102
+          link: eth1
+
+      bridges:
+        br102:
+          accept-ra: no
+          dhcp4: no
+          interfaces: [ "vlan102" ]
+          # A bogus IP address is required to ensure the link state is up
+          addresses: [ 10.254.102.25/32 ]
+          parameters:
+            stp: false
+
+#### Things to note
+
+* eth0 is the Management interface, with the default gateway.
+* vlan102 uses eth1.
+* br102 uses vlan102, and _has a bogus /32 IP address assigned to it_.
+
+The other important thing is to set `stp: false`, otherwise the bridge will sit
+in `learning` state for up to 10 seconds, which is longer than most DHCP requests
+last. As there is no possibility of cross-connecting and causing loops, this is
+safe to do.
+
+### Beware of 'port security'
+
+Many switches do *not* allow MAC address changes, and will either drop traffic
+with an incorrect MAC or disable the port entirely. If you can ping an LXD container
+from the host, but cannot ping it from a _different_ host, this could be
+the cause. To diagnose this, run tcpdump on the uplink (in this case, eth1).
+You will see either 'ARP Who has xx.xx.xx.xx tell yy.yy.yy.yy' requests, with your
+responses going out but never being acknowledged, or ICMP packets going in and
+out successfully but never being received by the other host.
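+
+As an illustration, a capture along these lines shows ARP and ICMP traffic
+on the uplink together with the MAC addresses involved. The interface name
+`eth1` is taken from the example config above; adjust it to your setup.
+
+    # -n: no name resolution, -e: print link-level (MAC) headers
+    tcpdump -n -e -i eth1 'arp or icmp'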
+
+### Do not run privileged containers unless necessary
+
+A privileged container can do things that affect the entire host. For example, it
+can use things in /sys to reset the network card, which will reset it for **the entire
+host**, causing network blips. Almost everything can be run in an unprivileged container.
+For things that require unusual privileges, such as mounting NFS filesystems inside
+the container, you may need to use bind mounts instead.
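+
+As a sketch of the bind mount approach, you could mount the NFS export on
+the host and then pass it into the container as a `disk` device. The
+device name `nfs-share` and the path `/mnt/nfs-share` are hypothetical, and
+the container is assumed to be named `systemd` as in the earlier examples.
+
+    # On the host, with the NFS export already mounted at /mnt/nfs-share
+    lxc config device add systemd nfs-share disk source=/mnt/nfs-share path=/mnt/nfs-share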
+