<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<font face="Droid Serif">I'm not sure this is actually an issue with
LXD, but I've been scratching my head over it for a while without
figuring out what is going on, so I'm reaching out to several
different sources. I'm intermittently losing connectivity to the
second interface on all of my containers. If I ping out *from* the
container *to* an external address, the connection is
restored temporarily. Anywhere from 5 to 60 minutes later the
problem reappears. On the surface it looks like a routing or
reverse path filtering issue, but (I believe) I've set up those
parameters properly.<br>
<br>
For example, I'm trying to ping from my local box (172.16.44.18) to
the container's second interface, "veth-int-core"
(10.2.80.129). Note that all general traffic is supposed to go out
the first interface, "veth-mgmt" (10.2.28.65), and that the
default gateway is set on this interface. I've set rp_filter on
veth-int-core to 2 (loose mode), so the system should not drop the
packet because of reverse path filtering.<br>
<br>
<font face="Droid Sans Mono">root@container1:~# cat
/proc/sys/net/ipv4/conf/veth-int-core/rp_filter <br>
2</font><br>
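<br>
One caveat I'm aware of: as far as I understand the kernel's ip-sysctl documentation, the value actually used for source validation is the higher of conf/all/rp_filter and the per-interface value. A minimal sketch for printing that effective value; the function name and the optional second argument (a directory to read from instead of /proc) are my own, purely for illustration:<br>

```shell
#!/bin/sh
# Print the rp_filter value applied to an interface: the higher of
# conf/all/rp_filter and conf/IFACE/rp_filter.
# $1 = interface name
# $2 = sysctl directory (optional, hypothetical parameter so the function
#      can be pointed at a copy of the tree instead of /proc)
effective_rp_filter() {
    iface=$1
    conf_dir=${2:-/proc/sys/net/ipv4/conf}
    all=$(cat "$conf_dir/all/rp_filter") || return 1
    own=$(cat "$conf_dir/$iface/rp_filter") || return 1
    if [ "$all" -gt "$own" ]; then
        echo "$all"
    else
        echo "$own"
    fi
}
```

Calling effective_rp_filter veth-int-core inside the container would print the value in effect for that interface.<br>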
<br>
<br>
From my local box I try to ping the veth-int-core interface on the
container and receive no response:<br>
<font face="Droid Sans Mono">root@client:~$ date -u; ping -c
10 10.2.80.129; date -u<br>
Tue 30 Jun 2020 22:09:34 PM UTC<br>
PING 10.2.80.129 (10.2.80.129) 56(84) bytes of data.<br>
<br>
--- 10.2.80.129 ping statistics ---<br>
10 packets transmitted, 0 received, 100% packet loss, time
9200ms<br>
<br>
Tue 30 Jun 2020 22:09:53 PM UTC</font><br>
<br>
If I sniff the wire on the container at the same time, I can see
the ICMP echo requests arrive (the tcpdump timestamps are in local
time, six hours behind the UTC times above). I can also see ICMP
type 3 code 1 (destination unreachable, host unreachable) messages,
each of which embeds the echo reply the container tried to send.<br>
<br>
<font face="Droid Sans Mono">root@container1:~# tcpdump -nevi
any icmp<br>
tcpdump: listening on any, link-type LINUX_SLL (Linux
cooked), capture size 262144 bytes<br>
16:09:34.626373 In e4:aa:5d:99:88:4a ethertype IPv4
(0x0800), length 100: (tos 0x0, ttl 62, id 7638, offset 0, flags
[DF], proto ICMP (1), length 84)<br>
172.16.44.18 > 10.2.80.129: ICMP echo request, id
18986, seq 1, length 64<br>
16:09:35.633862 In e4:aa:5d:99:88:4a ethertype IPv4
(0x0800), length 100: (tos 0x0, ttl 62, id 7689, offset 0, flags
[DF], proto ICMP (1), length 84)<br>
172.16.44.18 > 10.2.80.129: ICMP echo request, id
18986, seq 2, length 64<br>
16:09:36.657897 In e4:aa:5d:99:88:4a ethertype IPv4
(0x0800), length 100: (tos 0x0, ttl 62, id 7882, offset 0, flags
[DF], proto ICMP (1), length 84)<br>
172.16.44.18 > 10.2.80.129: ICMP echo request, id
18986, seq 3, length 64<br>
16:09:37.682063 In e4:aa:5d:99:88:4a ethertype IPv4
(0x0800), length 100: (tos 0x0, ttl 62, id 7901, offset 0, flags
[DF], proto ICMP (1), length 84)<br>
172.16.44.18 > 10.2.80.129: ICMP echo request, id
18986, seq 4, length 64<br>
16:09:37.695263 In 00:00:00:00:00:00 ethertype IPv4
(0x0800), length 128: (tos 0xc0, ttl 64, id 59700, offset 0,
flags [none], proto ICMP (1), length 112)<br>
10.2.80.129 > 10.2.80.129: ICMP host 172.16.44.18
unreachable, length 92<br>
(tos 0x0, ttl 64, id 9378, offset 0, flags [none], proto
ICMP (1), length 84)<br>
10.2.80.129 > 172.16.44.18: ICMP echo reply, id
18986, seq 1, length 64<br>
16:09:37.695271 In 00:00:00:00:00:00 ethertype IPv4
(0x0800), length 128: (tos 0xc0, ttl 64, id 59701, offset 0,
flags [none], proto ICMP (1), length 112)<br>
10.2.80.129 > 10.2.80.129: ICMP host 172.16.44.18
unreachable, length 92<br>
(tos 0x0, ttl 64, id 9430, offset 0, flags [none], proto
ICMP (1), length 84)<br>
10.2.80.129 > 172.16.44.18: ICMP echo reply, id
18986, seq 2, length 64<br>
16:09:37.695276 In 00:00:00:00:00:00 ethertype IPv4
(0x0800), length 128: (tos 0xc0, ttl 64, id 59702, offset 0,
flags [none], proto ICMP (1), length 112)<br>
10.2.80.129 > 10.2.80.129: ICMP host 172.16.44.18
unreachable, length 92<br>
(tos 0x0, ttl 64, id 9612, offset 0, flags [none], proto
ICMP (1), length 84)<br>
10.2.80.129 > 172.16.44.18: ICMP echo reply, id
18986, seq 3, length 64<br>
16:09:38.705661 In e4:aa:5d:99:88:4a ethertype IPv4
(0x0800), length 100: (tos 0x0, ttl 62, id 8081, offset 0, flags
[DF], proto ICMP (1), length 84)<br>
172.16.44.18 > 10.2.80.129: ICMP echo request, id
18986, seq 5, length 64<br>
16:09:39.729581 In e4:aa:5d:99:88:4a ethertype IPv4
(0x0800), length 100: (tos 0x0, ttl 62, id 8101, offset 0, flags
[DF], proto ICMP (1), length 84)<br>
172.16.44.18 > 10.2.80.129: ICMP echo request, id
18986, seq 6, length 64<br>
16:09:40.753507 In e4:aa:5d:99:88:4a ethertype IPv4
(0x0800), length 100: (tos 0x0, ttl 62, id 8299, offset 0, flags
[DF], proto ICMP (1), length 84)<br>
172.16.44.18 > 10.2.80.129: ICMP echo request, id
18986, seq 7, length 64<br>
16:09:41.759252 In 00:00:00:00:00:00 ethertype IPv4
(0x0800), length 128: (tos 0xc0, ttl 64, id 60134, offset 0,
flags [none], proto ICMP (1), length 112)<br>
10.2.80.129 > 10.2.80.129: ICMP host 172.16.44.18
unreachable, length 92<br>
(tos 0x0, ttl 64, id 9813, offset 0, flags [none], proto
ICMP (1), length 84)<br>
10.2.80.129 > 172.16.44.18: ICMP echo reply, id
18986, seq 5, length 64<br>
16:09:41.759259 In 00:00:00:00:00:00 ethertype IPv4
(0x0800), length 128: (tos 0xc0, ttl 64, id 60135, offset 0,
flags [none], proto ICMP (1), length 112)<br>
10.2.80.129 > 10.2.80.129: ICMP host 172.16.44.18
unreachable, length 92<br>
(tos 0x0, ttl 64, id 10019, offset 0, flags [none],
proto ICMP (1), length 84)<br>
10.2.80.129 > 172.16.44.18: ICMP echo reply, id
18986, seq 6, length 64<br>
16:09:41.759264 In 00:00:00:00:00:00 ethertype IPv4
(0x0800), length 128: (tos 0xc0, ttl 64, id 60136, offset 0,
flags [none], proto ICMP (1), length 112)<br>
10.2.80.129 > 10.2.80.129: ICMP host 172.16.44.18
unreachable, length 92<br>
(tos 0x0, ttl 64, id 10271, offset 0, flags [none],
proto ICMP (1), length 84)<br>
10.2.80.129 > 172.16.44.18: ICMP echo reply, id
18986, seq 7, length 64<br>
16:09:41.777449 In e4:aa:5d:99:88:4a ethertype IPv4
(0x0800), length 100: (tos 0x0, ttl 62, id 8474, offset 0, flags
[DF], proto ICMP (1), length 84)<br>
172.16.44.18 > 10.2.80.129: ICMP echo request, id
18986, seq 8, length 64<br>
16:09:42.801428 In e4:aa:5d:99:88:4a ethertype IPv4
(0x0800), length 100: (tos 0x0, ttl 62, id 8491, offset 0, flags
[DF], proto ICMP (1), length 84)<br>
172.16.44.18 > 10.2.80.129: ICMP echo request, id
18986, seq 9, length 64<br>
16:09:43.825371 In e4:aa:5d:99:88:4a ethertype IPv4
(0x0800), length 100: (tos 0x0, ttl 62, id 8683, offset 0, flags
[DF], proto ICMP (1), length 84)<br>
172.16.44.18 > 10.2.80.129: ICMP echo request, id
18986, seq 10, length 64<br>
16:09:44.831260 In 00:00:00:00:00:00 ethertype IPv4
(0x0800), length 128: (tos 0xc0, ttl 64, id 60642, offset 0,
flags [none], proto ICMP (1), length 112)<br>
10.2.80.129 > 10.2.80.129: ICMP host 172.16.44.18
unreachable, length 92<br>
(tos 0x0, ttl 64, id 10484, offset 0, flags [none],
proto ICMP (1), length 84)<br>
10.2.80.129 > 172.16.44.18: ICMP echo reply, id
18986, seq 8, length 64<br>
16:09:44.831267 In 00:00:00:00:00:00 ethertype IPv4
(0x0800), length 128: (tos 0xc0, ttl 64, id 60643, offset 0,
flags [none], proto ICMP (1), length 112)<br>
10.2.80.129 > 10.2.80.129: ICMP host 172.16.44.18
unreachable, length 92<br>
(tos 0x0, ttl 64, id 10689, offset 0, flags [none],
proto ICMP (1), length 84)<br>
10.2.80.129 > 172.16.44.18: ICMP echo reply, id
18986, seq 9, length 64<br>
16:09:44.831272 In 00:00:00:00:00:00 ethertype IPv4
(0x0800), length 128: (tos 0xc0, ttl 64, id 60644, offset 0,
flags [none], proto ICMP (1), length 112)<br>
10.2.80.129 > 10.2.80.129: ICMP host 172.16.44.18
unreachable, length 92<br>
(tos 0x0, ttl 64, id 10878, offset 0, flags [none],
proto ICMP (1), length 84)<br>
10.2.80.129 > 172.16.44.18: ICMP echo reply, id
18986, seq 10, length 64</font><br>
<br>
To me this indicates a routing issue: it looks like the container
doesn't know how to route the ICMP reply back to the client.
However, if I query the routing table, it knows it needs to use the
gateway on veth-mgmt:<br>
<br>
<font face="Droid Sans Mono"> root@container1:~# ip route get
172.16.44.18<br>
172.16.44.18 via 10.2.28.1 dev veth-mgmt src 10.2.28.65 uid
0 <br>
cache</font><br>
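<br>
The lookup above doesn't pin the source address, and sending the reply also depends on the gateway's neighbor (ARP) entry being resolved, so both may be worth capturing while the problem is occurring. A rough sketch, assuming the neighbor state is the last field of each line printed by ip neigh; neigh_state is a hypothetical helper name:<br>

```shell
#!/bin/sh
# Hypothetical helper: read `ip neigh show` output on stdin and print the
# state (assumed to be the last field) of the entry for the address in $1.
# Prints nothing if there is no entry for that address.
neigh_state() {
    awk -v ip="$1" '$1 == ip { print $NF }'
}
```

For example, ip neigh show dev veth-mgmt | neigh_state 10.2.28.1 for the gateway entry, alongside ip route get 172.16.44.18 from 10.2.80.129 for the route chosen when the reply is sourced from the second interface.<br>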
<br>
And the really odd part is that if I ping *from*
the container *to* my local box, it works, AND afterwards my
original ping *from* my local box *to* the container starts to
work.<br>
<br>
To demonstrate: if I start a ping *from* my box *to* the
container and, partway through, run a second ping *from*
the container *to* my box, the packets sent after the second ping
starts get replies:<br>
<br>
<font face="Droid Sans Mono">root@client:~$ date -u; ping -c
30 10.2.80.129; date -u<br>
Tue 30 Jun 2020 22:30:29 PM UTC<br>
PING 10.2.80.129 (10.2.80.129) 56(84) bytes of data.<br>
64 bytes from 10.2.80.129: icmp_seq=4 ttl=62 time=1043 ms<br>
64 bytes from 10.2.80.129: icmp_seq=5 ttl=62 time=19.0 ms<br>
64 bytes from 10.2.80.129: icmp_seq=6 ttl=62 time=4.33 ms<br>
64 bytes from 10.2.80.129: icmp_seq=7 ttl=62 time=4.19 ms<br>
64 bytes from 10.2.80.129: icmp_seq=8 ttl=62 time=4.15 ms<br>
64 bytes from 10.2.80.129: icmp_seq=9 ttl=62 time=4.19 ms<br>
64 bytes from 10.2.80.129: icmp_seq=10 ttl=62 time=4.51 ms<br>
64 bytes from 10.2.80.129: icmp_seq=11 ttl=62 time=4.26 ms<br>
64 bytes from 10.2.80.129: icmp_seq=12 ttl=62 time=4.39 ms<br>
64 bytes from 10.2.80.129: icmp_seq=13 ttl=62 time=4.15 ms<br>
64 bytes from 10.2.80.129: icmp_seq=14 ttl=62 time=4.37 ms<br>
64 bytes from 10.2.80.129: icmp_seq=15 ttl=62 time=4.17 ms<br>
64 bytes from 10.2.80.129: icmp_seq=16 ttl=62 time=4.39 ms<br>
64 bytes from 10.2.80.129: icmp_seq=17 ttl=62 time=4.38 ms<br>
64 bytes from 10.2.80.129: icmp_seq=18 ttl=62 time=4.34 ms<br>
64 bytes from 10.2.80.129: icmp_seq=19 ttl=62 time=4.32 ms<br>
64 bytes from 10.2.80.129: icmp_seq=20 ttl=62 time=4.80 ms<br>
64 bytes from 10.2.80.129: icmp_seq=21 ttl=62 time=4.28 ms<br>
64 bytes from 10.2.80.129: icmp_seq=22 ttl=62 time=4.32 ms<br>
64 bytes from 10.2.80.129: icmp_seq=23 ttl=62 time=4.28 ms<br>
64 bytes from 10.2.80.129: icmp_seq=24 ttl=62 time=4.22 ms<br>
64 bytes from 10.2.80.129: icmp_seq=25 ttl=62 time=4.25 ms<br>
64 bytes from 10.2.80.129: icmp_seq=26 ttl=62 time=4.21 ms<br>
64 bytes from 10.2.80.129: icmp_seq=27 ttl=62 time=4.34 ms<br>
64 bytes from 10.2.80.129: icmp_seq=28 ttl=62 time=4.31 ms<br>
64 bytes from 10.2.80.129: icmp_seq=29 ttl=62 time=4.15 ms<br>
64 bytes from 10.2.80.129: icmp_seq=30 ttl=62 time=4.60 ms<br>
<br>
--- 10.2.80.129 ping statistics ---<br>
30 packets transmitted, 27 received, 10% packet loss, time
29137ms<br>
rtt min/avg/max/mdev = 4.145/43.328/1042.959/196.063 ms,
pipe 2<br>
Tue 30 Jun 2020 22:31:01 PM UTC<br>
<br>
root@container1:~# date -u; ping -c 4 172.16.44.18; date -u<br>
Tue Jun 30 22:30:33 UTC 2020<br>
PING 172.16.44.18 (172.16.44.18) 56(84) bytes of data.<br>
64 bytes from 172.16.44.18: icmp_seq=1 ttl=63 time=444 ms<br>
64 bytes from 172.16.44.18: icmp_seq=2 ttl=63 time=4.30 ms<br>
64 bytes from 172.16.44.18: icmp_seq=3 ttl=63 time=4.27 ms<br>
64 bytes from 172.16.44.18: icmp_seq=4 ttl=63 time=4.23 ms<br>
<br>
--- 172.16.44.18 ping statistics ---<br>
4 packets transmitted, 4 received, 0% packet loss, time
3003ms<br>
rtt min/avg/max/mdev = 4.238/114.371/444.666/190.695 ms<br>
Tue Jun 30 22:30:36 UTC 2020<br>
</font><br>
From this point on I can successfully communicate with the
veth-int-core interface. If no traffic is pushed to that interface
for anywhere from 5 to 60 minutes, the problem comes back.
I've tried:<br>
<br>
- Seeing if any information shows up in the kernel logs on the
host (nothing that I could see).<br>
- Restarting the containers.<br>
- Restarting the LXD host.<br>
- Moving the containers to another host (the problem persisted).<br>
- Changing the rp_filter setting on one or both interfaces.<br>
- Looking at the LXD logs to see if anything related shows up.<br>
<br>
Any pointers on where I could look to get more info would be
appreciated.<br>
</font>
<pre class="moz-signature" cols="0">--
Thanks,
Joshua Schaeffer</pre>
</body>
</html>