[lxc-users] Intermittent network issue with containers
Joshua Schaeffer
jschaeffer at harmonywave.com
Wed Jul 1 06:05:00 UTC 2020
I'm not sure this is actually an issue with LXD, but I've been scratching my head over it for a while and can't figure out what is going on, so I'm reaching out to several different sources. I'm intermittently losing connectivity to the second interface on all of my containers. If I ping out *from* the container *to* an external address, the connection is restored temporarily. Anywhere from 5 to 60 minutes later the problem reappears. On the surface it looks like a routing or reverse path filtering issue, but (I believe) I've set up those parameters properly.
For example, I'm trying to ping from my local box (172.16.44.18) to the container's second interface, called "veth-int-core" (10.2.80.129). Note that all general traffic is supposed to go out the first interface, called "veth-mgmt" (10.2.28.65), and that the default gateway is set on this interface. I've set rp_filter on veth-int-core to 2, so the system should not drop the packet because of reverse path filtering.
root@container1:~# cat /proc/sys/net/ipv4/conf/veth-int-core/rp_filter
2
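A side note on the rp_filter check above: per the kernel's ip-sysctl documentation, the value used for source validation on an interface is the maximum of conf/all/rp_filter and conf/<interface>/rp_filter, so it can be worth reading both. Below is a minimal sketch of that check. The function names effective_rp_filter and show_rp_filter are made up for illustration and are not part of any tool.

```shell
#!/bin/sh
# Effective rp_filter is the larger of the "all" value and the
# per-interface value (0 = off, 1 = strict, 2 = loose).
effective_rp_filter() {
    all_val=$1
    if_val=$2
    if [ "$all_val" -gt "$if_val" ]; then
        echo "$all_val"
    else
        echo "$if_val"
    fi
}

# Read both values for one interface from /proc and print the effective one.
# Returns 1 if either file is not readable (for example, unknown interface).
show_rp_filter() {
    iface=$1
    base=/proc/sys/net/ipv4/conf
    [ -r "$base/all/rp_filter" ] || return 1
    [ -r "$base/$iface/rp_filter" ] || return 1
    effective_rp_filter "$(cat "$base/all/rp_filter")" "$(cat "$base/$iface/rp_filter")"
}
```

Running `show_rp_filter veth-int-core` and `show_rp_filter veth-mgmt` inside the container would print the value the kernel actually applies to each interface.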
From my local box I try to ping the veth-int-core interface on the container and receive no response:
root@client:~$ date -u; ping -c 10 10.2.80.129; date -u
Tue 30 Jun 2020 22:09:34 PM UTC
PING 10.2.80.129 (10.2.80.129) 56(84) bytes of data.
--- 10.2.80.129 ping statistics ---
10 packets transmitted, 0 received, 100% packet loss, time 9200ms
Tue 30 Jun 2020 22:09:53 PM UTC
If I sniff the wire on the container at the same time, I can see the ICMP echo requests arrive. I can also see ICMP type 3 code 1 (destination unreachable, host unreachable) messages, each of which embeds the echo reply that could not be delivered.
root@container1:~# tcpdump -nevi any icmp
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
16:09:34.626373 In e4:aa:5d:99:88:4a ethertype IPv4 (0x0800), length 100: (tos 0x0, ttl 62, id 7638, offset 0, flags [DF], proto ICMP (1), length 84)
172.16.44.18 > 10.2.80.129: ICMP echo request, id 18986, seq 1, length 64
16:09:35.633862 In e4:aa:5d:99:88:4a ethertype IPv4 (0x0800), length 100: (tos 0x0, ttl 62, id 7689, offset 0, flags [DF], proto ICMP (1), length 84)
172.16.44.18 > 10.2.80.129: ICMP echo request, id 18986, seq 2, length 64
16:09:36.657897 In e4:aa:5d:99:88:4a ethertype IPv4 (0x0800), length 100: (tos 0x0, ttl 62, id 7882, offset 0, flags [DF], proto ICMP (1), length 84)
172.16.44.18 > 10.2.80.129: ICMP echo request, id 18986, seq 3, length 64
16:09:37.682063 In e4:aa:5d:99:88:4a ethertype IPv4 (0x0800), length 100: (tos 0x0, ttl 62, id 7901, offset 0, flags [DF], proto ICMP (1), length 84)
172.16.44.18 > 10.2.80.129: ICMP echo request, id 18986, seq 4, length 64
16:09:37.695263 In 00:00:00:00:00:00 ethertype IPv4 (0x0800), length 128: (tos 0xc0, ttl 64, id 59700, offset 0, flags [none], proto ICMP (1), length 112)
10.2.80.129 > 10.2.80.129: ICMP host 172.16.44.18 unreachable, length 92
(tos 0x0, ttl 64, id 9378, offset 0, flags [none], proto ICMP (1), length 84)
10.2.80.129 > 172.16.44.18: ICMP echo reply, id 18986, seq 1, length 64
16:09:37.695271 In 00:00:00:00:00:00 ethertype IPv4 (0x0800), length 128: (tos 0xc0, ttl 64, id 59701, offset 0, flags [none], proto ICMP (1), length 112)
10.2.80.129 > 10.2.80.129: ICMP host 172.16.44.18 unreachable, length 92
(tos 0x0, ttl 64, id 9430, offset 0, flags [none], proto ICMP (1), length 84)
10.2.80.129 > 172.16.44.18: ICMP echo reply, id 18986, seq 2, length 64
16:09:37.695276 In 00:00:00:00:00:00 ethertype IPv4 (0x0800), length 128: (tos 0xc0, ttl 64, id 59702, offset 0, flags [none], proto ICMP (1), length 112)
10.2.80.129 > 10.2.80.129: ICMP host 172.16.44.18 unreachable, length 92
(tos 0x0, ttl 64, id 9612, offset 0, flags [none], proto ICMP (1), length 84)
10.2.80.129 > 172.16.44.18: ICMP echo reply, id 18986, seq 3, length 64
16:09:38.705661 In e4:aa:5d:99:88:4a ethertype IPv4 (0x0800), length 100: (tos 0x0, ttl 62, id 8081, offset 0, flags [DF], proto ICMP (1), length 84)
172.16.44.18 > 10.2.80.129: ICMP echo request, id 18986, seq 5, length 64
16:09:39.729581 In e4:aa:5d:99:88:4a ethertype IPv4 (0x0800), length 100: (tos 0x0, ttl 62, id 8101, offset 0, flags [DF], proto ICMP (1), length 84)
172.16.44.18 > 10.2.80.129: ICMP echo request, id 18986, seq 6, length 64
16:09:40.753507 In e4:aa:5d:99:88:4a ethertype IPv4 (0x0800), length 100: (tos 0x0, ttl 62, id 8299, offset 0, flags [DF], proto ICMP (1), length 84)
172.16.44.18 > 10.2.80.129: ICMP echo request, id 18986, seq 7, length 64
16:09:41.759252 In 00:00:00:00:00:00 ethertype IPv4 (0x0800), length 128: (tos 0xc0, ttl 64, id 60134, offset 0, flags [none], proto ICMP (1), length 112)
10.2.80.129 > 10.2.80.129: ICMP host 172.16.44.18 unreachable, length 92
(tos 0x0, ttl 64, id 9813, offset 0, flags [none], proto ICMP (1), length 84)
10.2.80.129 > 172.16.44.18: ICMP echo reply, id 18986, seq 5, length 64
16:09:41.759259 In 00:00:00:00:00:00 ethertype IPv4 (0x0800), length 128: (tos 0xc0, ttl 64, id 60135, offset 0, flags [none], proto ICMP (1), length 112)
10.2.80.129 > 10.2.80.129: ICMP host 172.16.44.18 unreachable, length 92
(tos 0x0, ttl 64, id 10019, offset 0, flags [none], proto ICMP (1), length 84)
10.2.80.129 > 172.16.44.18: ICMP echo reply, id 18986, seq 6, length 64
16:09:41.759264 In 00:00:00:00:00:00 ethertype IPv4 (0x0800), length 128: (tos 0xc0, ttl 64, id 60136, offset 0, flags [none], proto ICMP (1), length 112)
10.2.80.129 > 10.2.80.129: ICMP host 172.16.44.18 unreachable, length 92
(tos 0x0, ttl 64, id 10271, offset 0, flags [none], proto ICMP (1), length 84)
10.2.80.129 > 172.16.44.18: ICMP echo reply, id 18986, seq 7, length 64
16:09:41.777449 In e4:aa:5d:99:88:4a ethertype IPv4 (0x0800), length 100: (tos 0x0, ttl 62, id 8474, offset 0, flags [DF], proto ICMP (1), length 84)
172.16.44.18 > 10.2.80.129: ICMP echo request, id 18986, seq 8, length 64
16:09:42.801428 In e4:aa:5d:99:88:4a ethertype IPv4 (0x0800), length 100: (tos 0x0, ttl 62, id 8491, offset 0, flags [DF], proto ICMP (1), length 84)
172.16.44.18 > 10.2.80.129: ICMP echo request, id 18986, seq 9, length 64
16:09:43.825371 In e4:aa:5d:99:88:4a ethertype IPv4 (0x0800), length 100: (tos 0x0, ttl 62, id 8683, offset 0, flags [DF], proto ICMP (1), length 84)
172.16.44.18 > 10.2.80.129: ICMP echo request, id 18986, seq 10, length 64
16:09:44.831260 In 00:00:00:00:00:00 ethertype IPv4 (0x0800), length 128: (tos 0xc0, ttl 64, id 60642, offset 0, flags [none], proto ICMP (1), length 112)
10.2.80.129 > 10.2.80.129: ICMP host 172.16.44.18 unreachable, length 92
(tos 0x0, ttl 64, id 10484, offset 0, flags [none], proto ICMP (1), length 84)
10.2.80.129 > 172.16.44.18: ICMP echo reply, id 18986, seq 8, length 64
16:09:44.831267 In 00:00:00:00:00:00 ethertype IPv4 (0x0800), length 128: (tos 0xc0, ttl 64, id 60643, offset 0, flags [none], proto ICMP (1), length 112)
10.2.80.129 > 10.2.80.129: ICMP host 172.16.44.18 unreachable, length 92
(tos 0x0, ttl 64, id 10689, offset 0, flags [none], proto ICMP (1), length 84)
10.2.80.129 > 172.16.44.18: ICMP echo reply, id 18986, seq 9, length 64
16:09:44.831272 In 00:00:00:00:00:00 ethertype IPv4 (0x0800), length 128: (tos 0xc0, ttl 64, id 60644, offset 0, flags [none], proto ICMP (1), length 112)
10.2.80.129 > 10.2.80.129: ICMP host 172.16.44.18 unreachable, length 92
(tos 0x0, ttl 64, id 10878, offset 0, flags [none], proto ICMP (1), length 84)
10.2.80.129 > 172.16.44.18: ICMP echo reply, id 18986, seq 10, length 64
To me this indicates a routing issue. It looks like the container doesn't know how to route the ICMP reply back to the client. However, if I query the routing table it knows it needs to use the gateway interface:
root@container1:~# ip route get 172.16.44.18
172.16.44.18 via 10.2.28.1 dev veth-mgmt src 10.2.28.65 uid 0
cache
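One caveat on the lookup above: it picks src 10.2.28.65, whereas the echo replies in the capture are sourced from 10.2.80.129. If any source-based policy rules exist (an assumption, none are shown in this post), the lookup for the reply could differ from the one printed. Separately, the host-unreachable errors are generated locally (10.2.80.129 > 10.2.80.129), and one thing that can produce those on Linux is failed neighbour (ARP) resolution for the next hop, so the neighbour table may be worth a look too. The sketch below gathers all three views. It is a guess at where to look, not a diagnosis, and reply_path_report and IP_CMD are hypothetical names introduced here.

```shell
#!/bin/sh
# Print policy rules, the route chosen for an explicit source address,
# and the neighbour table. IP_CMD is a hypothetical override (defaults
# to "ip") so the function can be dry-run without touching the network.
reply_path_report() {
    dst=$1
    src=$2
    ip_cmd=${IP_CMD:-ip}
    echo "== policy rules =="
    $ip_cmd rule show
    echo "== route for $src -> $dst =="
    $ip_cmd route get "$dst" from "$src"
    echo "== neighbour table =="
    $ip_cmd neigh show
}
```

Inside the container this would be called as `reply_path_report 172.16.44.18 10.2.80.129`, ideally once while the problem is present and once after it clears, so the two outputs can be compared.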
And the really odd part is that if I try to actually ping *from* the container *to* my local box it works AND afterwards my original ping *from* my local box *to* the container starts to work.
To demonstrate: if I start a ping *from* my box *to* the container and, partway through, run a second ping *from* the container *to* my box, the packets sent after the return ping starts get replies:
root@client:~$ date -u; ping -c 30 10.2.80.129; date -u
Tue 30 Jun 2020 22:30:29 PM UTC
PING 10.2.80.129 (10.2.80.129) 56(84) bytes of data.
64 bytes from 10.2.80.129: icmp_seq=4 ttl=62 time=1043 ms
64 bytes from 10.2.80.129: icmp_seq=5 ttl=62 time=19.0 ms
64 bytes from 10.2.80.129: icmp_seq=6 ttl=62 time=4.33 ms
64 bytes from 10.2.80.129: icmp_seq=7 ttl=62 time=4.19 ms
64 bytes from 10.2.80.129: icmp_seq=8 ttl=62 time=4.15 ms
64 bytes from 10.2.80.129: icmp_seq=9 ttl=62 time=4.19 ms
64 bytes from 10.2.80.129: icmp_seq=10 ttl=62 time=4.51 ms
64 bytes from 10.2.80.129: icmp_seq=11 ttl=62 time=4.26 ms
64 bytes from 10.2.80.129: icmp_seq=12 ttl=62 time=4.39 ms
64 bytes from 10.2.80.129: icmp_seq=13 ttl=62 time=4.15 ms
64 bytes from 10.2.80.129: icmp_seq=14 ttl=62 time=4.37 ms
64 bytes from 10.2.80.129: icmp_seq=15 ttl=62 time=4.17 ms
64 bytes from 10.2.80.129: icmp_seq=16 ttl=62 time=4.39 ms
64 bytes from 10.2.80.129: icmp_seq=17 ttl=62 time=4.38 ms
64 bytes from 10.2.80.129: icmp_seq=18 ttl=62 time=4.34 ms
64 bytes from 10.2.80.129: icmp_seq=19 ttl=62 time=4.32 ms
64 bytes from 10.2.80.129: icmp_seq=20 ttl=62 time=4.80 ms
64 bytes from 10.2.80.129: icmp_seq=21 ttl=62 time=4.28 ms
64 bytes from 10.2.80.129: icmp_seq=22 ttl=62 time=4.32 ms
64 bytes from 10.2.80.129: icmp_seq=23 ttl=62 time=4.28 ms
64 bytes from 10.2.80.129: icmp_seq=24 ttl=62 time=4.22 ms
64 bytes from 10.2.80.129: icmp_seq=25 ttl=62 time=4.25 ms
64 bytes from 10.2.80.129: icmp_seq=26 ttl=62 time=4.21 ms
64 bytes from 10.2.80.129: icmp_seq=27 ttl=62 time=4.34 ms
64 bytes from 10.2.80.129: icmp_seq=28 ttl=62 time=4.31 ms
64 bytes from 10.2.80.129: icmp_seq=29 ttl=62 time=4.15 ms
64 bytes from 10.2.80.129: icmp_seq=30 ttl=62 time=4.60 ms
--- 10.2.80.129 ping statistics ---
30 packets transmitted, 27 received, 10% packet loss, time 29137ms
rtt min/avg/max/mdev = 4.145/43.328/1042.959/196.063 ms, pipe 2
Tue 30 Jun 2020 22:31:01 PM UTC
root@container1:~# date -u; ping -c 4 172.16.44.18; date -u
Tue Jun 30 22:30:33 UTC 2020
PING 172.16.44.18 (172.16.44.18) 56(84) bytes of data.
64 bytes from 172.16.44.18: icmp_seq=1 ttl=63 time=444 ms
64 bytes from 172.16.44.18: icmp_seq=2 ttl=63 time=4.30 ms
64 bytes from 172.16.44.18: icmp_seq=3 ttl=63 time=4.27 ms
64 bytes from 172.16.44.18: icmp_seq=4 ttl=63 time=4.23 ms
--- 172.16.44.18 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 4.238/114.371/444.666/190.695 ms
Tue Jun 30 22:30:36 UTC 2020
From this point on I can successfully communicate with the veth-int-core interface. If no traffic is pushed to that interface for anywhere from 5 to 60 minutes, the problem comes back. I've tried:
- Seeing if any information shows up in the kernel logs on the host (nothing that I could see).
- Restarting the containers.
- Restarting the LXD host.
- Moving the containers to another host (the problem persisted).
- Changing the rp_filter setting on one or both interfaces.
- Looking at the LXD logs to see if anything related shows up.
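Since the failure returns after a variable 5 to 60 minutes, a timestamped probe log may help line up the moment it comes back with other logs. Below is a minimal sketch, assuming iputils ping (for the -c and -W options). probe_once, watch_target and PING_CMD are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Send one probe and print a UTC timestamp, the target, and ok/FAIL.
# PING_CMD is a hypothetical override (defaults to "ping") for dry runs.
probe_once() {
    target=$1
    ping_cmd=${PING_CMD:-ping}
    if $ping_cmd -c 1 -W 2 "$target" >/dev/null 2>&1; then
        status=ok
    else
        status=FAIL
    fi
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $target $status"
}

# Probe forever at a fixed interval (seconds, default 60).
watch_target() {
    target=$1
    interval=${2:-60}
    while :; do
        probe_once "$target"
        sleep "$interval"
    done
}
```

For example, `watch_target 10.2.80.129 60 >> probe.log` could be run from the client. Note that the probes are themselves traffic to the interface and may influence when the problem reappears, so the interval is a trade-off.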
Any pointers on where I could look to get more info would be appreciated.
--
Thanks,
Joshua Schaeffer