[lxc-users] lxc container network occasional problem with bridge network on bonding device
toshinao
padoauk at gmail.com
Wed Sep 19 15:20:28 UTC 2018
Hi, Andrey. Thanks for the reply.
It took some time to reproduce the problem, but I have now found a reliable way to trigger it.
> How do you connect containers to the bridge?
Here’s lxc info shows.
# lxc profile show default
config: {}
description: Default LXD profile
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: default
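For reference, the same bridged attachment could also be made for a single container
instead of through the profile, roughly like this (sketch only; "c1" is a placeholder
container name):

# attach one container's eth0 to the host bridge br0 (illustrative)
lxc config device add c1 eth0 nic nictype=bridged parent=br0 name=eth0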
> Can containers talk to each other when this happens?
Yes.
> Can host talk to the world at that same time?
Yes.
I do not attach the ping logs, since the results are unremarkable.
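The checks were of the obvious kind, roughly the following (the second container's IP
is a placeholder):

# from inside a container: another container, the host bridge address, and the gateway
lxc exec bionic0 -- ping -c 3 10.1.2.21    # another container (placeholder IP)
lxc exec bionic0 -- ping -c 3 10.1.2.3     # host (br0)
lxc exec bionic0 -- ping -c 3 10.1.2.254   # gateway
# from the host: the outside world
ping -c 3 10.1.2.254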
> And netplan did not yell at you?
I identified the time when the problem happened and inspected /var/log/syslog around that time.
There was nothing relevant.
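What I mean by inspecting is roughly this kind of thing (the time window is only an example):

# look for link/bond/bridge events around the time the containers lost connectivity
grep -iE 'bond0|br0|eno1|eno2' /var/log/syslog
journalctl --since "2018-09-19 14:00" --until "2018-09-19 15:00" | grep -iE 'bond0|br0|eno1|eno2'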
I found a way to reproduce the problem quickly. The procedure (scripted below) is
(1) connect both of the LAN cables
(2) stop all containers (I am not sure whether “all” is necessary)
(3) start some of the containers
(4) the problem occurs on the started containers, either immediately after the restart or
several minutes later
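As a minimal sketch, the steps above can be scripted like this (the container name and
target IP are placeholders; assumes the LXD command-line client):

#!/bin/sh
# step (2): stop all containers
lxc stop --all
# step (3): start one container again
lxc start bionic0
# step (4): ping the gateway from the container until it stops responding
while lxc exec bionic0 -- ping -c 1 -W 2 10.1.2.254 > /dev/null; do
    sleep 10
done
echo "container lost external connectivity"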
Regards,
> On 2018/09/18, at 2:47, Andrey Repin <anrdaemon at yandex.ru> wrote:
>
> Greetings, toshinao!
>
>> Hi.
>
>> I experienced an occasional network problem with containers running on Ubuntu Server 18.04.1. Containers
>> can always communicate with the host IP, and they can sometimes communicate with other hosts, but they
>> are disconnected occasionally. When the problem occurs, pings from a container to external hosts
>> do not get through at all; very rarely the containers recover after, for example, several hours.
>> Disconnection happens much more readily than recovery.
>
>> The host network is configured with netplan in the following topology.
>
>>             +-eno1-< <--lan_cable--> >-+
>> br0--bond0--+                          +-- Cisco 3650
>>             +-eno2-< <--lan_cable--> >-+
>
>> The bonding mode is balance-a1b.
>
> ALB
> Adaptive Load Balancing
>
>> I also found that if one of the LAN cables is physically disconnected,
>> the problem never happens.
>
> How do you connect containers to the bridge?
>
>> Using iptraf-ng, I watched the bridge device (br0 below) as well as the slave devices.
>> When the containers cannot communicate, no ping packets are detected on these devices even while the
>> containers are pinging external hosts. When communication is working, iptraf-ng does detect the ping packets.
>
>> I guess this could be a low-level problem in the virtual networking. Are there any suggestions for
>> solving it?
>
> Can containers talk to each other when this happens?
> Can host talk to the world at that same time?
>
>> Here's the detail of the setting.
>
>> host's netplan setting
>
>> network:
>>   version: 2
>>   renderer: networkd
>>   ethernets:
>>     eno1:
>>       dhcp4: no
>>     eno2:
>>       dhcp4: no
>>   bonds:
>>     bond0:
>>       interfaces: [eno1, eno2]
>>       parameters:
>>         mode: balanec-a1b
>
> And netplan did not yell at you?
>
>>   bridges:
>>     br0:
>>       interfaces:
>>         - bond0
>>       addresses: [10.1.2.3/24]
>>       gateway4: 10.1.2.254
>>       dhcp4: no
>
>> host network interface status
>
>> host# ip a s
>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
>> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>> inet 127.0.0.1/8 scope host lo
>> valid_lft forever preferred_lft forever
>> inet6 ::1/128 scope host
>> valid_lft forever preferred_lft forever
>> 2: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master
>> bond0 state UP group default qlen 1000
>> link/ether 0b:25:b5:f2:e1:34 brd ff:ff:ff:ff:ff:ff
>> 3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master
>> bond0 state UP group default qlen 1000
>> link/ether 0b:25:b5:f2:e1:35 brd ff:ff:ff:ff:ff:ff
>> 4: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>> link/ether 0a:1a:6c:85:ff:ed brd ff:ff:ff:ff:ff:ff
>> inet 10.1.2.3/24 brd 10.1.2.255 scope global br0
>> valid_lft forever preferred_lft forever
>> inet6 fe80::81a:6cff:fe85:ffed/64 scope link
>> valid_lft forever preferred_lft forever
>> 5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
>> master br0 state UP group default qlen 1000
>> link/ether 0a:54:4b:f2:d7:10 brd ff:ff:ff:ff:ff:ff
>> 7: vethK4HOFU@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
>> master br0 state UP group default qlen 1000
>> link/ether fe:ca:07:3e:2b:2d brd ff:ff:ff:ff:ff:ff link-netnsid 0
>> inet6 fe80::fcca:7ff:fe3e:2b2d/64 scope link
>> valid_lft forever preferred_lft forever
>> 9: veth77HJ0V@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
>> master br0 state UP group default qlen 1000
>> link/ether fe:85:f0:ef:78:b2 brd ff:ff:ff:ff:ff:ff link-netnsid 1
>> inet6 fe80::fc85:f0ff:feef:78b2/64 scope link
>> valid_lft forever preferred_lft forever
>
>> container's network interface status
>
>> root@bionic0:~# ip a s
>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
>> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>> inet 127.0.0.1/8 scope host lo
>> valid_lft forever preferred_lft forever
>> inet6 ::1/128 scope host
>> valid_lft forever preferred_lft forever
>> 6: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>> link/ether 00:16:3e:cb:ef:ce brd ff:ff:ff:ff:ff:ff link-netnsid 0
>> inet 10.1.2.20/24 brd 10.1.2.255 scope global eth0
>> valid_lft forever preferred_lft forever
>> inet6 fe80::216:3eff:fecb:efce/64 scope link
>> valid_lft forever preferred_lft forever
>
>
> --
> With best regards,
> Andrey Repin
> Monday, September 17, 2018 20:41:41
>
> Sorry for my terrible english...
>
> _______________________________________________
> lxc-users mailing list
> lxc-users at lists.linuxcontainers.org
> http://lists.linuxcontainers.org/listinfo/lxc-users