[Lxc-users] Slow response times (at least, from the LAN) to LXC containers

Daniel Lezcano daniel.lezcano at free.fr
Mon Mar 15 21:36:40 UTC 2010


Michael B. Trausch wrote:
> On 03/15/2010 01:05 PM, Brian K. White wrote:
>   
>> I have 2 cable accounts at my office just like you describe, with
>> 5 static IPs provided by a cable-company-supplied router with a built-in
>> 4-port switch. But I can't exactly replicate your setup because:
>>
>> 1) The router and cable modem are separate boxes. The router part is a
>> Cisco 800 series router with one WAN NIC and one LAN NIC and 4 bridged
>> LAN ports. This means my hardware can't be the same as yours, because you
>> described a single box with an integrated cable modem. So, since it is
>> not identical, my hardware may not behave exactly as yours does, and it
>> may not be an LXC problem, merely a problem LXC tickles.
>>     
>
> Yes, it is an SMC box.  I don't have the model number off-hand, but it's 
> essentially the same device as the SMC-8014, which (other than being 
> highly sensitive to slight fluctuations in power input) works rather 
> well.  That said, the device should not have anything to do with 
> it---two nodes on an IP network that are on the same physical segment 
> talk directly to each other, not through the gateway.
>
>   
>> 2) More relevant, those Ciscos aren't doing any NAT for me. I treat the
>> LAN ports on those routers as part of the Internet, and they are only
>> connected to NICs with public IPs in the particular range of each
>> particular router. There are no connections to NICs or switches that
>> connect to any other NICs having IPs outside that range.
>>     
>
> I don't think that this should be an issue either, since again, we're 
> talking about all systems on the same physical segment.  After 5 PM, I 
> can disconnect the router and test to be sure, but I am certain that the 
> problem with communicating to the LXC containers will persist after it 
> is removed and that I will still be able to communicate with the 
> hardware nodes that have IP addresses in the 173.15.213.184/29 range 
> just fine.  That is what I would expect given the relevant standards; 
> the only purpose that the device serves is to be a gateway to the 
> Internet for both IP network numbers.
>
>   
>> I have a few LXC boxes set up here, and one of them is on one of these
>> cable lines. But both the LXC host and the containers within it all have
>> public IPs from the same 5-address pool of usable addresses for that
>> router. The host and the containers do also have private LAN IPs, but
>> those are all on a separate NIC on the host, and that NIC connects to a
>> separate switch. Even though that NAT traffic does happen to ultimately
>> go back out via one of the public IPs on that same cable line, it does
>> so via a separate physical network and NAT router, which happens to be
>> another Linux box with 2 NICs, one strictly private and one public,
>> directly connected to one of the LAN ports on the Cisco.
>>     
>
> That would be one way to set things up; however, such a setup is out of 
> my reach.  This setup meets my needs (except for the issue I am 
> describing here) and has for two years now.
>
>   
>> Perhaps LXC does miss a beat somewhere with that network, or perhaps it's
>> the router, but I think this kind of mystery problem is exactly why I
>> "just don't do that". I know it's technically "legal" and I'd do it if I
>> had a reason to some time, but where possible I don't mix IP ranges on a
>> physical network, or at least within a VLAN. In particular, I avoid
>> potential routing ambiguity such as having LAN and WAN traffic on the
>> same physical net where both would end up routing, for different
>> reasons, to the same gateway device or NIC. That's just begging for
>> problems.
>>     
>
> There should not be any routing ambiguity here.  ARP resolves the 
> addresses of all the hardware nodes just fine, and connections between 
> real hardware systems with real Ethernet cards work perfectly regardless 
> of the operating system running on that real hardware.
>
> That is, no routing is necessary (on _this_ network) to go from 
> 172.16.0.30 to 173.15.213.185, nor the inverse.  All systems are aware 
> of the fact that 172.16.0.0/24 and 173.15.213.184/29 are on the local link.
>
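
(For reference, two prefixes sharing one link can look roughly like the 
sketch below on any one of the machines; "eth0" is a placeholder, and 
whether a given box carries an address in one range or both is an 
assumption here, not something stated above:)

  # A host addressed in the RFC 1918 range can still reach the public /29
  # directly, as long as it knows that prefix is on-link:
  ip addr add 172.16.0.30/24 dev eth0
  ip route add 173.15.213.184/29 dev eth0

  # Resulting routes are both link-scoped, i.e. no gateway is consulted:
  #   172.16.0.0/24      dev eth0  proto kernel  scope link  src 172.16.0.30
  #   173.15.213.184/29  dev eth0  scope link
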
> As I have previously mentioned, this setup works with all of the following:
>
>   * Real hardware with real Ethernet cards,
>   * KVM virtual machines attached via a bridge interface,
>   * VirtualBox virtual machines attached via a bridge interface,
>   * QEMU virtual machines attached via a bridge interface, and
>   * OpenVZ container instances attached via a bridge interface.
>
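
(For comparison, the bridge attachment all of those share typically 
looks like the sketch below; "br0", "eth0" and "tap0" are placeholder 
names, and the guest-side device differs per hypervisor:)

  # Host-side bridge joining the physical NIC and the guests' virtual NICs
  brctl addbr br0
  brctl addif br0 eth0
  ip link set eth0 up
  ip link set br0 up

  # Each guest interface (a tap device for KVM/QEMU/VirtualBox, a veth
  # peer for OpenVZ or LXC) is just another port on the same bridge, e.g.:
  #   brctl addif br0 tap0
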
> The only thing I cannot get to work reliably is these containers 
> running under LXC; I therefore do not have any probable cause to "blame 
> the network", nor do I have any evidence that any system on this network 
> is failing to adhere to some standard or specification correctly.  If I 
> did, I'd be trying to find it and fix it.  And believe me, if I had an 
> excuse to get rid of this SMC appliance that they have on this network, 
> I would take advantage of it---I'd love to give its IP address to a 
> Linux box that I control to do the IPv4 NAT routing (and then I would 
> not have to do my IPv6 routing from within one of my containers nor give 
> up a second address for that).
>
> My first clue that there was something amiss with LXC was, in fact, 
> IPv6.  Now, OpenVZ is not capable of running tunnels inside containers, 
> so I cannot compare to that.  When I was running OpenVZ containers, I 
> used a KVM instance for my IPv6 routing.  Note that in that case, IPv6 
> forwarding did _not_ need to be enabled on the host system.  However, 
> for an LXC container to be able to run an IPv6 tunnel and communicate 
> using IPv6 with the LAN, I had to enable IPv6 forwarding not only in the 
> container, but on the host system as well.  This tells me that there is 
> not a complete separation of the interfaces, and that there is something 
> bleeding around the edges outside of the bridge.  I have not yet had the 
> time to actually take a look at the code to confirm this suspicion of 
> mine, but it is the only rational explanation that I can come up with.
>
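
(The forwarding switches referred to above are the standard Linux 
sysctls; a sketch of what had to be toggled, with the second command 
run inside the container:)

  # On the host -- reported above as necessary for the container's tunnel:
  sysctl -w net.ipv6.conf.all.forwarding=1

  # Inside the container -- so it can route between the tunnel interface
  # and its LAN-facing interface:
  sysctl -w net.ipv6.conf.all.forwarding=1
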
> Moving from there to the problem at hand, I can only come to the 
> conclusion that there is some obscure bug somewhere in LXC or the 
> modifications to the networking stack that serve LXC that needs to be 
> hammered out.  I'd absolutely _love_ to be able to rip up my network and 
> bring it to someone who knows the kernel code and LXC code well enough 
> to draw concrete conclusions from it.
>   

The main difference between lxc and other solutions like OpenVZ and 
kvm is that lxc tries to make things configurable, so you can tune the 
level of isolation of your container.
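
As an illustration of that configurability, the network section of a 
container's configuration file can be tuned per container. A minimal 
veth-on-bridge example might look like the following (the bridge name, 
MAC address and IP address are placeholders, not a recommendation for 
your setup):

  # network portion of the container configuration file
  lxc.network.type = veth
  lxc.network.link = br0
  lxc.network.flags = up
  lxc.network.hwaddr = 4a:49:43:49:79:bf
  lxc.network.ipv4 = 173.15.213.186/29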

This difference may explain why a monolithic solution like OpenVZ has 
hardcoded a specific network configuration.

This is the difference I am trying to understand by analyzing the 
network behavior. There is certainly either something tunable in the 
network configuration for lxc or a bug in the kernel, but I do not have 
enough clues yet to dig in a particular direction. I am not trying to 
push back on the problems you are facing with lxc by saying the problem 
comes from your configuration :)
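
One way to gather data for that analysis would be to compare how ARP and 
ICMP behave for a container address versus a hardware node, captured on 
the host's bridge (the interface name and the container address below 
are placeholders):

  # Watch ARP and ICMP on the bridge while pinging the slow container
  tcpdump -n -e -i br0 arp or icmp

  # Compare ARP resolution of a container address vs. a hardware node
  arping -I br0 173.15.213.186     # container (placeholder address)
  arping -I br0 173.15.213.185     # hardware node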

Now I think I have a better understanding of your network topology, and 
I can try to set up something similar here with some virtual machines 
and see whether that reproduces the problem.






