[lxc-users] LXD in EC2
Brian Candler
b.candler at pobox.com
Thu Sep 22 16:24:58 UTC 2016
On 29/07/2016 20:11, Brian Candler wrote:
> I think I have this working by using proxyarp instead of bridging.
>
> On the EC2 VM: leave lxdbr0 unconfigured. Then do:
>
> sysctl net.ipv4.conf.all.forwarding=1
> sysctl net.ipv4.conf.lxdbr0.proxy_arp=1
> ip route add 10.0.0.40 dev lxdbr0
> ip route add 10.0.0.41 dev lxdbr0
> # where 10.0.0.40 and 10.0.0.41 are the IP addresses of the containers
>
> The containers are statically configured with those IP addresses, and
> 10.0.0.1 as gateway.
>
> This is sufficient to allow connectivity between the containers and
> other VMs in the same VPC - yay!
>
> At this point, the containers *don't* have connectivity to the outside
> world. I can see the packets are being sent out with the correct
> source IP address (the container's) and MAC address (the EC2 VM), so I
> presume that the NAT in EC2 is only capable of working with the
> primary IP address - that's reasonable, if it's 1:1 NAT without
> overloading.
>
> So there's also a need for iptables rules to NAT the container's
> address to the EC2 VM's address when talking to the outside world:
>
> iptables -t nat -A POSTROUTING -s 10.0.0.0/8 -d 10.0.0.0/8 -j ACCEPT
> iptables -t nat -A POSTROUTING -s 10.0.0.0/8 -o eth0 -j MASQUERADE
>
> And hey presto: containers with connectivity, albeit fairly heavily
> frigged.
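For reference, the container-side static configuration mentioned above might look something like this inside each container (a sketch assuming an Ubuntu container using ifupdown, with the example addresses quoted above):

auto eth0
iface eth0 inet static
    address 10.0.0.40/24
    gateway 10.0.0.1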
I now have it working with multiple NICs on the same VM, which lets you
run more containers. It did, however, turn out to be rather more painful
to set up, so I'm documenting it here in case it's useful for anyone else.
A t2.medium instance can have up to three NICs, and each NIC can have up
to six IP addresses: one primary and five secondary. You can either let
the AWS console pick random addresses from your VPC range, or enter your
own choices.
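If you prefer the command line to the console, something along these
lines should work (an untested sketch; the subnet, security group, ENI
and instance IDs are placeholders you'd substitute):

# Create a second NIC with a fixed primary address...
aws ec2 create-network-interface --subnet-id subnet-xxxxxxxx \
    --groups sg-xxxxxxxx --private-ip-address 10.0.0.211
# ...attach it to the instance as the second device (eth1)...
aws ec2 attach-network-interface --network-interface-id eni-xxxxxxxx \
    --instance-id i-xxxxxxxx --device-index 1
# ...and add a secondary private address for a container to use
aws ec2 assign-private-ip-addresses --network-interface-id eni-xxxxxxxx \
    --private-ip-addresses 10.0.0.14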
One wrinkle is that when you have two or more NICs, you *must* use an
elastic IP for your first NIC's primary address - it can't map to a
dynamic public IP any more. That's not a big deal (although if you've
reached your limit for elastic IPs, you have to raise a ticket with
Amazon to ask for more).
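Again, this can be done from the CLI if you prefer (a sketch; the
allocation and ENI IDs are placeholders):

# Allocate an elastic IP in the VPC and bind it to eth0's primary address
aws ec2 allocate-address --domain vpc
aws ec2 associate-address --allocation-id eipalloc-xxxxxxxx \
    --network-interface-id eni-xxxxxxxx --private-ip-address 10.0.0.201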
Note that the primary address on any NIC cannot be changed without
deleting and recreating the NIC. So if you want the flexibility to move
an IP address over to a different VM, then you should really set the
first address to something fixed and consider it wasted.
That still lets you run 15 containers on a single t2.medium instance
though - potentially a 15:1 cost saving, if you have enough resources.
Next: eth0, eth1, eth2 are going to sit on the same IP subnet, but the
default route is via eth0. So you want to assign a higher "metric" to
eth1/eth2, so the default route will always use eth0. On Ubuntu:
"apt-get install ifmetric" and then configure your interfaces like this:
auto eth0
iface eth0 inet dhcp

auto eth1
iface eth1 inet static
    # eth1's primary address
    address 10.0.0.211/24
    metric 100

... ditto for eth2 if using it
(I leave eth0 on dhcp because this makes it less likely that I'll lock
myself out with a bad configuration.)
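For completeness, an eth2 stanza would follow the same pattern (the
address below is just a made-up placeholder for eth2's primary address):

auto eth2
iface eth2 inet static
    # eth2's primary address (placeholder)
    address 10.0.0.221/24
    metric 100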
Now, the problem comes with the robust way that EC2 does traffic
security. You have created three NICs with six addresses each, but EC2
only accepts a packet leaving a particular virtual NIC if its source IP
address is one of the addresses assigned to that NIC. The source MAC
address must also be the MAC address of that NIC.
These packets are going to originate from containers inside your VM, and
each container doesn't know or care which interface its traffic will be
forwarded through.
Fixing this requires source routing rules
<https://groups.google.com/forum/#%21topic/ganeti/qVMZFbH1X54>. In
the following example, there are six containers using secondary
addresses on this VM:
* eth0 primary address 10.0.0.201, secondary addresses
10.0.0.{21,23,53,49,89}
* eth1 primary address 10.0.0.211, secondary address 10.0.0.14
(eth2 is not being used)
Run the following command to create a routing table:
echo "200 force-eth1" >>/etc/iproute2/rt_tables
Add to /etc/rc.local:
# Policy routing to force traffic with eth1 source address out of eth1
for addr in 10.0.0.14; do
    ip rule add from "$addr" table force-eth1
done
# Internet traffic, which is masqueraded, goes via eth0
ip route add default via 10.0.0.1 dev eth0 metric 100 table force-eth1
# Non-masqueraded traffic goes via eth1
ip route add 10.0.0.0/24 dev eth1 proto kernel scope link table force-eth1
ip route add 10.0.0.0/8 via 10.0.0.1 dev eth1 metric 100 table force-eth1
ip route flush cache
Check:
# ip rule list
0: from all lookup local
32765: from 10.0.0.14 lookup force-eth1
32766: from all lookup main
32767: from all lookup default
# ip route list table main
default via 10.0.0.1 dev eth0
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.201
10.0.0.0/24 dev eth1 proto kernel scope link src 10.0.0.211 metric 100
10.0.0.14 dev lxdbr0 scope link
10.0.0.21 dev lxdbr0 scope link
10.0.0.23 dev lxdbr0 scope link
10.0.0.49 dev lxdbr0 scope link
10.0.0.53 dev lxdbr0 scope link
10.0.0.89 dev lxdbr0 scope link
# ip route list table force-eth1
default via 10.0.0.1 dev eth0 metric 100
10.0.0.0/24 dev eth1 proto kernel scope link
10.0.0.0/8 via 10.0.0.1 dev eth1 metric 100
Finally, add to /etc/rc.local the static routes and proxy ARP required
for LXD container networking.
sysctl net.ipv4.conf.all.forwarding=1
sysctl net.ipv4.conf.lxdbr0.proxy_arp=1
for addr in 10.0.0.21 10.0.0.23 10.0.0.49 10.0.0.89 10.0.0.53 10.0.0.14; do
    ip route add "$addr" dev lxdbr0
done
# Masquerading for containers (except for our primary IP address)
iptables -t nat -A POSTROUTING -s 10.0.0.201 -j ACCEPT
iptables -t nat -A POSTROUTING -s 10.0.0.0/8 -d 10.0.0.0/8 -j ACCEPT
iptables -t nat -A POSTROUTING -s 10.0.0.0/8 -j MASQUERADE
What these rules achieve is:
* traffic from a container which is going to another private address
(e.g. another VM in the same subnet, or another subnet in your VPC) will
retain its original source address. The source routing ensures it is
sent out via eth0, eth1 or eth2 depending on which source address is
being used.
* traffic from a container which is going out to the public Internet
will be NAT'd to the eth0 primary address and will leave via eth0, so
that it can in turn be NAT'd to the elastic public IP of the EC2 VM.
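To verify this behaviour (my own quick checks, not part of the setup;
tcpdump is assumed to be installed and the destinations are arbitrary
examples), you can ask the kernel how a given container source address
would be routed, and watch the wire:

# Internet-bound traffic from the eth1 secondary address should pick eth0
# (and will then be masqueraded by the POSTROUTING rules):
ip route get 8.8.8.8 from 10.0.0.14 iif lxdbr0
# Traffic to another private address (10.0.0.99 is just an example host)
# should pick eth1 and keep its source address:
ip route get 10.0.0.99 from 10.0.0.14 iif lxdbr0
# Watch eth1 to confirm the container's own address is used on the wire:
tcpdump -ni eth1 host 10.0.0.14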
Then you reboot and cross your fingers that you haven't locked yourself
out. I did this quite a few times until I got it right :-)
There's no console on EC2 VMs, so if it's broken you either have to
detach your EBS volume and attach it to another temporary instance, or
just blow the instance away and start again.
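If you do need to go the rescue route, the CLI steps are roughly as
follows (a sketch; the volume ID, instance IDs and device name are
placeholders):

# Stop the broken instance and move its root volume to a working one...
aws ec2 stop-instances --instance-ids i-broken00
aws ec2 detach-volume --volume-id vol-xxxxxxxx
aws ec2 attach-volume --volume-id vol-xxxxxxxx \
    --instance-id i-rescue00 --device /dev/sdf
# ...then mount it there, fix /etc/network/interfaces or /etc/rc.local,
# and reverse the process.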
Cheers,
Brian Candler.