[lxc-users] LXD in EC2
Brian Candler
b.candler at pobox.com
Thu Sep 22 16:24:58 UTC 2016
On 29/07/2016 20:11, Brian Candler wrote:
> I think I have this working by using proxyarp instead of bridging.
>
> On the EC2 VM: leave lxdbr0 unconfigured. Then do:
>
> sysctl net.ipv4.conf.all.forwarding=1
> sysctl net.ipv4.conf.lxdbr0.proxy_arp=1
> ip route add 10.0.0.40 dev lxdbr0
> ip route add 10.0.0.41 dev lxdbr0
> # where 10.0.0.40 and 10.0.0.41 are the IP addresses of the containers
>
> The containers are statically configured with those IP addresses, and
> 10.0.0.1 as gateway.
>
> This is sufficient to allow connectivity between the containers and
> other VMs in the same VPC - yay!
>
> At this point, the containers *don't* have connectivity to the outside
> world. I can see the packets are being sent out with the correct
> source IP address (the container's) and MAC address (the EC2 VM), so I
> presume that the NAT in EC2 is only capable of working with the
> primary IP address - that's reasonable, if it's 1:1 NAT without
> overloading.
>
> So there's also a need for iptables rules to NAT the container's
> address to the EC2 VM's address when talking to the outside world:
>
> iptables -t nat -A POSTROUTING -s 10.0.0.0/8 -d 10.0.0.0/8 -j ACCEPT
> iptables -t nat -A POSTROUTING -s 10.0.0.0/8 -o eth0 -j MASQUERADE
>
> And hey presto: containers with connectivity, albeit fairly heavily
> frigged.
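For reference, the container-side static configuration mentioned above might look something like this inside each container (a sketch assuming an Ubuntu container using ifupdown, with the example addresses quoted above):

auto eth0
iface eth0 inet static
    address 10.0.0.40/24
    gateway 10.0.0.1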
I now have it working with multiple NICs on the same VM, which lets you
run more containers. It did, however, turn out to be rather more painful
to set up, so I'm documenting it here in case it's useful for anyone else.
A t2.medium instance can have up to three NICs, and each NIC can have up
to six IP addresses: one primary and five secondary. You can either let
the AWS console pick random addresses from your VPC range, or enter your
own choices.
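If you prefer the command line to the console, something along these
lines should work (an untested sketch; the subnet, security group, ENI
and instance IDs are placeholders you'd substitute):

# Create a second NIC with a fixed primary address...
aws ec2 create-network-interface --subnet-id subnet-xxxxxxxx \
    --groups sg-xxxxxxxx --private-ip-address 10.0.0.211
# ...attach it to the instance as the second device (eth1)...
aws ec2 attach-network-interface --network-interface-id eni-xxxxxxxx \
    --instance-id i-xxxxxxxx --device-index 1
# ...and add a secondary private address for a container to use
aws ec2 assign-private-ip-addresses --network-interface-id eni-xxxxxxxx \
    --private-ip-addresses 10.0.0.14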
One wrinkle is that when you have two or more NICs, you *must* use an
elastic IP for your first NIC's primary address - it can't map to a
dynamic public IP any more. That's not a big deal (although if you've
reached your limit for elastic IPs, you have to raise a ticket with
Amazon to ask for more).
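Again, this can be done from the CLI if you prefer (a sketch; the
allocation and ENI IDs are placeholders):

# Allocate an elastic IP in the VPC and bind it to eth0's primary address
aws ec2 allocate-address --domain vpc
aws ec2 associate-address --allocation-id eipalloc-xxxxxxxx \
    --network-interface-id eni-xxxxxxxx --private-ip-address 10.0.0.201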
Note that the primary address on any NIC cannot be changed without
deleting and recreating the NIC. So if you want the flexibility to move
an IP address over to a different VM, then you should really set the
first address to something fixed and consider it wasted.
That still lets you run 15 containers on a single t2.medium instance
though - potentially a 15:1 cost saving, if you have enough resources.
Next: eth0, eth1, eth2 are going to sit on the same IP subnet, but the
default route is via eth0. So you want to assign a higher "metric" to
eth1/eth2, so the default route will always use eth0. On Ubuntu:
"apt-get install ifmetric" and then configure your interfaces like this:
auto eth0
iface eth0 inet dhcp

auto eth1
iface eth1 inet static
    # eth1's primary address
    address 10.0.0.211/24
    metric 100

... ditto for eth2 if using it
(I leave eth0 on dhcp because this makes it less likely that I'll lock
myself out with a bad configuration.)
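For completeness, an eth2 stanza would follow the same pattern (the
address below is just a made-up placeholder for eth2's primary address):

auto eth2
iface eth2 inet static
    # eth2's primary address (placeholder)
    address 10.0.0.221/24
    metric 100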
Now, the problem comes with the robust way that EC2 does traffic
security. You have created three NICs with six addresses each, but EC2
only accepts a packet leaving a particular virtual NIC if its source IP
address is one of the addresses assigned to that NIC. The source MAC
address must also be the MAC address of that NIC.
These packets are going to originate from containers inside your VM, and
each container doesn't know or care which interface its traffic will be
forwarded through.
Fixing this requires source routing rules
<https://groups.google.com/forum/#%21topic/ganeti/qVMZFbH1X54>. In
the following example, there are six containers using secondary
addresses on this VM:
* eth0 primary address 10.0.0.201, secondary addresses
10.0.0.{21,23,53,49,89}
* eth1 primary address 10.0.0.211, secondary address 10.0.0.14
(eth2 is not being used)
Run the following command to create a routing table:
echo "200 force-eth1" >>/etc/iproute2/rt_tables
Add to /etc/rc.local:
# Policy routing to force traffic with eth1 source address out of eth1
for addr in 10.0.0.14; do
    ip rule add from "$addr" table force-eth1
done
# Internet traffic, which is masqueraded, goes via eth0
ip route add default via 10.0.0.1 dev eth0 metric 100 table force-eth1
# Non-masqueraded traffic goes via eth1
ip route add 10.0.0.0/24 dev eth1 proto kernel scope link table force-eth1
ip route add 10.0.0.0/8 via 10.0.0.1 dev eth1 metric 100 table force-eth1
ip route flush cache
Check:
# ip rule list
0: from all lookup local
32765: from 10.0.0.14 lookup force-eth1
32766: from all lookup main
32767: from all lookup default
# ip route list table main
default via 10.0.0.1 dev eth0
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.201
10.0.0.0/24 dev eth1 proto kernel scope link src 10.0.0.211 metric 100
10.0.0.14 dev lxdbr0 scope link
10.0.0.21 dev lxdbr0 scope link
10.0.0.23 dev lxdbr0 scope link
10.0.0.49 dev lxdbr0 scope link
10.0.0.53 dev lxdbr0 scope link
10.0.0.89 dev lxdbr0 scope link
# ip route list table force-eth1
default via 10.0.0.1 dev eth0 metric 100
10.0.0.0/24 dev eth1 proto kernel scope link
10.0.0.0/8 via 10.0.0.1 dev eth1 metric 100
Finally, add to /etc/rc.local the static routes and proxy ARP required
for LXD container networking.
sysctl net.ipv4.conf.all.forwarding=1
sysctl net.ipv4.conf.lxdbr0.proxy_arp=1
for addr in 10.0.0.21 10.0.0.23 10.0.0.49 10.0.0.89 10.0.0.53 10.0.0.14; do
    ip route add "$addr" dev lxdbr0
done
# Masquerading for containers (except for our primary IP address)
iptables -t nat -A POSTROUTING -s 10.0.0.201 -j ACCEPT
iptables -t nat -A POSTROUTING -s 10.0.0.0/8 -d 10.0.0.0/8 -j ACCEPT
iptables -t nat -A POSTROUTING -s 10.0.0.0/8 -j MASQUERADE
What these rules achieve is:
* traffic from a container which is going to another private address
(e.g. another VM in the same subnet, or another subnet in your VPC) will
retain its original source address. The source routing ensures it is
sent out via eth0, eth1 or eth2 depending on which source address is
being used.
* traffic from a container which is going out to the public Internet
will be NAT'd to the eth0 primary address and will leave via eth0, so
that it can in turn be NAT'd to the elastic public IP of the EC2 VM.
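To verify this behaviour (my own quick checks, not part of the setup;
tcpdump is assumed to be installed and the destinations are arbitrary
examples), you can ask the kernel how a given container source address
would be routed, and watch the wire:

# Internet-bound traffic from the eth1 secondary address should pick eth0
# (and will then be masqueraded by the POSTROUTING rules):
ip route get 8.8.8.8 from 10.0.0.14 iif lxdbr0
# Traffic to another private address (10.0.0.99 is just an example host)
# should pick eth1 and keep its source address:
ip route get 10.0.0.99 from 10.0.0.14 iif lxdbr0
# Watch eth1 to confirm the container's own address is used on the wire:
tcpdump -ni eth1 host 10.0.0.14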
Then you reboot and cross your fingers that you haven't locked yourself
out. I did this quite a few times until I got it right :-)
There's no console on EC2 VMs, so if it's broken you either have to
detach your EBS volume and attach it to another temporary instance, or
just blow the instance away and start again.
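If you do need to go the rescue route, the CLI steps are roughly as
follows (a sketch; the volume ID, instance IDs and device name are
placeholders):

# Stop the broken instance and move its root volume to a working one...
aws ec2 stop-instances --instance-ids i-broken00
aws ec2 detach-volume --volume-id vol-xxxxxxxx
aws ec2 attach-volume --volume-id vol-xxxxxxxx \
    --instance-id i-rescue00 --device /dev/sdf
# ...then mount it there, fix /etc/network/interfaces or /etc/rc.local,
# and reverse the process.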
Cheers,
Brian Candler.