<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

    <title></title>

  </head>

  <body text="#000000" bgcolor="#ffffff">

        Serge,<br>

        Thank you for looking at this.<br>

    <br>

    Serge> <i>However, I actually don't think it should happen the

      way you describe.</i>

    <br>

    <br>

        I believe you have misread my description.  I think we are

    actually in agreement about what is happening.<br>

    <br>

        You said:<br>

    <br>

    Serge> <i>So the mac address of the veth endpoint in the

      container should not matter.</i>

    <br>

    <br>

        I think that is the same thing that I said:<br>

    <br>

    Derek> <tt>[The problem MAC address] is NOT the mac address

      specified in lxc.conf, like this:<br>

    </tt>

    <pre wrap=""><tt>

lxc.network.hwaddr = fe:16:3e:fd:5a:5b

        That MAC address has nothing to do with the bug; the host's bridge

device (br0) will never assume a configured LXC MAC address as its own.</tt></pre>

    <br>

        Also, you said:<br>

    <br>

    Serge> <i>The other endpoint, the veth which stays in the host's

      network namespace, that is the one which gets placed on the

      bridge.</i>

    <br>

    <br>

        I agree, that is the address which causes the ~4 second network

    freeze.  As I said in my original description:<br>

    <br>

    Derek> <tt>...the MAC address in question is the one of the

      virtual vethXXXX device, as shown with "ifconfig" on the host:<br>

    </tt>

    <pre wrap=""><tt>

veth0IEDlk Link encap:Ethernet  HWaddr 4e:34:7c:dc:92:e8

[...snip...]</tt></pre>

    <br>

        So, are we in agreement that the problem address is NOT the one

    in the LXC .conf file (as specified by the user), but instead is the

    "random" address of the veth device on the host?<br>

    <br>

    <br>

    Serge> <i>Hmm, I haven't seen this happen at all.</i>

    <br>

    <br>

        I have seen it on Ubuntu 10.04, and there was an independent

    description of the same symptom (and a different but very similar

    work-around) filed in SourceForge here:<br>

    <br>

<a class="moz-txt-link-freetext" href="http://sourceforge.net/tracker/index.php?func=detail&aid=3411497&group_id=163076&atid=826303">http://sourceforge.net/tracker/index.php?func=detail&aid=3411497&group_id=163076&atid=826303</a><br>

    <br>

        (That's SF bug ID# 3411497.)<br>

    <br>

        As described in the libvirt bugfix for this issue (linked

    below), the reason some people see it and some people don't is that

    it only happens when the veth MAC address is lower than the

    physical eth0 device's MAC address.  (That is how the Linux kernel

    handles it, by design.  I don't know why.)<br>
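    <br>

        A minimal Python sketch of that rule, assuming the bridge simply
    takes the numerically lowest MAC address among its attached ports
    (as described in the libvirt discussion).  The function names and
    the eth0 address are hypothetical; the veth address is the one from
    the "ifconfig" output above.  This is an illustration, not kernel
    code:<br>

    <pre wrap="">
```python
def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to a 48-bit integer for comparison."""
    return int(mac.replace(":", ""), 16)


def bridge_mac(port_macs):
    """Return the MAC a bridge would assume under the lowest-address rule."""
    return min(port_macs, key=mac_to_int)


# Hypothetical eth0 address; the veth address is taken from the ifconfig output.
ETH0 = "70:71:bc:12:34:56"
VETH = "4e:34:7c:dc:92:e8"

before = bridge_mac([ETH0])        # only eth0 is attached to br0
after = bridge_mac([ETH0, VETH])   # container starts, veth joins br0
changed = before != after          # True means br0 switched to a new MAC
```
</pre>

        With these example values the veth address sorts below eth0, so
    the bridge address changes when the container starts.  A veth
    address with a high first octet (for example fe:...) would leave it
    unchanged.<br>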

    <br>

        Since the MAC address is randomly chosen, it is a random symptom

    that will vary from one NIC to another.  Those who happen to have a

    high MAC address for eth0 will see it more frequently (but still

    randomly).  This has a major impact on production systems, where a

    ~4 second network freeze could trigger admin alerts and/or failover

    scripts.  (Note the exact duration of the network freeze also

    depends on your switches and routers, and how they handle ARP

    caching.)<br>
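    <br>

        As a rough estimate of how often a given host would be hit, the
    sketch below counts what fraction of random veth addresses sort
    below eth0.  It assumes the kernel picks the address uniformly, with
    the locally-administered bit set and the multicast bit cleared in
    the first octet; that assumption should be checked against your
    kernel.  Both eth0 addresses and the function name are made up for
    illustration:<br>

    <pre wrap="">
```python
from fractions import Fraction


def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


def fraction_below(eth0_mac):
    """Fraction of assumed-random veth MACs that are lower than eth0_mac."""
    eth0 = mac_to_int(eth0_mac)
    eth0_first = eth0 // 2**40   # first octet
    eth0_rest = eth0 % 2**40     # remaining five octets
    count = 0
    for first in range(256):
        if first % 4 != 2:       # assumed: local bit set, multicast bit clear
            continue
        if eth0_first > first:
            count += 2**40       # every address with this first octet is lower
        elif eth0_first == first:
            count += eth0_rest   # only those with smaller trailing octets
    return Fraction(count, 64 * 2**40)


low_eth0 = fraction_below("00:1b:21:aa:bb:cc")    # hypothetical NIC
high_eth0 = fraction_below("f8:1a:67:aa:bb:cc")   # hypothetical NIC
```
</pre>

        Under that assumption, a NIC whose address starts with 00 would
    never be undercut by a random veth address, while one starting with
    f8 would be undercut by most of them.<br>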

    <br>

    <br>

    Thank You,<br>

    Derek Simkowiak<br>

    <br>

    <br>

    On 10/24/2011 11:40 AM, Serge E. Hallyn wrote:

    <blockquote cite="mid:20111024184037.GA14835@sergelap" type="cite">

      <pre wrap="">Quoting Derek Simkowiak (<a class="moz-txt-link-abbreviated" href="mailto:derek@simkowiak.net">derek@simkowiak.net</a>):

</pre>

      <blockquote type="cite">

        <pre wrap="">     Hello,

     Just following up re: this bug.  I think it's a pretty serious issue.

     I am looking to work on this, but I am seeking some feedback and 

direction from one of the core LXC devs.

- Do you agree with my analysis?

- Has anyone else worked on this already?

</pre>

      </blockquote>

      <pre wrap="">

Hmm, I haven't seen this happen at all.  That doesn't mean it's not

possible.

However, I actually don't think it should happen the way you describe.

Note that the veth passed in to the container is *not* assigned to the

bridge.  The other endpoint, the veth which stays in the host's network

namespace, that is the one which gets placed on the bridge.  So the

mac address of the veth endpoint in the container should not matter.

(Disclaimer: my being wrong is a not-infrequent event)

-serge

</pre>

      <blockquote type="cite">

        <pre wrap="">etc.

Thanks,

Derek

On 10/18/2011 04:31 PM, Derek Simkowiak wrote:

</pre>

        <blockquote type="cite">

          <pre wrap="">      There is a behavior in the Linux kernel which can cause a bridge

device to change MAC address, thus causing a network blackout of several

seconds (while everybody ARPs the new MAC address and flushes the old one).

This happens when bridging an enslaved interface, like we do with LXC.

      The symptom is that the LXC host will black out for several seconds

when starting or stopping an LXC container.  Your SSH terminal on the

host will freeze and become unresponsive.  (It is a random symptom,

because the blackout only happens if the randomly-assigned MAC address

of the virtual device is lower than that of the physical eth0 device).

      This behavior was first observed by the libvirt folks when creating

virtual machines.  You can read more details about it (and how they

fixed it) here:

<a class="moz-txt-link-freetext" href="https://www.redhat.com/archives/libvir-list/2010-July/msg00450.html">https://www.redhat.com/archives/libvir-list/2010-July/msg00450.html</a>

<a class="moz-txt-link-freetext" href="https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/584048">https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/584048</a>

      I have observed the symptom under LXC, and the workaround for it

has been independently confirmed for LXC in this bug report (ID: 3411497):

<a class="moz-txt-link-freetext" href="http://sourceforge.net/tracker/index.php?func=detail&aid=3411497&group_id=163076&atid=826303">http://sourceforge.net/tracker/index.php?func=detail&aid=3411497&group_id=163076&atid=826303</a>

      The workaround for the bug is to give the virtual device a high MAC

address, thus discouraging the bridge device from adopting its MAC

address as its own.

      I have mentioned this bug on the list before, however, I was

confused about which MAC address was causing the problem.  This is NOT

the mac address specified in lxc.conf, like this:

lxc.network.hwaddr = fe:16:3e:fd:5a:5b

      That MAC address has nothing to do with the bug; the host's bridge

device (br0) will never assume a configured LXC MAC address as its own.

Instead, the MAC address in question is the one of the virtual vethXXXX

device, as shown with "ifconfig" on the host:

veth0IEDlk Link encap:Ethernet  HWaddr 4e:34:7c:dc:92:e8

[...snip...]

      That HWaddr should be given a high prefix to avoid the network

blackouts, just like they've done for libvirt.  That does not exist in

any config file anywhere; it must be fixed in the LXC source code.

      I looked in network.c for the LXC source code and I think the fix

should go in lxc_bridge_attach() near line 991.  The fix would put a

manually-generated MAC address -- one with a high prefix -- into

ifr.ifr_hwaddr.sa_data and thus replace the random one assigned by the

kernel.

      However, I'm new to the LXC source and would like some input and

analysis from a more seasoned contributor.  I would be happy to test and

maybe even contribute a patch, but I'd like some feedback first.

Thank You,

Derek Simkowiak


_______________________________________________

Lxc-users mailing list

<a class="moz-txt-link-abbreviated" href="mailto:Lxc-users@lists.sourceforge.net">Lxc-users@lists.sourceforge.net</a>

<a class="moz-txt-link-freetext" href="https://lists.sourceforge.net/lists/listinfo/lxc-users">https://lists.sourceforge.net/lists/listinfo/lxc-users</a>

</pre>

        </blockquote>

        <pre wrap="">

------------------------------------------------------------------------------

The demand for IT networking professionals continues to grow, and the

demand for specialized networking skills is growing even more rapidly.

Take a complimentary Learning@Cisco Self-Assessment and learn 

about Cisco certifications, training, and career opportunities. 

<a class="moz-txt-link-freetext" href="http://p.sf.net/sfu/cisco-dev2dev">http://p.sf.net/sfu/cisco-dev2dev</a>

_______________________________________________

Lxc-users mailing list

<a class="moz-txt-link-abbreviated" href="mailto:Lxc-users@lists.sourceforge.net">Lxc-users@lists.sourceforge.net</a>

<a class="moz-txt-link-freetext" href="https://lists.sourceforge.net/lists/listinfo/lxc-users">https://lists.sourceforge.net/lists/listinfo/lxc-users</a>

</pre>

      </blockquote>

    </blockquote>

    <br>

  </body>

</html>