<div dir="ltr">One additional note:<div><br></div><div>Make sure the btrfs volume is on a fast disk. I just tried with an AWS EBS volume and was unable to reproduce the problem. As soon as I switched to an ephemeral (local storage) disk, I was able to reproduce it after only 2 runs of the test script.
<div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Aug 14, 2013 at 1:22 PM, Jay Taylor <span dir="ltr">&lt;<a href="mailto:jay@jaytaylor.com" target="_blank">jay@jaytaylor.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Hi Serge,<div><br></div><div>I added zfs support to the application and systems creating/hosting the containers, and I have subsequently been unable to reproduce any issues.</div><div><br></div><div>As far as trying to reproduce it with btrfs, I've had some success.</div>
<div><br></div><div>The general system state is something like:</div><div>N containers already running happily</div><div>Launch N+ more containers in rapid succession (in parallel, not serially).</div><div><br></div><div>
I've modified your test script to reflect more closely what my application is actually doing: it slowly launches 10 containers, and then uses "&" to rapidly fork an additional 10 clone/start operations. I have it doing 2 cycles of this and it eventually triggers the problem (it has taken up to 3 runs to trigger it).</div>
<div><br></div><div>And for reference, here is an exact copy of the scripts I used to reproduce the problem:</div><div><br></div><div>test.sh:</div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div>#!/usr/bin/env bash</div>
<div><br></div><div>prefix=$1</div><div><br></div><div>test -z "${prefix}" && echo 'error: missing required parameter: prefix' 1>&2 && exit 1</div><div><br></div><div>path=/mnt</div>
<div><br></div><div>sudo lxc-destroy -n c1 2>/dev/null</div><div>sudo lxc-create -t ubuntu -B btrfs -n c1</div><div><div><br></div><div>for i in `seq 1 10`; do</div></div><div> sudo lxc-clone -s -B btrfs -P $path -o c1 -n $prefix$i</div>
<div> sudo lxc-start -d -n $prefix$i</div><div>done</div><div>for i in `seq 11 20`; do</div><div> echo $(sudo lxc-clone -s -B btrfs -P $path -o c1 -n $prefix$i; sudo lxc-start -d -n $prefix$i) &</div><div>done</div>
<div><br></div><div>sleep 10</div><div><br></div><div># Create even more.</div><div>for i in `seq 21 30`; do</div><div> sudo lxc-clone -s -B btrfs -P $path -o c1 -n $prefix$i</div><div> sudo lxc-start -d -n $prefix$i</div>
<div>done</div><div>for i in `seq 31 40`; do</div><div> echo $(sudo lxc-clone -s -B btrfs -P $path -o c1 -n $prefix$i; sudo lxc-start -d -n $prefix$i) &</div><div>done</div></blockquote><div><div class="gmail_extra">
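<div>The forked loops in test.sh background each clone/start pair without waiting for it or checking its exit status. A minimal sketch of one way to collect failures, assuming a hypothetical helper named run_parallel (it is not part of the original scripts, and the usage shown in the trailing comment is illustrative only):</div>

```shell
#!/usr/bin/env bash
# Run a command once per index in the background, wait for every job,
# and print how many of them exited non-zero.
# run_parallel FIRST LAST CMD [ARGS...]   (hypothetical helper)
run_parallel() {
  first=$1; last=$2; shift 2
  pids=""; failed=0; i=$first
  while [ "$i" -le "$last" ]; do
    "$@" "$i" &            # fork one job per index; the index is the last argument
    pids="$pids $!"        # remember the PID so it can be waited on
    i=$((i + 1))
  done
  for pid in $pids; do
    # wait returns the exit status of that job
    wait "$pid" || failed=$((failed + 1))
  done
  echo "$failed"
}

# Illustrative usage with the commands from test.sh (assumed wrapper name):
#   clone_and_start() {
#     sudo lxc-clone -s -B btrfs -P /mnt -o c1 -n "x$1" && sudo lxc-start -d -n "x$1"
#   }
#   run_parallel 11 20 clone_and_start
```

<div>Waiting on each PID makes it possible to tell a clone or start that failed outright apart from one that succeeded but left the container without an address.</div>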
<br></div><div class="gmail_extra">stop.sh:</div></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div><div class="gmail_extra"><div class="gmail_extra">#!/usr/bin/env bash</div><div class="gmail_extra">
<br></div><div class="gmail_extra">prefix=$1</div><div class="gmail_extra"><br></div><div class="gmail_extra">test -z "${prefix}" && echo 'error: missing required parameter: prefix' 1>&2 && exit 1</div>
<div class="gmail_extra"><br></div><div class="gmail_extra">sudo lxc-destroy -n c1;</div><div class="gmail_extra"><br></div><div class="gmail_extra">for i in `seq 1 40`; do</div><div class="gmail_extra"> echo $(sudo lxc-stop -k -n $prefix$i; sudo lxc-destroy -n $prefix$i) &</div>
<div class="gmail_extra">done</div></div></div></blockquote><div><div class="gmail_extra"><br></div><div class="gmail_extra"><br></div><div class="gmail_extra">bash ./test.sh x<br></div><div class="gmail_extra">bash ./test.sh y<br>
</div><div class="gmail_extra">bash ./test.sh z<br></div><div class="gmail_extra"><br></div><div class="gmail_extra"><br></div><div class="gmail_extra">If it doesn't manifest at first, try stopping/starting varying quantities of containers for several cycles. Eventually I consistently end up with containers that never get IP addresses:</div>
<div class="gmail_extra"><br></div><div class="gmail_extra"><div class="gmail_extra">x1 RUNNING - - NO</div><div class="gmail_extra">x10 RUNNING - - NO</div>
<div class="gmail_extra">x11 RUNNING - - NO</div><div class="gmail_extra">x12 RUNNING - - NO</div><div class="gmail_extra">x13 RUNNING - - NO</div>
<div class="gmail_extra">x14 RUNNING - - NO</div><div class="gmail_extra">x15 RUNNING - - NO</div><div class="gmail_extra">x16 RUNNING - - NO</div>
<div class="gmail_extra">x17 RUNNING - - NO</div><div class="gmail_extra">x18 RUNNING - - NO</div><div class="gmail_extra">x19 RUNNING - - NO</div>
<div class="gmail_extra">x2 RUNNING - - NO</div><div class="gmail_extra">x20 RUNNING - - NO</div><div class="gmail_extra">x21 RUNNING - - NO</div>
<div class="gmail_extra">x22 RUNNING - - NO</div><div class="gmail_extra">x23 RUNNING - - NO</div><div class="gmail_extra">x24 RUNNING - - NO</div>
<div class="gmail_extra">x25 RUNNING - - NO</div><div class="gmail_extra">x26 RUNNING - - NO</div><div class="gmail_extra">x27 RUNNING - - NO</div>
<div class="gmail_extra">x28 RUNNING - - NO</div><div class="gmail_extra">x29 RUNNING - - NO</div><div class="gmail_extra">x3 RUNNING - - NO</div>
<div class="gmail_extra">x30 RUNNING - - NO</div><div class="gmail_extra">x31 RUNNING - - NO</div><div class="gmail_extra">x32 RUNNING - - NO</div>
<div class="gmail_extra">x33 RUNNING - - NO</div><div class="gmail_extra">x34 RUNNING - - NO</div><div class="gmail_extra">x35 RUNNING - - NO</div>
<div class="gmail_extra">x36 RUNNING - - NO</div><div class="gmail_extra">x37 RUNNING - - NO</div><div class="gmail_extra">x38 RUNNING - - NO</div>
<div class="gmail_extra">x39 RUNNING - - NO</div><div class="gmail_extra">x4 RUNNING - - NO</div><div class="gmail_extra">x40 RUNNING - - NO</div>
<div class="gmail_extra">x5 RUNNING - - NO</div><div class="gmail_extra">x6 RUNNING - - NO</div><div class="gmail_extra">x7 RUNNING - - NO</div>
<div class="gmail_extra">x8 RUNNING - - NO</div><div class="gmail_extra">x9 RUNNING - - NO</div></div><div><div><div class="gmail_extra"><br>
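<div>To spot this symptom without reading the listing by eye, something like the following could be used. It is only a sketch: it assumes the listing has the column order shown above (name, state, IPv4, IPv6, autostart), as printed by a command such as lxc-ls --fancy, and running_without_ip is a made-up function name.</div>

```shell
#!/usr/bin/env bash
# Print the names of RUNNING containers whose IPv4 column is "-".
# Reads a listing on stdin with whitespace-separated columns:
#   NAME STATE IPV4 IPV6 AUTOSTART   (layout assumed from the output above)
running_without_ip() {
  awk '$2 == "RUNNING" && $3 == "-" { print $1 }'
}

# Illustrative usage (assumes lxc-ls --fancy prints that layout):
#   sudo lxc-ls --fancy | running_without_ip
```

<div>An empty result would mean every running container has picked up an address.</div>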
<br><div class="gmail_quote">
On Wed, Aug 14, 2013 at 10:12 AM, Serge Hallyn <span dir="ltr">&lt;<a href="mailto:serge.hallyn@ubuntu.com" target="_blank">serge.hallyn@ubuntu.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div>Quoting Serge Hallyn (<a href="mailto:serge.hallyn@ubuntu.com" target="_blank">serge.hallyn@ubuntu.com</a>):<br>
> Quoting Jay Taylor (<a href="mailto:jay@jaytaylor.com" target="_blank">jay@jaytaylor.com</a>):<br>
> > After further investigation yesterday, I am not convinced it is an<br>
> > IP-address issue. The affected host machines are unable to start any<br>
> > existing or newly created containers. The incident that triggered the<br>
> > issue was cloning 1 container into 10 new ones, and then launching them all<br>
> > simultaneously. Are there any known concurrency issues with LXC which<br>
> > would explain why executing a lot of clone/start LXC commands at the same<br>
><br>
> Known, no, but that doesn't mean they're not there :)<br>
><br>
> However, could you try to reproduce this with non-btrfs?<br>
><br>
> I'll try to reproduce with btrfs...<br>
<br>
</div>In a fresh raring instance I mounted a btrfs disk on /mnt, and did<br>
<br>
lxc-create -t ubuntu -B btrfs -P /mnt -n c1<br>
for i in `seq 1 10`; do<br>
lxc-clone -s -p /mnt -o c1 -n x$i<br>
done<br>
for i in `seq 1 10`; do<br>
lxc-start -d -P /mnt -n x$i<br>
done<br>
<br>
Then connected to two of the containers with lxc-console,<br>
lxc-console -P /mnt -n x2<br>
lxc-console -P /mnt -n x9<br>
<br>
both were up and had unique ip addresses.<br>
<br>
Again this was a raring instance with ppa:ubuntu-lxc/daily installed.<br>
<span><font color="#888888"><br>
-serge<br>
</font></span></blockquote></div><br></div></div></div></div></div>
</blockquote></div><br></div></div></div>