[Lxc-users] how to troubleshoot lxc service

Rintcius Blok rintcius at gmail.com
Mon Nov 19 22:22:54 UTC 2012


Hmm.. I just had the same problem again.
I thought it was this bug: 
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1021471
(fixed in 3.5.0-17.28)

Do I have the right version installed? My running kernel is:
3.5.0-18-generic #29-Ubuntu SMP Fri Oct 19 10:26:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

If I need a newer version, how can I install it? (I have already run
"apt-get dist-upgrade".)
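
To compare the two version strings, here is a minimal sketch. It assumes the
running kernel's package version is 3.5.0-18.29 (ABI 18 plus the #29 upload
number from the uname output above, following the usual Ubuntu naming) and
that GNU sort with -V is available. version_ge is a made-up helper name, not
part of any tool.

```shell
#!/bin/sh
# Return success if version $1 is greater than or equal to version $2.
# Relies on GNU sort -V (version sort); version_ge is a hypothetical helper.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# The fixed version comes from the bug report; the running version is an
# assumption derived from "3.5.0-18-generic #29-Ubuntu".
fixed="3.5.0-17.28"
running="3.5.0-18.29"

if version_ge "$running" "$fixed"; then
    echo "running kernel is at or past the fixed version"
else
    echo "running kernel predates the fix"
fi
```

On a Debian or Ubuntu host, "dpkg --compare-versions 3.5.0-18.29 ge
3.5.0-17.28" should answer the same question without a helper.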

Rintcius

On 19/11/12 14:37, Serge Hallyn wrote:
> Since you say that after this you cannot start any containers at all until
> a host reboot, I think what you get is a known kernel netdev refcounting
> bug.  (Check your host syslog messages.)  You might try a backported raring
> kernel.
>
> -serge
>
> Quoting Rintcius Blok (rintcius at gmail.com):
>> Hi,
>>
>> Every now and then the lxc service on my host becomes unusable after I
>> create a new container.
>> Only a reboot gets it back to normal.
>>
>> This is basically what I do (12.10 host):
>>
>> lxc-create -t ubuntu-cloud -n c.lxc -- --auth-key $HOME/.ssh/id_rsa.pub
>> --userdata /root/webdocs.txt.gz
>>
>> The problem must occur quite early in the process, since there is no
>> cloudinit log file in the container yet.
>>
>> Also I cannot ssh into the container:
>>
>> $ ssh c.lxc
>> nc: getaddrinfo: Name or service not known
>> ssh_exchange_identification: Connection closed by remote host
>>
>> And these are the only files I see in the container's var/lib/cloud:
>>
>> $ cd /var/lib/lxc/c.lxc/rootfs/var/lib/cloud
>> $ ls -R
>> .:
>> seed
>>
>> ./seed:
>> nocloud-net
>>
>> ./seed/nocloud-net:
>> meta-data  user-data
>>
>> The lxc service becomes unusable in the sense that:
>> - I cannot stop the lxc service anymore
>> - containers cannot be started anymore (not even simple ones)
>>
>> The lxc processes stay in this state (even though I stopped and
>> destroyed the containers and they no longer show up in lxc-list):
>> # ps auxwww | grep lxc
>> 103       1179  0.0  0.0  26032   964 ?        S    11:59 0:00 dnsmasq -u lxc-dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/lxc/dnsmasq.pid --conf-file= --listen-address 10.0.3.1 --dhcp-range 10.0.3.2,10.0.3.254 --dhcp-lease-max=253 --dhcp-no-override --except-interface=lo --interface=lxcbr0
>> root     12741  0.0  0.0  27540   972 ?        Ds   19:05 0:00 lxc-start -d -n c.lxc
>> root     13159  0.0  0.0  27540   976 ?        Ds   19:28 0:00 lxc-start -d -n e.lxc
>> root     13445  0.0  0.0  27540   972 ?        Ds   19:47 0:00 lxc-start -d -n b.lxc
>>
>> Note the "D" state of the container processes (D = uninterruptible
>> sleep, usually I/O).
>>
>> I don't have a reproduction path for this yet.
>>
>> Any ideas how to troubleshoot this further, in order to get a
>> reproduction path?
>>
>> Thanks,
>> Rintcius