[Lxc-users] how to troubleshoot lxc service
Rintcius Blok
rintcius at gmail.com
Mon Nov 19 22:22:54 UTC 2012
Hmm.. I just had the same problem again.
I thought it was this bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1021471
(fixed in 3.5.0-17.28)
Do I have the correct version installed? This is what I am running:
3.5.0-18-generic #29-Ubuntu SMP Fri Oct 19 10:26:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
If I need a newer version, how can I install it? (I already ran
"apt-get dist-upgrade".)
Rintcius
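
One way to answer the version question above is to compare the running kernel's package version with the version that carries the fix. The sketch below is only an illustration: version_ge is a hypothetical helper (not part of lxc or apt), it assumes GNU sort with -V, and it assumes that "3.5.0-18-generic #29-Ubuntu" corresponds to package version 3.5.0-18.29.

```shell
# Succeeds (exit 0) if version $1 is greater than or equal to version $2.
# Assumes GNU sort, whose -V option sorts version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

installed="3.5.0-18.29"   # assumed package version of the running kernel
fixed="3.5.0-17.28"       # version named as fixed in bug 1021471

if version_ge "$installed" "$fixed"; then
    echo "kernel $installed is at or past $fixed"
else
    echo "kernel $installed is older than $fixed"
fi
```

If the comparison succeeds, the fix for bug 1021471 should already be in the running kernel, which would suggest that the hang has a different cause.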
On 19/11/12 14:37, Serge Hallyn wrote:
> Since you say that after this you cannot start any containers at all until
> a host reboot, I think what you get is a known kernel netdev refcounting
> bug. (Check your host syslog messages.) You might try a backported raring
> kernel.
>
> -serge
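
The syslog check suggested above could look roughly like this. It is a sketch under assumptions: the log path is the usual Ubuntu default, the pattern is the usual wording of the kernel's netdev refcount message, and count_netdev_refcount_msgs is a hypothetical helper name.

```shell
# Count log lines that match the netdev refcount symptom.
# The pattern is an assumption about how the kernel words the message.
count_netdev_refcount_msgs() {
    grep -c 'unregister_netdevice: waiting for .* to become free' "$1"
}

log=/var/log/syslog   # assumed syslog location on the host
if [ -r "$log" ]; then
    echo "matching lines: $(count_netdev_refcount_msgs "$log")"
fi
```

A non-zero count around the time the containers hang would be consistent with the refcounting bug; a count of zero does not rule it out, since older messages may have been rotated out of the log.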
>
> Quoting Rintcius Blok (rintcius at gmail.com):
>> Hi,
>>
>> Every now and then the lxc service on my host becomes unusable after
>> creating a new container.
>> Only a reboot brings it back to normal.
>>
>> This is basically what I do (12.10 host):
>>
>> lxc-create -t ubuntu-cloud -n c.lxc -- --auth-key $HOME/.ssh/id_rsa.pub \
>>     --userdata /root/webdocs.txt.gz
>>
>> The problem must occur quite early in the process, since there is no
>> cloudinit log file in the container yet.
>>
>> Also I cannot ssh into the container:
>>
>> $ ssh c.lxc
>> nc: getaddrinfo: Name or service not known
>> ssh_exchange_identification: Connection closed by remote host
>>
>> And these are the only files I see in the container's var/lib/cloud:
>>
>> $ cd /var/lib/lxc/c.lxc/rootfs/var/lib/cloud
>> $ ls -R
>> .:
>> seed
>>
>> ./seed:
>> nocloud-net
>>
>> ./seed/nocloud-net:
>> meta-data user-data
>>
>> The lxc service becomes unusable in the sense that:
>> - I can no longer stop the lxc service
>> - containers can no longer be started (not even simple ones)
>>
>> The lxc processes stay in this state (even though I stopped and
>> destroyed the containers and they no longer show up in lxc-list):
>> # ps auxwww | grep lxc
>> 103 1179 0.0 0.0 26032 964 ? S 11:59 0:00 dnsmasq
>>     -u lxc-dnsmasq --strict-order --bind-interfaces
>>     --pid-file=/var/run/lxc/dnsmasq.pid --conf-file= --listen-address
>>     10.0.3.1 --dhcp-range 10.0.3.2,10.0.3.254 --dhcp-lease-max=253
>>     --dhcp-no-override --except-interface=lo --interface=lxcbr0
>> root 12741 0.0 0.0 27540 972 ? Ds 19:05 0:00 lxc-start -d -n c.lxc
>> root 13159 0.0 0.0 27540 976 ? Ds 19:28 0:00 lxc-start -d -n e.lxc
>> root 13445 0.0 0.0 27540 972 ? Ds 19:47 0:00 lxc-start -d -n b.lxc
>>
>> Note the "D" state of the container processes (D = Uninterruptible
>> sleep (usually IO))
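
Processes stuck in the "D" state can also be listed directly rather than picked out of a grep by eye. A minimal sketch, assuming the procps version of ps; filter_d_state is a hypothetical helper that expects "pid stat args" lines on standard input.

```shell
# Keep only lines whose second field (the process state) starts with "D",
# i.e. processes in uninterruptible sleep.
filter_d_state() {
    awk '$2 ~ /^D/ { print }'
}

# List PID, state and command line for every process, without headers.
if command -v ps >/dev/null 2>&1; then
    ps -eo pid=,stat=,args= | filter_d_state
fi
```

Running this before and after creating a container may help narrow down when the lxc-start processes first get stuck.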
>>
>> I don't have a reproduction path for this yet.
>>
>> Any ideas how to troubleshoot this further, in order to get a
>> reproduction path?
>>
>> Thanks,
>> Rintcius