[Lxc-users] how to troubleshoot lxc service

Serge Hallyn serge.hallyn at canonical.com
Mon Nov 19 14:37:36 UTC 2012


Since you say that after this you cannot start any containers at all until
a host reboot, I think what you are hitting is a known kernel netdev
refcounting bug.  (Check the kernel messages in your host syslog to
confirm.)  You might try a backported raring kernel.
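
One way to do that syslog check, as a rough sketch: the function below
prints the log lines that match the message a netdev refcount leak usually
produces. The message text in the pattern, the function name
netdev_refcount_hits, and the default path /var/log/syslog are assumptions
for illustration and are not confirmed anywhere in this thread.

```shell
#!/bin/sh
# Print kernel log lines that look like the netdev refcount symptom.
# Assumption: the kernel logs something like
#   unregister_netdevice: waiting for <dev> to become free. Usage count = N
# Assumption: the host writes kernel messages to /var/log/syslog.
netdev_refcount_hits() {
    log="${1:-/var/log/syslog}"
    grep -E 'unregister_netdevice: waiting for .+ to become free' "$log"
}
```

Calling it as "netdev_refcount_hits /var/log/syslog | tail -n 5" would show
the most recent matches, if there are any. No output only means that this
particular message was not logged to that file.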

-serge

Quoting Rintcius Blok (rintcius at gmail.com):
> Hi,
> 
> Every now and then, after creating a new container, the lxc service on
> my host becomes unusable.
> Only after a reboot can I get it back to normal.
> 
> This is basically what I do (12.10 host):
> 
> lxc-create -t ubuntu-cloud -n c.lxc -- --auth-key $HOME/.ssh/id_rsa.pub \
>     --userdata /root/webdocs.txt.gz
> 
> The problem must occur quite early in the process, since there is no 
> cloudinit log file in the container yet.
> 
> Also I cannot ssh into the container:
> 
> $ ssh c.lxc
> nc: getaddrinfo: Name or service not known
> ssh_exchange_identification: Connection closed by remote host
> 
> And these are the only files I see in the container's var/lib/cloud:
> 
> $ cd /var/lib/lxc/c.lxc/rootfs/var/lib/cloud
> $ ls -R
> .:
> seed
> 
> ./seed:
> nocloud-net
> 
> ./seed/nocloud-net:
> meta-data  user-data
> 
> The lxc service becomes unusable in the sense that:
> - I cannot stop the lxc service anymore
> - containers cannot be started anymore (not even simple containers)
> 
> The lxc processes stay in this state (even though I stopped and 
> destroyed the containers and they no longer show up in lxc-list):
> # ps auxwww | grep lxc
> 103       1179  0.0  0.0  26032   964 ?        S    11:59 0:00 dnsmasq -u lxc-dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/lxc/dnsmasq.pid --conf-file= --listen-address 10.0.3.1 --dhcp-range 10.0.3.2,10.0.3.254 --dhcp-lease-max=253 --dhcp-no-override --except-interface=lo --interface=lxcbr0
> root     12741  0.0  0.0  27540   972 ?        Ds   19:05 0:00 lxc-start -d -n c.lxc
> root     13159  0.0  0.0  27540   976 ?        Ds   19:28 0:00 lxc-start -d -n e.lxc
> root     13445  0.0  0.0  27540   972 ?        Ds   19:47 0:00 lxc-start -d -n b.lxc
> 
> Note the "D" state of the container processes (D = uninterruptible 
> sleep, usually I/O).
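
To narrow down where those D-state processes are blocked, a small filter
like the following may help. It is only a sketch: the helper name
d_state_only is made up for illustration, and it assumes the procps
version of ps, fed with the columns pid, stat, wchan and args in that
order.

```shell
#!/bin/sh
# Keep only processes whose STAT column starts with "D"
# (uninterruptible sleep). Expects "PID STAT WCHAN COMMAND..." lines on
# stdin; the header line is dropped because "STAT" does not start with D.
d_state_only() {
    awk '$2 ~ /^D/'
}

# Intended use (assumes procps ps):
#   ps -eo pid,stat,wchan:32,args | d_state_only
```

The WCHAN column names the kernel function each process is sleeping in,
which can hint at whether the hang is in network device teardown. On
kernels that expose it, reading /proc/<pid>/stack as root gives a fuller
kernel stack for a given process.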
> 
> I don't have a reproduction path for this yet.
> 
> Any ideas how to troubleshoot this further, in order to get a 
> reproduction path?
> 
> Thanks,
> Rintcius
> 