[Lxc-users] how to troubleshoot lxc service
Serge Hallyn
serge.hallyn at canonical.com
Mon Nov 19 14:37:36 UTC 2012
Since you say that after this you cannot start any containers at all until
a host reboot, I think what you get is a known kernel netdev refcounting
bug. (Check your host syslog messages.) You might try a backported raring
kernel.
-serge
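
A rough sketch of the syslog check suggested above. It assumes the refcounting problem shows up as kernel lines of the form "unregister_netdevice: waiting for <dev> to become free. Usage count = N". The function name and the default log path (/var/log/syslog, the usual location on Ubuntu) are illustrative assumptions and do not come from the report.

```shell
# Print log lines that look like the netdev refcount hang.
# $1: log file to scan (assumed default: /var/log/syslog on an Ubuntu host)
netdev_refcount_hits() {
    grep -F 'unregister_netdevice: waiting for' "${1:-/var/log/syslog}"
}
```

Running `netdev_refcount_hits | tail` on the host right after a container gets stuck should show whether such lines are being logged. If nothing is printed, the cause may be something else.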
Quoting Rintcius Blok (rintcius at gmail.com):
> Hi,
>
> Every now and then, after creating a new container, the lxc service
> on my host becomes unusable.
> Only after a reboot can I get it back to normal.
>
> This is basically what I do (12.10 host):
>
> lxc-create -t ubuntu-cloud -n c.lxc -- --auth-key $HOME/.ssh/id_rsa.pub \
>     --userdata /root/webdocs.txt.gz
>
> The problem must occur quite early in the process, since there is no
> cloudinit log file in the container yet.
>
> Also I cannot ssh into the container:
>
> $ ssh c.lxc
> nc: getaddrinfo: Name or service not known
> ssh_exchange_identification: Connection closed by remote host
>
> And these are the only files I see in the container's var/lib/cloud:
>
> $ cd /var/lib/lxc/c.lxc/rootfs/var/lib/cloud
> $ ls -R
> .:
> seed
>
> ./seed:
> nocloud-net
>
> ./seed/nocloud-net:
> meta-data user-data
>
> The lxc service becomes unusable in the sense that:
> - I cannot stop the lxc service anymore
> - containers cannot be started anymore (not even simple ones)
>
> The lxc processes stay in this state (even though I stopped and
> destroyed the containers and they no longer show up in lxc-list):
> # ps auxwww | grep lxc
> 103 1179 0.0 0.0 26032 964 ? S 11:59 0:00 dnsmasq
>     -u lxc-dnsmasq --strict-order --bind-interfaces
>     --pid-file=/var/run/lxc/dnsmasq.pid --conf-file= --listen-address
>     10.0.3.1 --dhcp-range 10.0.3.2,10.0.3.254 --dhcp-lease-max=253
>     --dhcp-no-override --except-interface=lo --interface=lxcbr0
> root 12741 0.0 0.0 27540 972 ? Ds 19:05 0:00 lxc-start -d -n c.lxc
> root 13159 0.0 0.0 27540 976 ? Ds 19:28 0:00 lxc-start -d -n e.lxc
> root 13445 0.0 0.0 27540 972 ? Ds 19:47 0:00 lxc-start -d -n b.lxc
>
> Note the "D" state of the container processes (D = Uninterruptible
> sleep (usually IO))
>
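
To spot this state without grepping for lxc, a small filter over ps output can list every process in uninterruptible sleep. This is only a sketch: filter_d_state is a hypothetical helper name, and it assumes the state flags are in the second column, as they are with `ps -eo pid,stat,...`.

```shell
# Keep only the lines whose second field (the STAT column) starts with "D",
# i.e. processes in uninterruptible sleep. Reads ps-style output on stdin.
filter_d_state() {
    awk '$2 ~ /^D/'
}
```

For example, `ps -eo pid,stat,wchan:32,args | filter_d_state` lists the stuck processes. Where available, the wchan column names the kernel function each one is sleeping in, which may help narrow down a reproduction path.
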
> I don't have a reproduction path for this yet.
>
> Any ideas how to troubleshoot this further, in order to get a
> reproduction path?
>
> Thanks,
> Rintcius