[Lxc-users] how to troubleshoot lxc service
Rintcius Blok
rintcius at gmail.com
Fri Nov 16 22:56:58 UTC 2012
Hi,
Every now and then, the lxc service on my host becomes unusable after I
create a new container.
Only a reboot gets it back to normal.
This is basically what I do (12.10 host):
lxc-create -t ubuntu-cloud -n c.lxc -- --auth-key $HOME/.ssh/id_rsa.pub \
    --userdata /root/webdocs.txt.gz
The problem must occur quite early in the process, since there is no
cloudinit log file in the container yet.
Also I cannot ssh into the container:
$ ssh c.lxc
nc: getaddrinfo: Name or service not known
ssh_exchange_identification: Connection closed by remote host
And these are the only files I see in the container's var/lib/cloud:
$ cd /var/lib/lxc/c.lxc/rootfs/var/lib/cloud
$ ls -R
.:
seed
./seed:
nocloud-net
./seed/nocloud-net:
meta-data user-data
The lxc service becomes unusable in the sense that:
- I cannot stop the lxc service anymore
- containers cannot be started anymore (not even simple ones)
The lxc-start processes stay in this state (even though I stopped and
destroyed the containers and they no longer show up in lxc-list):
# ps auxwww | grep lxc
103 1179 0.0 0.0 26032 964 ? S 11:59 0:00 dnsmasq -u lxc-dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/lxc/dnsmasq.pid --conf-file= --listen-address 10.0.3.1 --dhcp-range 10.0.3.2,10.0.3.254 --dhcp-lease-max=253 --dhcp-no-override --except-interface=lo --interface=lxcbr0
root 12741 0.0 0.0 27540 972 ? Ds 19:05 0:00 lxc-start -d -n c.lxc
root 13159 0.0 0.0 27540 976 ? Ds 19:28 0:00 lxc-start -d -n e.lxc
root 13445 0.0 0.0 27540 972 ? Ds 19:47 0:00 lxc-start -d -n b.lxc
Note the "D" state of the lxc-start processes (D = uninterruptible
sleep, usually I/O).
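A minimal sketch of how the D-state processes could be listed together with their kernel wait channel, assuming procps ps is available. The function names filter_d_state and list_d_state are made up for illustration and are not part of lxc:

```shell
# Keep the header line plus every row whose STAT column starts with "D"
# (uninterruptible sleep). Reads ps-style rows on stdin, so it can also
# be run against saved ps output.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# List D-state processes with the kernel function they are waiting in
# (the WCHAN column). Assumes procps ps.
list_d_state() {
    ps -eo pid,stat,wchan:32,args | filter_d_state
}
```

Running list_d_state as root should show the stuck lxc-start processes. Where /proc/<pid>/stack exists (it needs root and a kernel built with stack tracing), reading it for one of those PIDs may show where the process is blocked.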
I don't have a reproduction path for this yet.
Any ideas how to troubleshoot this further, in order to get a
reproduction path?
Thanks,
Rintcius