[lxc-users] Getting kdump to work on an LXC server
Rod Bruce
rod.bruce at mnsu.edu
Thu Sep 11 19:20:51 UTC 2014
Greetings,
I have been working on a problem the last couple of days and I believe I
have come up with a solution so I thought I would share it with the list
in case anybody else runs into this or someone has a better solution.
Problem:
I have had a server running Ubuntu 14.04 hang a couple of times. I try
to run everything using standard Ubuntu packages. The server is an LXC
host with two containers running on it (but several more planned). I
wanted to get a kernel core dump if it hung again so I started
investigating kdump/kexec. I installed, configured, and tested
kdump/kexec on another server and it worked as advertised. However, when
I tried it on the LXC server it would save the core dump OK but the
server would fail to reboot or hang at some other point in the process.
I noticed that when kexec was booting the secondary kernel it was
starting up all of the services that start on a normal boot, including
LXC, and that seemed to be causing a problem. When I set the containers
to not auto boot, kdump worked as expected. However, we want the
containers to auto boot so I had to come up with a different solution.
Things I tried that did not work:
- I added the parameter KDUMP_RUNLEVEL="1" to the
/etc/default/kdump-tools file. KDUMP_RUNLEVEL="1" is something I found
mentioned on a couple of pages but it is not in any of the man pages or
Ubuntu documentation.
- I uncommented the KDUMP_CMDLINE_APPEND parameter in the
/etc/default/kdump-tools file and changed the line to
KDUMP_CMDLINE_APPEND="irqpoll maxcpus=1 nousb 1", which tells kexec
to boot into single-user mode. It did boot to single-user mode, but
that mode is not adequate: it asks for a root password (for which
there is a work-around) and it does not mount extra file systems
(like /var/crash).
The solution I came up with:
I changed the default run-level from 2 to 3, set LXC to not start on
run-level 2, and configured kdump to boot to run-level 2. That way a
normal boot reaches run-level 3 and starts the containers, while the
kdump kernel stops at run-level 2 and leaves LXC down. Historically,
run-level 2 was multi-user mode without networking and run-level 3 was
the same as 2 but with network support enabled. As far as I can tell, at
least with a standard Ubuntu 14.04 server install, there is no difference
between run-levels 2 and 3.
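One rough way to check the claim that run-levels 2 and 3 are equivalent is to compare the SysV script directories for the two run-levels. The sketch below is only an illustration: the function name compare_runlevel_dirs is made up for this example, and it assumes the links live in /etc/rc2.d and /etc/rc3.d. It does not look at Upstart jobs under /etc/init, which would need to be inspected separately.

```shell
#!/bin/sh
# Hypothetical helper: report whether two run-level directories
# contain the same set of start/stop links (by name only).
compare_runlevel_dirs() {
    dir_a=$1
    dir_b=$2
    list_a=$(ls "$dir_a")
    list_b=$(ls "$dir_b")
    if [ "$list_a" = "$list_b" ]; then
        echo "identical"
    else
        echo "different"
    fi
}

# Only run against the real directories if both exist (assumed paths).
if [ -d /etc/rc2.d ] && [ -d /etc/rc3.d ]; then
    compare_runlevel_dirs /etc/rc2.d /etc/rc3.d
fi
```

If this prints "different", listing both directories side by side would show which services are tied to only one of the two run-levels.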
Here are the details:
1. Change the default run-level from 2 to 3:
sudo sed -i "s/^env DEFAULT_RUNLEVEL=2/env DEFAULT_RUNLEVEL=3/" \
    /etc/init/rc-sysinit.conf
2. Set LXC to not start on run-level 2:
sudo sed -i "s/^start on runlevel \[2345\]/start on runlevel \[345\]/" \
    /etc/init/lxc.conf
sudo sed -i "s/^stop on starting rc RUNLEVEL=\[016\]/stop on starting rc RUNLEVEL=\[0126\]/" \
    /etc/init/lxc.conf
3. Configure kdump to boot to run-level 2:
sudo sed -i "s/^#KDUMP_CMDLINE_APPEND=\"irqpoll maxcpus=1 nousb\"/KDUMP_CMDLINE_APPEND=\"irqpoll maxcpus=1 nousb 2\"/" \
    /etc/default/kdump-tools
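Before running the commands above with sudo against the real files, it may be worth previewing what each substitution does. The following is a minimal sketch that pipes sample lines through the same sed expressions without -i, so nothing on disk is touched. The sample input lines are inferred from the search patterns in the steps above, not copied from an actual system, and the preview_* function names are made up for this example.

```shell
#!/bin/sh
# Dry run: each function reads lines on stdin and applies one of the
# substitutions from steps 1-3, writing the result to stdout.

preview_default_runlevel() {
    sed "s/^env DEFAULT_RUNLEVEL=2/env DEFAULT_RUNLEVEL=3/"
}

preview_lxc_start() {
    sed "s/^start on runlevel \[2345\]/start on runlevel \[345\]/"
}

preview_lxc_stop() {
    sed "s/^stop on starting rc RUNLEVEL=\[016\]/stop on starting rc RUNLEVEL=\[0126\]/"
}

preview_kdump_cmdline() {
    sed "s/^#KDUMP_CMDLINE_APPEND=\"irqpoll maxcpus=1 nousb\"/KDUMP_CMDLINE_APPEND=\"irqpoll maxcpus=1 nousb 2\"/"
}

# Sample lines (assumed to match the stock files) fed through each step.
echo 'env DEFAULT_RUNLEVEL=2' | preview_default_runlevel
echo 'start on runlevel [2345]' | preview_lxc_start
echo 'stop on starting rc RUNLEVEL=[016]' | preview_lxc_stop
echo '#KDUMP_CMDLINE_APPEND="irqpoll maxcpus=1 nousb"' | preview_kdump_cmdline
```

The same functions can be fed the real files, for example `preview_lxc_start < /etc/init/lxc.conf | grep '^start on'`, to confirm that the pattern actually matches on a given system. If the stock line differs from the pattern, sed leaves it unchanged without reporting an error.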
After I made these changes I rebooted the server, ran some tests and
everything seems to be working.
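Because sed exits successfully even when a pattern does not match, a quick read-only check that all four edits actually landed can save a failed crash test. This is a sketch only: check_kdump_lxc_setup is a hypothetical helper name, and the expected lines are taken from the replacement strings in steps 1 to 3.

```shell
#!/bin/sh
# Hypothetical helper: verify that the three files contain the edited
# lines. Arguments: rc-sysinit.conf, lxc.conf, kdump-tools defaults file.
# Prints one message per missing setting; returns 0 only if all are found.
check_kdump_lxc_setup() {
    rc_sysinit=$1
    lxc_conf=$2
    kdump_defaults=$3
    status=0

    grep -qs '^env DEFAULT_RUNLEVEL=3' "$rc_sysinit" ||
        { echo "default run-level is not 3 in $rc_sysinit"; status=1; }

    grep -qs '^start on runlevel \[345\]' "$lxc_conf" ||
        { echo "LXC start condition not updated in $lxc_conf"; status=1; }

    grep -qs '^stop on starting rc RUNLEVEL=\[0126\]' "$lxc_conf" ||
        { echo "LXC stop condition not updated in $lxc_conf"; status=1; }

    grep -qs '^KDUMP_CMDLINE_APPEND="irqpoll maxcpus=1 nousb 2"' "$kdump_defaults" ||
        { echo "kdump command line not updated in $kdump_defaults"; status=1; }

    return $status
}

# Run against the real paths only where they exist.
if [ -r /etc/init/lxc.conf ]; then
    check_kdump_lxc_setup /etc/init/rc-sysinit.conf \
        /etc/init/lxc.conf /etc/default/kdump-tools
fi
```

This only inspects the configuration files. It does not confirm that the crash kernel really boots to run-level 2, which still needs an actual test crash.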
--
Rod Bruce
UNIX System and Network Administrator
PALS, A Program of the
Minnesota State Colleges and Universities
rod.bruce at mnsu.edu
507.389.2000
Quis custodiet ipsos custodes?