[lxc-users] sshd-keygen fails during container boot
Peter Steele
pwsteele at gmail.com
Tue Dec 8 19:10:10 UTC 2015
On 12/08/2015 08:00 AM, Serge Hallyn wrote:
> Ok, can you change the launch command in the scripts to
>
> lxc-start -n $containername -L /tmp/$containername.cout -l trace -o /tmp/$containername.dout -- /sbin/init log_target=console log_level=debug
>
> The console output will go into the .cout file and lxc debug output into .dout.
>
I've actually made some progress in reproducing this outside of my
framework. I originally thought the problem only occurred during the
first boot of the containers. I've discovered that it can happen any
time the server is rebooted and the containers are started when the
server comes up. I've only seen this problem when multiple containers
are starting at the same time.
I incorporated your modified start command into a test as follows:
# for vm in `lxc-ls`; do lxc-start -n $vm -L /tmp/$vm.cout -l trace -o
/tmp/$vm.dout -- /sbin/init log_target=console log_level=debug; done
This starts all of my previously created containers at roughly the same
time, and when I do this some of the containers encounter the systemd
errors I've been seeing. Which containers hit these errors vary from
test to test. In looking at the .dout logs, I noticed the following:
lxc-start 1449591253.647 DEBUG lxc_conf -
conf.c:setup_rootfs:1295 - mounted '/hf/cs/vm-00/rootfs' on
'/usr/lib64/lxc/rootfs'
lxc-start 1449591253.647 INFO lxc_conf -
conf.c:setup_utsname:928 - 'pws-vm-00' hostname has been setup
lxc-start 1449591253.660 DEBUG lxc_conf -
conf.c:setup_hw_addr:2368 - mac address 'fe:d6:e8:96:7e:2d' on 'eth0'
has been setup
lxc-start 1449591253.660 DEBUG lxc_conf -
conf.c:setup_netdev:2595 - 'eth0' has been setup
lxc-start 1449591253.660 INFO lxc_conf -
conf.c:setup_network:2616 - network has been setup
lxc-start 1449591253.660 INFO lxc_conf -
conf.c:mount_autodev:1157 - Mounting container /dev
lxc-start 1449591253.661 INFO lxc_conf -
conf.c:mount_autodev:1179 - Mounted tmpfs onto /usr/lib64/lxc/rootfs/dev
lxc-start 1449591253.661 INFO lxc_conf -
conf.c:mount_autodev:1197 - Mounted container /dev
lxc-start 1449591253.661 ERROR lxc_utils -
utils.c:open_without_symlink:1626 - No such file or directory - Error
examining fuse in /usr/lib64/lxc/rootfs/sys/fs/fuse/connections
lxc-start 1449591253.661 INFO lxc_conf -
conf.c:mount_entry:1727 - failed to mount '/sys/fs/fuse/connections' on
'/usr/lib64/lxc/rootfs/sys/fs/fuse/connections' (optional): No such file
or directory
All of the containers report this error, but what caught my eye is the
mount point referenced in this error. Is this same mount point used for
all containers that are being started? I assume this error is misleading
but I tried changing my for loop to add a 1 second delay in starting
each container, and after doing this there were no systemd errors.
Unfortunately, adding a delay in my install framework had no effect, so
I suspect the apparent positive results in adding a delay to the for
loop was just luck.
This same container reported the following errors in its /var/log/messages:
Dec 8 08:06:39 pws-vm-00 systemd: Starting Dump dmesg to /var/log/dmesg...
Dec 8 08:06:39 pws-vm-00 systemd: Failed at step CGROUP spawning
/etc/rc.d/init.d/jexec: No such file or directory
Dec 8 08:06:39 pws-vm-00 systemd: Starting Permit User Sessions...
Dec 8 08:06:39 pws-vm-00 systemd: Starting LSB: Bring up/down networking...
Dec 8 08:06:39 pws-vm-00 systemd: jexec.service: control process
exited, code=exited status=219
Dec 8 08:06:39 pws-vm-00 systemd: Failed to start LSB: Supports the
direct execution of binary formats..
Dec 8 08:06:39 pws-vm-00 systemd: Unit jexec.service entered failed state.
The .cout file for this same container didn't really have any thing of
note, other than it also reported some of these errors:
Starting LSB: Bring up/down networking...
Starting D-Bus System Message Bus...
OK Started D-Bus System Message Bus.
FAILED Failed to start LSB: Supports the direct execution of binary
formats..
See 'systemctl status jexec.service' for details.
There was nothing related to this error in the .dout file. The start of
this file does have some warnings:
lxc-start 1449590798.820 INFO lxc_start_ui -
lxc_start.c:main:264 - using rcfile /var/lib/lxc/vm-00/config
lxc-start 1449590798.822 WARN lxc_confile -
confile.c:config_pivotdir:1801 - lxc.pivotdir is ignored. It will soon
become an error.
lxc-start 1449590798.823 WARN lxc_cgfs -
cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup
cpuset unknown to /var/lib/lxc vm-00
lxc-start 1449590798.823 WARN lxc_cgfs -
cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup cpu
unknown to /var/lib/lxc vm-00
lxc-start 1449590798.823 WARN lxc_cgfs -
cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup
blkio unknown to /var/lib/lxc vm-00
lxc-start 1449590798.823 WARN lxc_cgfs -
cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup
memory unknown to /var/lib/lxc vm-00
lxc-start 1449590798.823 WARN lxc_cgfs -
cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup
devices unknown to /var/lib/lxc vm-00
lxc-start 1449590798.823 WARN lxc_cgfs -
cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup
freezer unknown to /var/lib/lxc vm-00
lxc-start 1449590798.823 WARN lxc_cgfs -
cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup
net_cls unknown to /var/lib/lxc vm-00
lxc-start 1449590798.823 WARN lxc_cgfs -
cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup
perf_event unknown to /var/lib/lxc vm-00
lxc-start 1449590798.823 WARN lxc_cgfs -
cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup
hugetlb unknown to /var/lib/lxc vm-00
lxc-start 1449590798.823 INFO lxc_start -
start.c:lxc_check_inherited:226 - closed inherited fd 4
lxc-start 1449590798.825 INFO lxc_container -
lxccontainer.c:do_lxcapi_start:712 - Attempting to set proc title to
[lxc monitor] /var/lib/lxc vm-00
lxc-start 1449590798.825 ERROR lxc_utils -
utils.c:setproctitle:1461 - Invalid argument - setting cmdline failed
and there were some later on related to secomp:
lxc-start 1449590798.825 INFO lxc_seccomp -
seccomp.c:parse_config_v2:426 - Adding native rule for finit_module
action 327681
lxc-start 1449590798.825 WARN lxc_seccomp -
seccomp.c:do_resolve_add_rule:233 - Seccomp: got negative # for syscall:
finit_module
lxc-start 1449590798.825 WARN lxc_seccomp -
seccomp.c:do_resolve_add_rule:234 - This syscall will NOT be blacklisted
lxc-start 1449590798.825 INFO lxc_seccomp -
seccomp.c:parse_config_v2:429 - Adding compat rule for finit_module
action 327681
lxc-start 1449590798.825 WARN lxc_seccomp -
seccomp.c:do_resolve_add_rule:233 - Seccomp: got negative # for syscall:
finit_module
lxc-start 1449590798.825 WARN lxc_seccomp -
seccomp.c:do_resolve_add_rule:234 - This syscall will NOT be blacklisted
lxc-start 1449590798.825 INFO lxc_seccomp -
seccomp.c:parse_config_v2:324 - processing: .delete_module errno 1.
along with a few other unrelated warnings. There are no errors in any of
the logs that point to an obvious cause for these system errors, at
least not to my eyes. Plus, I get a completely different set of failed
VMs each time I run through an install, with different services being
impacted each time. Is there anything in particular I should look for in
the .cout and .dout logs that might help explain what's going on?
More information about the lxc-users
mailing list