[lxc-users] sshd-keygen fails during container boot

Peter Steele pwsteele at gmail.com
Tue Dec 8 19:10:10 UTC 2015


On 12/08/2015 08:00 AM, Serge Hallyn wrote:
> Ok, can you change the launch command in the scripts to
>
> lxc-start -n $containername -L /tmp/$containername.cout -l trace -o /tmp/$containername.dout -- /sbin/init log_target=console log_level=debug
>
> The console output will go into the .cout file and lxc debug output into .dout.
>
I've actually made some progress in reproducing this outside of my 
framework. I originally thought the problem only occurred during the 
first boot of the containers. I've discovered that it can happen any 
time the server is rebooted and the containers are started when the 
server comes up. I've only seen this problem when multiple containers 
are starting at the same time.

I incorporated your modified start command into a test as follows:

# for vm in `lxc-ls`; do lxc-start -n $vm -L /tmp/$vm.cout -l trace -o /tmp/$vm.dout -- /sbin/init log_target=console log_level=debug; done

This starts all of my previously created containers at roughly the same 
time, and when I do this some of the containers encounter the systemd 
errors I've been seeing. Which containers hit these errors varies from 
test to test. In looking at the .dout logs, I noticed the following:

       lxc-start 1449591253.647 DEBUG    lxc_conf - conf.c:setup_rootfs:1295 - mounted '/hf/cs/vm-00/rootfs' on '/usr/lib64/lxc/rootfs'
       lxc-start 1449591253.647 INFO     lxc_conf - conf.c:setup_utsname:928 - 'pws-vm-00' hostname has been setup
       lxc-start 1449591253.660 DEBUG    lxc_conf - conf.c:setup_hw_addr:2368 - mac address 'fe:d6:e8:96:7e:2d' on 'eth0' has been setup
       lxc-start 1449591253.660 DEBUG    lxc_conf - conf.c:setup_netdev:2595 - 'eth0' has been setup
       lxc-start 1449591253.660 INFO     lxc_conf - conf.c:setup_network:2616 - network has been setup
       lxc-start 1449591253.660 INFO     lxc_conf - conf.c:mount_autodev:1157 - Mounting container /dev
       lxc-start 1449591253.661 INFO     lxc_conf - conf.c:mount_autodev:1179 - Mounted tmpfs onto /usr/lib64/lxc/rootfs/dev
       lxc-start 1449591253.661 INFO     lxc_conf - conf.c:mount_autodev:1197 - Mounted container /dev
       lxc-start 1449591253.661 ERROR    lxc_utils - utils.c:open_without_symlink:1626 - No such file or directory - Error examining fuse in /usr/lib64/lxc/rootfs/sys/fs/fuse/connections
       lxc-start 1449591253.661 INFO     lxc_conf - conf.c:mount_entry:1727 - failed to mount '/sys/fs/fuse/connections' on '/usr/lib64/lxc/rootfs/sys/fs/fuse/connections' (optional): No such file or directory

All of the containers report this error, but what caught my eye is the 
mount point referenced in it. Is this same mount point used for 
all containers that are being started? I assume this error is misleading, 
but I tried changing my for loop to add a 1 second delay between 
container starts, and after doing this there were no systemd errors. 
Unfortunately, adding a delay in my install framework had no effect, so 
I suspect the apparent positive result from adding a delay to the for 
loop was just luck.
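
For reference, the delayed variant of the loop can be sketched roughly as 
follows. The function name start_staggered and the LXC_START and 
START_DELAY variables are made up for illustration and are not LXC 
settings; only the lxc-start arguments come from the command above.

```shell
#!/bin/sh
# Sketch only: start containers one after another with a pause in between.
# LXC_START and START_DELAY are illustrative names, not part of LXC.
LXC_START=${LXC_START:-lxc-start}
START_DELAY=${START_DELAY:-1}

start_staggered() {
    # Each argument is a container name.
    for vm in "$@"; do
        "$LXC_START" -n "$vm" -L "/tmp/$vm.cout" -l trace -o "/tmp/$vm.dout" \
            -- /sbin/init log_target=console log_level=debug
        # Pause before launching the next container.
        sleep "$START_DELAY"
    done
}

# Example (commented out so the file can be sourced safely):
# start_staggered $(lxc-ls)
```

Since the delay made no difference inside the install framework, this is 
only useful as a way to repeat the experiment, not as a fix.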

This same container reported the following errors in its /var/log/messages:

Dec  8 08:06:39 pws-vm-00 systemd: Starting Dump dmesg to /var/log/dmesg...
Dec  8 08:06:39 pws-vm-00 systemd: Failed at step CGROUP spawning /etc/rc.d/init.d/jexec: No such file or directory
Dec  8 08:06:39 pws-vm-00 systemd: Starting Permit User Sessions...
Dec  8 08:06:39 pws-vm-00 systemd: Starting LSB: Bring up/down networking...
Dec  8 08:06:39 pws-vm-00 systemd: jexec.service: control process exited, code=exited status=219
Dec  8 08:06:39 pws-vm-00 systemd: Failed to start LSB: Supports the direct execution of binary formats..
Dec  8 08:06:39 pws-vm-00 systemd: Unit jexec.service entered failed state.
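
Because the affected services differ from run to run, it may help to 
tally which units fail in each container's messages file. Below is a 
minimal sketch that assumes the "Unit ... entered failed state." wording 
shown above; the function name failed_units is hypothetical, and the glob 
in the usage comment is a guess based on the rootfs path in the .dout log.

```shell
#!/bin/sh
# Sketch only: count units that systemd reported as failed.
# Relies on the "Unit X entered failed state." message format seen above.
failed_units() {
    # Arguments: one or more /var/log/messages files.
    sed -n 's/.*systemd: Unit \(.*\) entered failed state\.$/\1/p' "$@" \
        | sort | uniq -c
}

# Example (path pattern is an assumption, adjust to the real layout):
# failed_units /hf/cs/vm-*/rootfs/var/log/messages
```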

The .cout file for this same container didn't really have anything of 
note, other than it also reported some of these errors:

Starting LSB: Bring up/down networking...
Starting D-Bus System Message Bus...
OK  Started D-Bus System Message Bus.
FAILED Failed to start LSB: Supports the direct execution of binary formats..
See 'systemctl status jexec.service' for details.

There was nothing related to this error in the .dout file. The start of 
this file does have some warnings:

       lxc-start 1449590798.820 INFO     lxc_start_ui - lxc_start.c:main:264 - using rcfile /var/lib/lxc/vm-00/config
       lxc-start 1449590798.822 WARN     lxc_confile - confile.c:config_pivotdir:1801 - lxc.pivotdir is ignored.  It will soon become an error.
       lxc-start 1449590798.823 WARN     lxc_cgfs - cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup cpuset unknown to /var/lib/lxc vm-00
       lxc-start 1449590798.823 WARN     lxc_cgfs - cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup cpu unknown to /var/lib/lxc vm-00
       lxc-start 1449590798.823 WARN     lxc_cgfs - cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup blkio unknown to /var/lib/lxc vm-00
       lxc-start 1449590798.823 WARN     lxc_cgfs - cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup memory unknown to /var/lib/lxc vm-00
       lxc-start 1449590798.823 WARN     lxc_cgfs - cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup devices unknown to /var/lib/lxc vm-00
       lxc-start 1449590798.823 WARN     lxc_cgfs - cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup freezer unknown to /var/lib/lxc vm-00
       lxc-start 1449590798.823 WARN     lxc_cgfs - cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup net_cls unknown to /var/lib/lxc vm-00
       lxc-start 1449590798.823 WARN     lxc_cgfs - cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup perf_event unknown to /var/lib/lxc vm-00
       lxc-start 1449590798.823 WARN     lxc_cgfs - cgfs.c:lxc_cgroup_get_container_info:1100 - Not attaching to cgroup hugetlb unknown to /var/lib/lxc vm-00
       lxc-start 1449590798.823 INFO     lxc_start - start.c:lxc_check_inherited:226 - closed inherited fd 4
       lxc-start 1449590798.825 INFO     lxc_container - lxccontainer.c:do_lxcapi_start:712 - Attempting to set proc title to [lxc monitor] /var/lib/lxc vm-00
       lxc-start 1449590798.825 ERROR    lxc_utils - utils.c:setproctitle:1461 - Invalid argument - setting cmdline failed

and there were some later on related to seccomp:

       lxc-start 1449590798.825 INFO     lxc_seccomp - seccomp.c:parse_config_v2:426 - Adding native rule for finit_module action 327681
       lxc-start 1449590798.825 WARN     lxc_seccomp - seccomp.c:do_resolve_add_rule:233 - Seccomp: got negative # for syscall: finit_module
       lxc-start 1449590798.825 WARN     lxc_seccomp - seccomp.c:do_resolve_add_rule:234 - This syscall will NOT be blacklisted
       lxc-start 1449590798.825 INFO     lxc_seccomp - seccomp.c:parse_config_v2:429 - Adding compat rule for finit_module action 327681
       lxc-start 1449590798.825 WARN     lxc_seccomp - seccomp.c:do_resolve_add_rule:233 - Seccomp: got negative # for syscall: finit_module
       lxc-start 1449590798.825 WARN     lxc_seccomp - seccomp.c:do_resolve_add_rule:234 - This syscall will NOT be blacklisted
       lxc-start 1449590798.825 INFO     lxc_seccomp - seccomp.c:parse_config_v2:324 - processing: .delete_module errno 1.

along with a few other unrelated warnings. There are no errors in any of 
the logs that point to an obvious cause for these system errors, at 
least not to my eyes. Plus, I get a completely different set of failed 
VMs each time I run through an install, with different services being 
impacted each time. Is there anything in particular I should look for in 
the .cout and .dout logs that might help explain what's going on?
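
One way to compare a failing container's .dout log against a healthy 
one is to reduce each file to a tally of its ERROR and WARN entries. 
This is a rough sketch that assumes the log layout shown above (level 
in the third field, source location in the sixth); the function name 
summarize_log is made up for illustration.

```shell
#!/bin/sh
# Sketch only: tally ERROR/WARN entries in an lxc-start log by source location.
# Assumes fields are: prefix, timestamp, level, category, "-", file:func:line.
summarize_log() {
    # Argument: path to one .dout file.
    awk '$3 == "ERROR" || $3 == "WARN" { print $3, $6 }' "$1" | sort | uniq -c
}

# Example (commented out; prints one summary per log file):
# for f in /tmp/*.dout; do echo "== $f"; summarize_log "$f"; done
```

Diffing the summaries of a failed and a successful container would show 
whether any warning is unique to the failures.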


