[lxc-users] lxc-ls -f problem

david.andel at bli.uzh.ch david.andel at bli.uzh.ch
Thu May 28 12:10:00 UTC 2015


I have some additional details on this:

This error occurs with both kernel 3.19.0-16 and 3.19.0-18. Unprivileged containers reproducibly leave processes in uninterruptible sleep (state D):

root at andel2:~# ps aux | egrep ' D | Z | H '
200000    4637  0.0  0.0   4472  1480 ?        D    14:03   0:00 /bin/sh /usr/bin/savelog -q -p -c 5 /var/log/dmesg
200000    4671  0.1  0.1 112456  6060 ?        D    14:03   0:00 /usr/sbin/nginx
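
The egrep pattern above only matches a state letter surrounded by spaces, so it can miss states such as "Ds" or "D+" and can also match text in the command column. A stricter filter could look like the sketch below. It assumes the procps version of ps and that STAT is the third column of the chosen output format; the function names filter_blocked and list_blocked are made up for illustration.

```shell
#!/bin/sh
# filter_blocked: read ps-style output on stdin, keep the header line and
# every row whose third field (STAT) starts with D (uninterruptible sleep)
# or Z (zombie).
filter_blocked() {
    awk 'NR == 1 || $3 ~ /^[DZ]/'
}

# list_blocked: run ps with an explicit column list so that STAT is always
# the third field, then apply the filter. WCHAN shows the kernel function
# the process is sleeping in, when the kernel exposes it.
list_blocked() {
    ps -eo pid,user,stat,wchan:32,args | filter_blocked
}
```

Calling list_blocked as root on the affected host should then list the stuck savelog and nginx processes together with their wait channel.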

After this happens, all further lxc-* calls hang or return errors, for example:
david at andel2:~$ lxc-ls -f
lxc_container: cgmanager.c: lxc_cgmanager_enter: 694 call to cgmanager_move_pid_abs_sync failed: invalid request
lxc_container: cgmanager.c: do_cgm_get: 871 Failed to enter container cgroup freezer:
lxc_container: cgmanager.c: lxc_cgmanager_enter: 694 call to cgmanager_move_pid_abs_sync failed: invalid request
lxc_container: cgmanager.c: do_cgm_get: 871 Failed to enter container cgroup freezer:
lxc_container: utils.c: switch_to_ns: 1337 No such file or directory - failed to open /proc/3109/ns/net
lxc_container: lxccontainer.c: lxcapi_get_ips: 1665 No such file or directory - failed to enter namespace
lxc_container: cgmanager.c: lxc_cgmanager_enter: 694 call to cgmanager_move_pid_abs_sync failed: invalid request
lxc_container: cgmanager.c: do_cgm_get: 871 Failed to enter container cgroup freezer:
lxc_container: utils.c: switch_to_ns: 1337 No such file or directory - failed to open /proc/3109/ns/net
lxc_container: lxccontainer.c: lxcapi_get_ips: 1665 No such file or directory - failed to enter namespace
NAME      STATE    IPV4  IPV6  GROUPS  AUTOSTART
...
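
The "failed to open /proc/3109/ns/net" lines may mean that the PID recorded for the container no longer exists. That can be checked by hand; the sketch below only tests for the namespace link and makes no claim about how LXC does it internally. The helper name ns_net_present is hypothetical.

```shell
#!/bin/sh
# ns_net_present PID: report whether /proc/PID/ns/net exists.
# Returns 0 if the link is there, 1 if it is missing (for example because
# the process has already exited).
ns_net_present() {
    pid=$1
    if [ -e "/proc/$pid/ns/net" ]; then
        echo "pid $pid: network namespace link present"
    else
        echo "pid $pid: /proc/$pid/ns/net missing"
        return 1
    fi
}
```

For the log above this would be run as "ns_net_present 3109".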


-----"lxc-users" <lxc-users-bounces at lists.linuxcontainers.org> wrote: -----
To: LXC users mailing-list <lxc-users at lists.linuxcontainers.org>
From: david.andel at bli.uzh.ch
Sent by: "lxc-users" 
Date: 05/27/2015 16:05
Subject: Re: [lxc-users] lxc-ls -f problem

I have now attached the output of:
strace -f -ostrace.out -- lxc-ls -f
strace -f -ostrace-start.out -- lxc-start -n s0_RStSh
lxc-start -n s0_RStSh -l trace -o debug.out

I ran these as a regular user rather than as root this time; if root output is required, I will post that as well.

Interestingly, this happens only on a vivid instance running as a KVM guest.
On three other vivid instances running on bare metal it does not happen.

I am running the latest stable releases from the PPA, i.e. lxc 1.1.2-0ubuntu3.

Cheers,
David


-----"lxc-users" <lxc-users-bounces at lists.linuxcontainers.org> wrote: -----
To: LXC users mailing-list <lxc-users at lists.linuxcontainers.org>
From: david.andel at bli.uzh.ch
Sent by: "lxc-users" 
Date: 05/23/2015 20:47
Subject: Re: [lxc-users] lxc-ls -f problem

Hi

I have the exact same problem after yesterday's update.

I suspect it is bug https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1413927, or at least something closely related.

root at andel2:~# cat /proc/self/cgroup
10:devices:/system.slice/ssh.service
9:perf_event:/system.slice/ssh.service
8:cpuset:/system.slice/ssh.service
7:cpu,cpuacct:/system.slice/ssh.service
6:memory:/system.slice/ssh.service
5:freezer:/system.slice/ssh.service
4:net_cls,net_prio:/system.slice/ssh.service
3:hugetlb:/system.slice/ssh.service
2:blkio:/system.slice/ssh.service
1:name=systemd:/system.slice/ssh.service
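
To compare this across machines, the controller-to-path mapping can be printed in two columns. A small sketch follows; cgroup_paths is a made-up helper name, and it expects lines in the "id:controllers:path" format shown above on standard input.

```shell
#!/bin/sh
# cgroup_paths: turn "id:controllers:path" lines into "controllers  path".
# The path is taken as everything after the second colon, so a path that
# itself contains a colon is kept intact.
cgroup_paths() {
    awk -F: '{
        path = $0
        sub(/^[^:]*:[^:]*:/, "", path)
        printf "%-22s %s\n", $2, path
    }'
}
```

Running "cgroup_paths < /proc/self/cgroup" in the ssh session should show every controller under /system.slice/ssh.service, as in the listing above.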

root at andel2:~# service cgmanager status
● cgmanager.service - Cgroup management daemon
   Loaded: loaded (/lib/systemd/system/cgmanager.service; disabled; vendor preset: enabled)
   Active: active (running) since Sat 2015-05-23 15:48:07 CEST; 30min ago
 Main PID: 2994 (cgmanager)
   Memory: 296.0K
   CGroup: /system.slice/cgmanager.service
           ‣ 2994 /sbin/cgmanager -m name=systemd

May 23 15:48:15 andel2 cgmanager[2994]: cgmanager: Invalid path /run/cgmanager/fs/hugetlb/system.slice/ssh.service/lxc/s0_nginx
May 23 15:48:15 andel2 cgmanager[2994]: cgmanager:per_ctrl_move_pid_main: Invalid path /run/cgmanager/fs/hugetlb/system.slice/ssh.servi...s0_nginx
May 23 15:48:15 andel2 cgmanager[2994]: cgmanager: Invalid path /run/cgmanager/fs/memory/system.slice/ssh.service/lxc/s0_nginx
May 23 15:48:15 andel2 cgmanager[2994]: cgmanager:per_ctrl_move_pid_main: Invalid path /run/cgmanager/fs/memory/system.slice/ssh.servic...s0_nginx
May 23 15:48:15 andel2 cgmanager[2994]: cgmanager: Invalid path /run/cgmanager/fs/net_cls/system.slice/ssh.service/lxc/s0_nginx
May 23 15:48:15 andel2 cgmanager[2994]: cgmanager:per_ctrl_move_pid_main: Invalid path /run/cgmanager/fs/net_cls/system.slice/ssh.servi...s0_nginx
May 23 15:48:15 andel2 cgmanager[2994]: cgmanager: Invalid path /run/cgmanager/fs/perf_event/system.slice/ssh.service/lxc/s0_nginx
May 23 15:48:15 andel2 cgmanager[2994]: cgmanager:per_ctrl_move_pid_main: Invalid path /run/cgmanager/fs/perf_event/system.slice/ssh.se...s0_nginx
May 23 15:48:15 andel2 cgmanager[2994]: cgmanager: Invalid path /run/cgmanager/fs/none,name=systemd/system.slice/ssh.service/lxc/s0_nginx
May 23 15:48:15 andel2 cgmanager[2994]: cgmanager:per_ctrl_move_pid_main: Invalid path /run/cgmanager/fs/none,name=systemd/system.slice...s0_nginx
Hint: Some lines were ellipsized, use -l to show in full.

The unprivileged containers could be stopped, but trying to stop a running privileged container hung and blocked the host completely.
Even a reboot is not possible: the host answers only ping requests, and ssh returns "Write failed: Broken pipe".
Since the machine is geographically distant (and it is the weekend, as usual when such things happen), I cannot provide the results of the commands below.

I will probably run into the same error on other machines, though, and will provide the results then.

David


-----"lxc-users" <lxc-users-bounces at lists.linuxcontainers.org> wrote: -----
To: LXC users mailing-list <lxc-users at lists.linuxcontainers.org>
From: Serge Hallyn 
Sent by: "lxc-users" 
Date: 05/22/2015 17:44
Subject: Re: [lxc-users] lxc-ls -f problem

Quoting Dave Birch (dave.birch at gmail.com):
> Dave Birch <dave.birch at ...> writes:
> 
> Further update - just discovered that lxc-start now hangs for all 
> containers, even newly created ones using only the standard download 
> template on lxc-create.
> 
> I'm pretty much dead in the water until I can work out how to resolve 
> this.

Can you attach the results of

sudo strace -f -ostrace.out -- lxc-ls -f
sudo strace -f -ostrace-start.out -- lxc-start -n <container>
sudo lxc-start -n <container> -l trace -o debug.out

and show your exact steps, if you can remember them or have them in
history, when you were originally creating these containers?
_______________________________________________
lxc-users mailing list
lxc-users at lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-users 
 

[attachment "outfiles.tgz" removed by David Andel/at/UZH]