[Lxc-users] dangling cgroups after lxc-execute with ns_cgroup

Fri Jul 6 20:34:01 UTC 2012

Quoting Arun M (arunmahadevaiyer at gmail.com):
> On Fri, Jul 6, 2012 at 8:51 PM, Serge Hallyn <serge.hallyn at canonical.com> wrote:
> 
> > Quoting Arun M (arunmahadevaiyer at gmail.com):
> > > Hi,
> > >
> > > I updated to lxc-0.8.0-rc2 and after that I observe dangling cgroups
> > > (/cgroup/PID) in the filesystem after lxc-execute terminates.
> > >
> > > I am using ns_cgroup.
> >
> > Which kernel are you on?  The ns cgroup has not been available for
> > over a year.
> >
> >
> I am on 2.6.32. (RedHat Enterprise Linux 6)
> 
> > > Looks like a process is spawned in a new namespace but lxc fails to remove
> > > the cgroup directory.
> >
> > Can you show 'ps -ef'?   If you can identify the process that won't
> > die, can you see what it's doing (strace -f -o outfile -p $pid) ?
> >
> > My only guess would be that the container_reboot_supported() function,
> > which gets cloned, is for some reason not dying.  Except no, that
> > can't be it, because this change actually moves that clone() to the
> > monitor task, so it wouldn't be pinning the cgroup.
> >
> > Can you check whether your container is mounting a private /var or
> > /run?  My theory is that the initial task is never killed because you
> > are relying on the utmp watcher (being on an older kernel), and the
> > container is using a utmp that the monitor can't see.
> >
> I don't know much about the utmp watcher. However, I see the following
> messages in the log.
> 
> $ /usr/local/bin/lxc-execute -n alpha -f n1.conf -l DEBUG -o /tmp/log -- /bin/sh
> 
> 
>     lxc-execute 1341623135.543 DEBUG    lxc_start - Dropping cap_sys_boot and watching utmp
> ...
> ...
>     lxc-execute 1341623135.584 DEBUG    lxc_utmp - Added '/proc/23213/root/var/run' to inotifywatch
>     lxc-execute 1341623135.584 WARN     lxc_start - invalid pid for SIGCHLD, siginfo.ssi_pid:23210, *pid:23213
> 
> This is while the container shell is still running.
> 
> $ file /proc/23210
> /proc/23210: cannot open `/proc/23210' (No such file or directory)
> 
> $ file /proc/23213
> /proc/23213: directory
> 
> $ cat /proc/23213/cmdline|less
> /usr/local/libexec/lxc/lxc-init^@--^@/bin/sh^@
> 
> 
> And I see two cgroups,
> 
> $ ls -ld /cgroup/alpha
> drwxr-xr-x 2 arunm users 0 Jul  7 06:35 /cgroup/alpha
> 
> $ cat /cgroup/alpha/tasks
> 23213
> 23217
> 
> $ cat /cgroup/23210/tasks
> [Nothing]
> 
> And after I exit the shell, /cgroup/23210 hangs around forever.
> 
> I don't see /var/run/utmp or a /run directory inside the container.

Ooooh!  I get it.

I'm pretty sure 23210 (as I started to guess above but then decided
couldn't be the case) is the task cloned to test for reboot support.
Since we're cloning it into a new pid namespace, the ns cgroup creates
a new child cgroup for it.  That task exits immediately after testing
for reboot, and because you don't have a release agent, its new cgroup
is not being deleted.

So I guess if the ns cgroup is mounted, we need to delete that
cgroup.  You can work around it by using a release agent.  If you're
interested in writing a patch for that, I'll be happy to help.
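
For the release agent workaround, something along these lines might
do as a starting point.  It is only a sketch: it assumes the hierarchy
is mounted at /cgroup as in your listing, and the function name, the
CGROUP_ROOT variable and the install path in the comments are made up
for illustration.  The kernel runs the agent with a single argument,
the path of the emptied cgroup relative to the hierarchy root.

```shell
#!/bin/sh
# Hypothetical cgroup release agent (a sketch, not part of lxc).
# CGROUP_ROOT is an assumed mount point; /cgroup matches this thread.
CGROUP_ROOT="${CGROUP_ROOT:-/cgroup}"

# Remove the empty cgroup named by $1 (path relative to the root).
remove_released_cgroup() {
    rel="${1#/}"                 # drop one leading slash, if any
    case "$rel" in
        ""|*..*) return 1 ;;     # refuse empty paths and ".."
    esac
    rmdir "$CGROUP_ROOT/$rel"    # rmdir only succeeds on an empty cgroup
}

# One-time setup as root (install path is an assumption):
#   echo /usr/local/sbin/cgroup-release-agent > /cgroup/release_agent
#   echo 1 > /cgroup/notify_on_release

# When invoked by the kernel, $1 is the released cgroup's path.
if [ "$#" -gt 0 ]; then
    remove_released_cgroup "$1"
fi
```

New cgroups take their notify_on_release value from the parent at
creation time, so the flag has to be set on the parent before
lxc-execute runs; cgroups that are already dangling still need a
manual rmdir.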

thanks,
-serge