[Lxc-users] dangling cgroups after lxc-execute with ns_cgroup
Serge Hallyn
serge.hallyn at canonical.com
Fri Jul 6 20:34:01 UTC 2012
Quoting Arun M (arunmahadevaiyer at gmail.com):
> On Fri, Jul 6, 2012 at 8:51 PM, Serge Hallyn <serge.hallyn at canonical.com> wrote:
>
> > Quoting Arun M (arunmahadevaiyer at gmail.com):
> > > Hi,
> > >
> > > I updated to lxc-0.8.0-rc2 and after that I observe dangling cgroups
> > > (/cgroup/PID) in the filesystem after lxc-execute terminates.
> > >
> > > I am using ns_cgroup.
> >
> > Which kernel are you on? The ns cgroup has not been available for
> > over a year.
> >
> >
> I am on 2.6.32. (RedHat Enterprise Linux 6)
>
> > > Looks like a process is spawned in a new namespace but lxc fails to
> > > remove the cgroup directory.
> >
> > Can you show 'ps -ef'? If you can identify the process that won't
> > die, can you see what it's doing (strace -f -o outfile -p $pid) ?
> >
> > My only guess would be that the container_reboot_supported() function,
> > which gets cloned, is for some reason not dying. Except no, that
> > can't be it, because this change actually moves that clone() to the
> > monitor task, so it wouldn't be pinning the cgroup.
> >
> > Can you check whether your container is mounting a private /var or
> > /run? My theory is that the initial task is never killed because you
> > are relying on the utmp watcher (being on an older kernel), and the
> > container is using a utmp that the monitor can't see.
> >
> I don't have much idea about the utmp watcher. However, I see the following
> messages in the log.
>
> $ /usr/local/bin/lxc-execute -n alpha -f n1.conf -l DEBUG -o /tmp/log -- /bin/sh
>
>
> lxc-execute 1341623135.543 DEBUG lxc_start - Dropping cap_sys_boot and watching utmp
> ...
> ...
> lxc-execute 1341623135.584 DEBUG lxc_utmp - Added '/proc/23213/root/var/run' to inotifywatch
> lxc-execute 1341623135.584 WARN lxc_start - invalid pid for SIGCHLD, siginfo.ssi_pid:23210, *pid:23213
>
> This is while the container shell is still running.
>
> $ file /proc/23210
> /proc/23210: cannot open `/proc/23210' (No such file or directory)
>
> $ file /proc/23213
> /proc/23213: directory
>
> $ cat /proc/23213/cmdline|less
> /usr/local/libexec/lxc/lxc-init^@--^@/bin/sh^@
>
>
> And I see two cgroups,
>
> $ ls -ld /cgroup/alpha
> drwxr-xr-x 2 arunm users 0 Jul 7 06:35 /cgroup/alpha
>
> $ cat /cgroup/alpha/tasks
> 23213
> 23217
>
> $ cat /cgroup/23210/tasks
> [Nothing]
>
> And after I exit the shell, /cgroup/23210 hangs around forever.
>
> I don't see /var/run/utmp or a /run directory inside the container.

Ooooh! I get it.

I'm pretty sure 23210 (as I started to guess above but then decided
couldn't be the case) is the task cloned to test for reboot support.
Since we're cloning a new pid namespace, the ns cgroup creates a new
child cgroup. That task exits immediately after testing for reboot,
and because you don't have a release agent, its new cgroup is not
being deleted.
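One way to confirm this on an affected host is to look for PID-named child cgroups whose `tasks` file is empty, like /cgroup/23210 above. Below is a minimal C diagnostic sketch, assuming the hierarchy is mounted at /cgroup as in this thread; `is_pid_name()`, `tasks_empty()` and `count_stale_pid_cgroups()` are hypothetical helper names, not LXC code. It only reports candidates and removes nothing.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

/* Hypothetical helper: 1 if name is a non-empty run of decimal digits
 * (the ns cgroup names its child cgroups after a PID, e.g. "23210"). */
static int is_pid_name(const char *name)
{
    if (*name == '\0')
        return 0;
    for (; *name; name++)
        if (!isdigit((unsigned char)*name))
            return 0;
    return 1;
}

/* Hypothetical helper: 1 if <dir>/tasks is empty, 0 if it lists at
 * least one task, -1 if it cannot be opened or read. */
static int tasks_empty(const char *dir)
{
    char path[4096];
    snprintf(path, sizeof(path), "%s/tasks", dir);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int c = fgetc(f);
    int result = ferror(f) ? -1 : (c == EOF);
    fclose(f);
    return result;
}

/* Hypothetical helper: print each PID-named child of the hierarchy
 * mounted at mnt (assumed "/cgroup") whose tasks file is empty and
 * return how many were found, or -1 if mnt cannot be opened. */
static int count_stale_pid_cgroups(const char *mnt)
{
    assert(mnt != NULL);
    DIR *d = opendir(mnt);
    if (!d)
        return -1;
    int n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (!is_pid_name(e->d_name))
            continue; /* skips ".", "..", and named cgroups like "alpha" */
        char dir[4096];
        snprintf(dir, sizeof(dir), "%s/%s", mnt, e->d_name);
        if (tasks_empty(dir) == 1) {
            printf("stale: %s\n", dir);
            n++;
        }
    }
    closedir(d);
    return n;
}
```

Run after the container shell exits, `count_stale_pid_cgroups("/cgroup")` would be expected to list /cgroup/23210 here. An empty PID-named cgroup is only a candidate; check that nothing else created it before removing it.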
So I guess if the ns cgroup is mounted, we need to delete that
cgroup. You can work around it by using a release agent. If you're
interested in writing a patch for that, I'll be happy to help.
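For the release-agent workaround: in cgroup v1, when a cgroup with `notify_on_release` set becomes empty, the kernel runs the program named in the hierarchy root's `release_agent` file, passing the cgroup's path relative to the mount point (e.g. `/23210`) as its only argument. New cgroups inherit `notify_on_release` from their parent. Below is a minimal C sketch of the agent's core, assuming the /cgroup mount point from this thread; `released_cgroup_path()` and `remove_released_cgroup()` are hypothetical names, not LXC code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: build "<mnt><rel>" into buf, where rel is the
 * argument the kernel passes (relative to the hierarchy root, starting
 * with '/'). Returns 0 on success, -1 if rel is unsafe or buf is too small. */
static int released_cgroup_path(char *buf, size_t len,
                                 const char *mnt, const char *rel)
{
    assert(buf != NULL && mnt != NULL && rel != NULL);
    /* Refuse the root cgroup itself, and conservatively refuse anything
     * containing "..", so the agent can never rmdir outside mnt. */
    if (rel[0] != '/' || rel[1] == '\0' || strstr(rel, "..") != NULL)
        return -1;
    int n = snprintf(buf, len, "%s%s", mnt, rel);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}

/* Hypothetical helper: remove the released cgroup. A cgroup that is
 * already gone (ENOENT) counts as success; rmdir() fails with EBUSY if
 * the cgroup has regained tasks or children, and that is reported. */
static int remove_released_cgroup(const char *mnt, const char *rel)
{
    char path[4096];
    if (released_cgroup_path(path, sizeof(path), mnt, rel) < 0)
        return -1;
    if (rmdir(path) < 0 && errno != ENOENT)
        return -1;
    return 0;
}
```

A `main()` for the agent would just return the result of `remove_released_cgroup("/cgroup", argv[1])`. To enable it, one would write `1` to /cgroup/notify_on_release and the agent's absolute path to /cgroup/release_agent before running `lxc-execute`, so the PID-named cgroup inherits the flag. The fix suggested above would instead have LXC itself remove /cgroup/<pid> after the reboot-test child exits, when the ns cgroup is mounted.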
thanks,
-serge