[lxc-devel] lxc_monitord - monitor exiting
Dwight Engen
dwight.engen at oracle.com
Sat May 4 13:16:44 UTC 2013
Hi Çağlar,
I'm confused by your output; it certainly looks like something isn't
right. Do you have a theory as to why monitord thinks it still has 9
clients?
On Sat, 4 May 2013 00:01:45 -0400
S.Çağlar Onur <caglar at 10ur.org> wrote:
> Hi all,
>
> I think I understand why I was confused before while chasing another
> bug. This is what I'm seeing right now.
>
> * I patched lxc_monitord.c with the following
>
> diff --git a/src/lxc/lxc_monitord.c b/src/lxc/lxc_monitord.c
> index e76af71..59f1e9d 100644
> --- a/src/lxc/lxc_monitord.c
> +++ b/src/lxc/lxc_monitord.c
> @@ -373,6 +373,7 @@ int main(int argc, char *argv[])
> }
>
> if (lxc_monitord_create(&mon)) {
> + NOTICE("create failed");
> goto out;
> }
>
> @@ -398,6 +399,7 @@ int main(int argc, char *argv[])
> NOTICE("no clients for 30 seconds, exiting");
> break;
> }
> + NOTICE("clients %d", mon.clientfds_cnt);
> }
>
> lxc_mainloop_close(&mon.descr);
>
> * I started 10 containers using the Go bindings
>
> [caglar at qgq:~/Project/lxc/examples] sudo ./concurrent_start
> Starting the container (3)...
> Starting the container (2)...
> Starting the container (4)...
> Starting the container (0)...
> Starting the container (1)...
> Starting the container (8)...
> Starting the container (7)...
> Starting the container (6)...
> Starting the container (5)...
> Starting the container (9)...
>
> * Then I started stopping them one by one using lxc-stop
>
> [caglar at qgq:~/Project/lxc/examples] sudo lxc-stop -n 0
> [caglar at qgq:~/Project/lxc/examples] sudo ./list
> 0 (STOPPED)
> 1 (RUNNING)
> 2 (RUNNING)
> 3 (RUNNING)
> 4 (RUNNING)
> 5 (RUNNING)
> 6 (RUNNING)
> 7 (RUNNING)
> 8 (RUNNING)
> 9 (RUNNING)
I assume you stopped 1-8 here?
> [caglar at qgq:~/Project/lxc/examples] date && sudo ./list
> Fri May 3 23:57:14 EDT 2013
> 0 (STOPPED)
> 1 (STOPPED)
> 2 (STOPPED)
> 3 (STOPPED)
> 4 (STOPPED)
> 5 (STOPPED)
> 6 (STOPPED)
> 7 (STOPPED)
> 8 (STOPPED)
> 9 (RUNNING)
> bleach (STOPPED)
>
> * lxc-monitord is still around after ~10min
Looks like it's not going away because it thinks there are still 9
clients. My guess is that somehow it's not getting notified of the client
closes (or they're still around?). The following patch should provide a
bit more info in the log:
diff --git a/src/lxc/lxc_monitord.c b/src/lxc/lxc_monitord.c
index e76af71..537a2b3 100644
--- a/src/lxc/lxc_monitord.c
+++ b/src/lxc/lxc_monitord.c
@@ -114,6 +114,7 @@ static int lxc_monitord_fifo_delete(struct lxc_monitor *mon)
static void lxc_monitord_sockfd_remove(struct lxc_monitor *mon, int fd) {
int i;
+ INFO("removing fd %d\n", fd);
if (lxc_mainloop_del_handler(&mon->descr, fd))
CRIT("fd:%d not found in mainloop", fd);
close(fd);
@@ -343,7 +344,7 @@ int main(int argc, char *argv[])
if (ret < 0 || ret >= sizeof(logpath))
return EXIT_FAILURE;
- ret = lxc_log_init(NULL, logpath, "NOTICE", "lxc-monitord", 0, lxcpath);
+ ret = lxc_log_init(NULL, logpath, "INFO", "lxc-monitord", 0, lxcpath);
if (ret)
return ret;
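To make it clearer what I mean by "notified of the client closes", here is a rough standalone sketch of the general pattern, not the actual monitord code. The names client_table, client_add, client_remove and client_handle_readable are made up for illustration. The idea is that when a client goes away its fd becomes readable and read() returns 0, at which point the fd is dropped and the count goes down. If that event never arrives, the count stays stuck.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_CLIENTS 16

/* Hypothetical stand-in for the monitor's client list; not struct lxc_monitor. */
struct client_table {
	int fds[MAX_CLIENTS];
	int cnt;
};

/* Track a newly accepted client fd. Returns 0 on success, -1 if full. */
static int client_add(struct client_table *t, int fd)
{
	assert(t != NULL);
	if (t->cnt >= MAX_CLIENTS)
		return -1;
	t->fds[t->cnt++] = fd;
	return 0;
}

/* Close fd and drop it from the table. Returns 0 if found, -1 otherwise. */
static int client_remove(struct client_table *t, int fd)
{
	int i;

	for (i = 0; i < t->cnt; i++) {
		if (t->fds[i] != fd)
			continue;
		close(fd);
		/* Fill the hole with the last entry and shrink the count. */
		t->fds[i] = t->fds[--t->cnt];
		return 0;
	}
	return -1;
}

/*
 * Call when the mainloop reports fd as readable.
 * Returns 1 if the client was dropped, 0 if it is still connected.
 */
static int client_handle_readable(struct client_table *t, int fd)
{
	char buf[64];
	ssize_t n = read(fd, buf, sizeof(buf));

	if (n > 0)
		return 0;	/* data from the client; a real handler would parse it */
	if (n < 0 && (errno == EINTR || errno == EAGAIN))
		return 0;	/* transient condition, try again later */
	/* n == 0 is EOF (peer closed); any other error also drops the client. */
	client_remove(t, fd);
	return 1;
}
```

The same logic applies whether the fd is a pipe or a unix socket: the count only drops if the readable/hangup event for that fd is actually delivered and handled.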
> [caglar at qgq:~/Project/lxc/examples] ps aux | grep /usr/bin/lxc-monitord
> caglar 1170 0.0 0.0 13580 940 pts/3 S+ 23:57 0:00 grep --color=auto /usr/bin/lxc-monitord
> root 29997 0.0 0.0 15000 744 ? Ss 23:47 0:00 /usr/bin/lxc-monitord /var/lib/lxc 5
> [caglar at qgq:~/Project/lxc/examples] date
> Fri May 3 23:57:52 EDT 2013
>
> * And lastly here is what lxc-monitord.log shows
>
> [caglar at qgq:~/Project/lxc(clone)] tail -f /var/lib/lxc/lxc-monitord.log
> lxc-monitord 1367639242.631 NOTICE lxc_monitord - monitoring lxcpath /var/lib/lxc
> lxc-monitord 1367639242.633 NOTICE lxc_monitord - create failed
> lxc-monitord 1367639242.633 NOTICE lxc_monitord - create failed
> lxc-monitord 1367639242.636 NOTICE lxc_monitord - create failed
> lxc-monitord 1367639242.639 NOTICE lxc_monitord - create failed
> lxc-monitord 1367639242.643 NOTICE lxc_monitord - create failed
> lxc-monitord 1367639242.643 NOTICE lxc_monitord - create failed
> lxc-monitord 1367639242.651 NOTICE lxc_monitord - create failed
> lxc-monitord 1367639242.654 NOTICE lxc_monitord - create failed
> lxc-monitord 1367639242.665 NOTICE lxc_monitord - create failed
> lxc-monitord 1367639242.678 NOTICE lxc_monitord - create failed
> lxc-monitord 1367639242.681 NOTICE lxc_monitord - create failed
> lxc-monitord 1367639242.681 NOTICE lxc_monitord - create failed
> lxc-monitord 1367639242.682 NOTICE lxc_monitord - create failed
> lxc-monitord 1367639242.707 NOTICE lxc_monitord - create failed
> lxc-monitord 1367639242.710 NOTICE lxc_monitord - create failed
> lxc-monitord 1367639242.710 NOTICE lxc_monitord - create failed
> lxc-monitord 1367639242.722 NOTICE lxc_monitord - create failed
> lxc-monitord 1367639242.733 NOTICE lxc_monitord - create failed
> lxc-monitord 1367639242.831 NOTICE lxc_monitord - create failed
> lxc-monitord 1367639274.071 NOTICE lxc_monitord - clients 9
> lxc-monitord 1367639323.928 NOTICE lxc_monitord - clients 9
> lxc-monitord 1367639372.862 NOTICE lxc_monitord - clients 9
> lxc-monitord 1367639444.107 NOTICE lxc_monitord - clients 9
> lxc-monitord 1367639474.130 NOTICE lxc_monitord - clients 9
> lxc-monitord 1367639504.133 NOTICE lxc_monitord - clients 9
> lxc-monitord 1367639534.161 NOTICE lxc_monitord - clients 9
> lxc-monitord 1367639564.190 NOTICE lxc_monitord - clients 9
> lxc-monitord 1367639594.209 NOTICE lxc_monitord - clients 9
> lxc-monitord 1367639624.223 NOTICE lxc_monitord - clients 9
> lxc-monitord 1367639654.256 NOTICE lxc_monitord - clients 9
> lxc-monitord 1367639684.287 NOTICE lxc_monitord - clients 9
> lxc-monitord 1367639714.317 NOTICE lxc_monitord - clients 9
> lxc-monitord 1367639744.347 NOTICE lxc_monitord - clients 9
> lxc-monitord 1367639774.370 NOTICE lxc_monitord - clients 9
> lxc-monitord 1367639804.396 NOTICE lxc_monitord - clients 9
> lxc-monitord 1367639834.426 NOTICE lxc_monitord - clients 9
> lxc-monitord 1367639864.456 NOTICE lxc_monitord - clients 9
> lxc-monitord 1367639894.486 NOTICE lxc_monitord - clients 9
You might want to consider patching the logging code to print out pids. I
found that helpful while working on this:
diff --git a/src/lxc/log.c b/src/lxc/log.c
index d49a544..98581c1 100644
--- a/src/lxc/log.c
+++ b/src/lxc/log.c
@@ -58,7 +58,7 @@ static int log_append_stderr(const struct lxc_log_appender *appender,
if (event->priority < LXC_LOG_PRIORITY_ERROR)
return 0;
- fprintf(stderr, "%s: ", log_prefix);
+ fprintf(stderr, "%-5d %s: ", getpid(), log_prefix);
vfprintf(stderr, event->fmt, *event->vap);
fprintf(stderr, "\n");
return 0;
@@ -75,7 +75,8 @@ static int log_append_logfile(const struct lxc_log_appender *appender,
return 0;
n = snprintf(buffer, sizeof(buffer),
- "%15s %10ld.%03ld %-8s %s - ",
+ "%-5d %15s %10ld.%03ld %-8s %s - ",
+ getpid(),
log_prefix,
event->timestamp.tv_sec,
event->timestamp.tv_usec / 1000,
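If you want to see what the new logfile prefix looks like without rebuilding, here is a small standalone sketch that uses the same format string. format_log_prefix is a made-up helper for illustration only; it is not a function in log.c, and the level and category arguments are just passed in as plain strings here.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: build the logfile prefix with the pid in front,
 * using the same format string as the patched log_append_logfile().
 * Returns what snprintf() returns (the length that was, or would have
 * been, written).
 */
static int format_log_prefix(char *buf, size_t len, pid_t pid,
			     const char *prefix, long sec, long usec,
			     const char *level, const char *category)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len,
			"%-5d %15s %10ld.%03ld %-8s %s - ",
			(int)pid,	/* left-justified, padded to 5 columns */
			prefix,
			sec,
			usec / 1000,	/* microseconds to milliseconds */
			level,
			category);
}
```

Calling it with getpid() as the pid argument gives one prefix per process, which makes it easy to tell apart log lines coming from several monitord instances racing to start.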
> On Fri, Apr 26, 2013 at 4:52 PM, S.Çağlar Onur <caglar at 10ur.org>
> wrote:
>
> > Yeah, I think you are all correct and I'm just confused - probably a
> > direct effect of a lack of caffeine. And no, it's not complicating
> > anything for me, it's working great. I just want to make sure that
> > I'm wrong :)
> >
> >
> > On Fri, Apr 26, 2013 at 4:37 PM, Dwight Engen
> > <dwight.engen at oracle.com> wrote:
> >
> >> On Fri, 26 Apr 2013 22:07:22 +0200
> >> Stéphane Graber <stgraber at ubuntu.com> wrote:
> >>
> >> > On 04/26/2013 09:42 PM, S.Çağlar Onur wrote:
> >> > > Hey Dwight,
> >> > >
> >> > > I'm observing the following behavior with the staging tree and just
> >> > > wanted to make sure that what I'm seeing is expected:
> >> > >
> >> > > * Initially nothing runs
> >> > >
> >> > > [caglar at qgq:~/Projects/lxc/examples] sudo ./list
> >> > > bankai (STOPPED)
> >> > > bleach (STOPPED)
> >> > > zangetsu (STOPPED)
> >> > >
> >> > > * I start one container using the API
> >> > >
> >> > > [caglar at qgq:~/Projects/lxc/examples] sudo ./start -name zangetsu
> >> > > Starting the container...
> >> > >
> >> > > [caglar at qgq:~/Projects/lxc/examples] sudo ./list
> >> > > bankai (STOPPED)
> >> > > bleach (STOPPED)
> >> > > zangetsu (RUNNING)
> >> > >
> >> > > * monitord starts as expected but exits 30 seconds later
> >> > > (although the container is still running);
> >> > >
> >> > > [caglar at qgq:~/Projects/lxc-upstream(staging)] tail -f /var/lib/lxc/lxc-monitord.log
> >> > > lxc-monitord 1367004858.616 NOTICE lxc_monitord - monitoring lxcpath /var/lib/lxc
> >> > > lxc-monitord 1367004888.677 NOTICE lxc_monitord - no clients for 30 seconds, exiting
> >> > > lxc-monitord 1367004888.677 NOTICE lxc_monitord - monitor exiting
> >> > >
> >> > > [caglar at qgq:~/Projects/lxc/examples] sudo ./list
> >> > > bankai (STOPPED)
> >> > > bleach (STOPPED)
> >> > > zangetsu (RUNNING)
> >> > >
> >> > > [caglar at qgq:~/Projects/lxc/examples] ps aux | grep monitord
> >> > > caglar 28404 0.0 0.0 7240 624 pts/54 S+ 15:34 0:00 tail -f /var/lib/lxc/lxc-monitord.log
> >> > > caglar 29037 0.0 0.0 9436 948 pts/0 S+ 15:38 0:00 grep --color=auto monitord
> >> > > [caglar at qgq:~/Projects/lxc/examples]
> >> > >
> >> > > I'm asking because I was under the impression that lxc-monitord
> >> > > will keep running as long as there is a container. Am I wrong?
> >> >
> >> > I believe the monitor will get spawned the first time something
> >> > needs it (lxc-monitor/lxc-wait) and exit 30s after the last
> >> > client disconnects. It'll then be respawned the next time
> >> > lxc-monitor or lxc-wait is started against that container.
> >>
> >> Yep Stéphane, that is correct. Also note that the monitord is per
> >> lxcpath, not per container.
> >>
> >> Çağlar, you may have been slightly confused because if you start a
> >> container in daemon mode through the API, the API does an internal
> >> lxc_wait() and thus a monitord will get spawned when you first
> >> start a container, but will go away ~30 seconds afterwards.
> >>
> >
> >
> >
> > --
> > S.Çağlar Onur <caglar at 10ur.org>
> >
>
>
>
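For reference, the "exit 30s after the last client disconnects" behavior discussed in the quoted thread boils down to a mainloop wait with a timeout, followed by a check of the client count. Below is a minimal sketch of that decision, assuming poll() as the wait primitive. monitor_idle_step is a made-up name and this is not the code in lxc_monitord.c; it only illustrates why a count stuck at 9 means the "no clients" branch is never taken.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper: wait up to timeout_ms for activity on fds.
 * Returns 1 if the wait timed out with no clients connected (the
 * monitor should exit), 0 if it should keep running, -1 on error.
 */
static int monitor_idle_step(struct pollfd *fds, nfds_t nfds,
			     int client_cnt, int timeout_ms)
{
	int ret;

	assert(client_cnt >= 0);
	ret = poll(fds, nfds, timeout_ms);
	if (ret < 0)
		return errno == EINTR ? 0 : -1;	/* interrupted: just retry */
	if (ret == 0 && client_cnt == 0)
		return 1;	/* idle for the whole timeout and nobody connected */
	return 0;		/* activity, or clients still connected */
}
```

With timeout_ms set to 30000 this gives the 30 second idle exit. A timeout with a nonzero client count just goes around the loop again, which matches the "clients 9" line showing up roughly every 30 seconds in the log.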