[lxc-devel] lxc_monitord - monitor exiting

S.Çağlar Onur caglar at 10ur.org
Sat May 4 16:22:58 UTC 2013


Hi Dwight,

I run same commands and this time it ended up with 6 clients (output ->
http://10ur.org/monitor.txt).  I'll try to reproduce this with c/python.;.

On Sat, May 4, 2013 at 11:15 AM, S.Çağlar Onur <caglar at 10ur.org> wrote:

> Hi Dwight,
>
> On Sat, May 4, 2013 at 9:16 AM, Dwight Engen <dwight.engen at oracle.com>wrote:
>
>> Hi Çağlar,
>>
>> I'm confused by your output, it certainly looks like something isn't
>> right. Do you have a theory as to why monitord thinks it still has 9
>> clients?
>
>
> Nope, I don't. I'll try to debug more and also try to reproduce this with
> python or c api to rule out go.
>
>
>>  On Sat, 4 May 2013 00:01:45 -0400
>> S.Çağlar Onur <caglar at 10ur.org> wrote:
>>
>> > Hi all,
>> >
>> > I think I understand why I'm confused before while chasing another
>> > bug. This is what I'm seeing right now.
>> >
>> > * I patched lxc_monitord.c with following
>> >
>> >  diff --git a/src/lxc/lxc_monitord.c b/src/lxc/lxc_monitord.c
>> > index e76af71..59f1e9d 100644
>> > --- a/src/lxc/lxc_monitord.c
>> > +++ b/src/lxc/lxc_monitord.c
>> > @@ -373,6 +373,7 @@ int main(int argc, char *argv[])
>> >         }
>> >
>> >         if (lxc_monitord_create(&mon)) {
>> > +               NOTICE("create failed");
>> >                 goto out;
>> >         }
>> >
>> > @@ -398,6 +399,7 @@ int main(int argc, char *argv[])
>> >                         NOTICE("no clients for 30 seconds, exiting");
>> >                         break;
>> >                 }
>> > +               NOTICE("clients %d", mon.clientfds_cnt);
>> >         }
>> >
>> >         lxc_mainloop_close(&mon.descr);
>> >
>> > * I started 10 containers using go bindings
>> >
>> > [caglar at qgq:~/Project/lxc/examples] sudo ./concurrent_start
>> > Starting the container (3)...
>> > Starting the container (2)...
>> > Starting the container (4)...
>> > Starting the container (0)...
>> > Starting the container (1)...
>> > Starting the container (8)...
>> > Starting the container (7)...
>> > Starting the container (6)...
>> > Starting the container (5)...
>> > Starting the container (9)...
>> >
>> > * Then started to stop them 1 by 1 using lxc-stop
>> >
>> > [caglar at qgq:~/Project/lxc/examples] sudo lxc-stop -n 0
>> > [caglar at qgq:~/Project/lxc/examples] sudo ./list
>> > 0 (STOPPED)
>> > 1 (RUNNING)
>> > 2 (RUNNING)
>> > 3 (RUNNING)
>> > 4 (RUNNING)
>> > 5 (RUNNING)
>> > 6 (RUNNING)
>> > 7 (RUNNING)
>> > 8 (RUNNING)
>> > 9 (RUNNING)
>>
>> I assume you stopped 1-8 here?
>
>
> Yes, sorry for pasting the half of the output.
>
>
>>  > [caglar at qgq:~/Project/lxc/examples] date && sudo ./list
>> > Fri May  3 23:57:14 EDT 2013
>> > 0 (STOPPED)
>> > 1 (STOPPED)
>> > 2 (STOPPED)
>> > 3 (STOPPED)
>> > 4 (STOPPED)
>> > 5 (STOPPED)
>> > 6 (STOPPED)
>> > 7 (STOPPED)
>> > 8 (STOPPED)
>> > 9 (RUNNING)
>> > bleach (STOPPED)
>> >
>> > * lxc-monitord is still around after ~10min
>>
>> Looks like its not going away because it thinks there are 9 clients
>> still. My guess is somehow its not getting notified of the client
>> closes (or they're still around?). The following patch should provide a
>> bit more info in the log:
>>
>
> As long as I see they are not around (I use lxc-stop to stop them).
>
>
>> diff --git a/src/lxc/lxc_monitord.c b/src/lxc/lxc_monitord.c
>> index e76af71..537a2b3 100644
>> --- a/src/lxc/lxc_monitord.c
>> +++ b/src/lxc/lxc_monitord.c
>> @@ -114,6 +114,7 @@ static int lxc_monitord_fifo_delete(struct
>> lxc_monitor *mon)
>>  static void lxc_monitord_sockfd_remove(struct lxc_monitor *mon, int fd) {
>>         int i;
>>
>> +       INFO("removing fd %d\n", fd);
>>         if (lxc_mainloop_del_handler(&mon->descr, fd))
>>                 CRIT("fd:%d not found in mainloop", fd);
>>         close(fd);
>> @@ -343,7 +344,7 @@ int main(int argc, char *argv[])
>>         if (ret < 0 || ret >= sizeof(logpath))
>>                 return EXIT_FAILURE;
>>
>> -       ret = lxc_log_init(NULL, logpath, "NOTICE", "lxc-monitord", 0,
>> lxcpath);
>> +       ret = lxc_log_init(NULL, logpath, "INFO", "lxc-monitord", 0,
>> lxcpath);
>>         if (ret)
>>                 return ret;
>>
>>
>>
>> > [caglar at qgq:~/Project/lxc/examples] ps aux |
>> > grep /usr/bin/lxc-monitord caglar    1170  0.0  0.0  13580   940
>> > pts/3    S+   23:57   0:00 grep --color=auto /usr/bin/lxc-monitord
>> > root     29997  0.0  0.0  15000   744 ?        Ss   23:47   0:00
>> > /usr/bin/lxc-monitord /var/lib/lxc 5
>> > [caglar at qgq:~/Project/lxc/examples] date
>> > Fri May  3 23:57:52 EDT 2013
>> >
>> > * And lastly here is what lxc-monitord.log shows
>> >
>> > [caglar at qgq:~/Project/lxc(clone)] tail
>> > -f /var/lib/lxc/lxc-monitord.log lxc-monitord 1367639242.631 NOTICE
>> > lxc_monitord - monitoring lxcpath /var/lib/lxc
>> >    lxc-monitord 1367639242.633 NOTICE   lxc_monitord - create failed
>> >    lxc-monitord 1367639242.633 NOTICE   lxc_monitord - create failed
>> >    lxc-monitord 1367639242.636 NOTICE   lxc_monitord - create failed
>> >    lxc-monitord 1367639242.639 NOTICE   lxc_monitord - create failed
>> >    lxc-monitord 1367639242.643 NOTICE   lxc_monitord - create failed
>> >    lxc-monitord 1367639242.643 NOTICE   lxc_monitord - create failed
>> >    lxc-monitord 1367639242.651 NOTICE   lxc_monitord - create failed
>> >    lxc-monitord 1367639242.654 NOTICE   lxc_monitord - create failed
>> >    lxc-monitord 1367639242.665 NOTICE   lxc_monitord - create failed
>> >    lxc-monitord 1367639242.678 NOTICE   lxc_monitord - create failed
>> >    lxc-monitord 1367639242.681 NOTICE   lxc_monitord - create failed
>> >    lxc-monitord 1367639242.681 NOTICE   lxc_monitord - create failed
>> >    lxc-monitord 1367639242.682 NOTICE   lxc_monitord - create failed
>> >    lxc-monitord 1367639242.707 NOTICE   lxc_monitord - create failed
>> >    lxc-monitord 1367639242.710 NOTICE   lxc_monitord - create failed
>> >    lxc-monitord 1367639242.710 NOTICE   lxc_monitord - create failed
>> >    lxc-monitord 1367639242.722 NOTICE   lxc_monitord - create failed
>> >    lxc-monitord 1367639242.733 NOTICE   lxc_monitord - create failed
>> >    lxc-monitord 1367639242.831 NOTICE   lxc_monitord - create failed
>> >    lxc-monitord 1367639274.071 NOTICE   lxc_monitord - clients 9
>> >    lxc-monitord 1367639323.928 NOTICE   lxc_monitord - clients 9
>> >    lxc-monitord 1367639372.862 NOTICE   lxc_monitord - clients 9
>> >    lxc-monitord 1367639444.107 NOTICE   lxc_monitord - clients 9
>> >    lxc-monitord 1367639474.130 NOTICE   lxc_monitord - clients 9
>> >    lxc-monitord 1367639504.133 NOTICE   lxc_monitord - clients 9
>> >    lxc-monitord 1367639534.161 NOTICE   lxc_monitord - clients 9
>> >    lxc-monitord 1367639564.190 NOTICE   lxc_monitord - clients 9
>> >    lxc-monitord 1367639594.209 NOTICE   lxc_monitord - clients 9
>> >    lxc-monitord 1367639624.223 NOTICE   lxc_monitord - clients 9
>> >    lxc-monitord 1367639654.256 NOTICE   lxc_monitord - clients 9
>> >    lxc-monitord 1367639684.287 NOTICE   lxc_monitord - clients 9
>> >    lxc-monitord 1367639714.317 NOTICE   lxc_monitord - clients 9
>> >    lxc-monitord 1367639744.347 NOTICE   lxc_monitord - clients 9
>> >    lxc-monitord 1367639774.370 NOTICE   lxc_monitord - clients 9
>> >    lxc-monitord 1367639804.396 NOTICE   lxc_monitord - clients 9
>> >    lxc-monitord 1367639834.426 NOTICE   lxc_monitord - clients 9
>> >    lxc-monitord 1367639864.456 NOTICE   lxc_monitord - clients 9
>> >    lxc-monitord 1367639894.486 NOTICE   lxc_monitord - clients 9
>>
>> You might want to consider patching the log stuff to print out pids, I
>> found that helpful while working on this:
>>
>
> Will patch both monitord.c and log.c and report back. Thanks!
>
>
>> diff --git a/src/lxc/log.c b/src/lxc/log.c
>> index d49a544..98581c1 100644
>> --- a/src/lxc/log.c
>> +++ b/src/lxc/log.c
>> @@ -58,7 +58,7 @@ static int log_append_stderr(const struct
>> lxc_log_appender *appender,
>>         if (event->priority < LXC_LOG_PRIORITY_ERROR)
>>                 return 0;
>>
>> -       fprintf(stderr, "%s: ", log_prefix);
>> +       fprintf(stderr, "%-5d %s: ", getpid(), log_prefix);
>>         vfprintf(stderr, event->fmt, *event->vap);
>>         fprintf(stderr, "\n");
>>         return 0;
>> @@ -75,7 +75,8 @@ static int log_append_logfile(const struct
>> lxc_log_appender *appender,
>>                 return 0;
>>
>>         n = snprintf(buffer, sizeof(buffer),
>> -                    "%15s %10ld.%03ld %-8s %s - ",
>> +                    "%-5d %15s %10ld.%03ld %-8s %s - ",
>> +                    getpid(),
>>                      log_prefix,
>>                      event->timestamp.tv_sec,
>>                      event->timestamp.tv_usec / 1000,
>>
>>
>> > On Fri, Apr 26, 2013 at 4:52 PM, S.Çağlar Onur <caglar at 10ur.org>
>> > wrote:
>> >
>> > > Yeah, I think you all correct and I'm just confused - probably
>> > > direct effect of lack of caffeine. And no, it's not complicating
>> > > something for me, it's working great. I just want to make sure that
>> > > I'm wrong :)
>> > >
>> > >
>> > > On Fri, Apr 26, 2013 at 4:37 PM, Dwight Engen
>> > > <dwight.engen at oracle.com>wrote:
>> > >
>> > >> On Fri, 26 Apr 2013 22:07:22 +0200
>> > >> Stéphane Graber <stgraber at ubuntu.com> wrote:
>> > >>
>> > >> > On 04/26/2013 09:42 PM, S.Çağlar Onur wrote:
>> > >> > > Hey Dwight,
>> > >> > >
>> > >> > > I'm observing following behavior with staging tree and just
>> > >> > > wanted to make sure that what I'm seeing is the expected;
>> > >> > >
>> > >> > > * Initially nothing runs
>> > >> > >
>> > >> > > [caglar at qgq:~/Projects/lxc/examples] sudo ./list
>> > >> > > bankai (STOPPED)
>> > >> > > bleach (STOPPED)
>> > >> > > zangetsu (STOPPED)
>> > >> > >
>> > >> > > * I start one container using the API
>> > >> > >
>> > >> > > [caglar at qgq:~/Projects/lxc/examples] sudo ./start -name
>> > >> > > zangetsu Starting the container...
>> > >> > >
>> > >> > > [caglar at qgq:~/Projects/lxc/examples] sudo ./list
>> > >> > > bankai (STOPPED)
>> > >> > > bleach (STOPPED)
>> > >> > > zangetsu (RUNNING)
>> > >> > >
>> > >> > > * monitord starts as expected but exits after 30 seconds later
>> > >> > > (although container is still running);
>> > >> > >
>> > >> > > [caglar at qgq:~/Projects/lxc-upstream(staging)] tail -f
>> > >> > > /var/lib/lxc/lxc-monitord.log
>> > >> > >    lxc-monitord 1367004858.616 NOTICE   lxc_monitord -
>> > >> > > monitoring lxcpath /var/lib/lxc
>> > >> > >    lxc-monitord 1367004888.677 NOTICE   lxc_monitord - no
>> > >> > > clients for 30 seconds, exiting
>> > >> > >    lxc-monitord 1367004888.677 NOTICE   lxc_monitord - monitor
>> > >> > > exiting
>> > >> > >
>> > >> > > [caglar at qgq:~/Projects/lxc/examples] sudo ./list
>> > >> > > bankai (STOPPED)
>> > >> > > bleach (STOPPED)
>> > >> > > zangetsu (RUNNING)
>> > >> > >
>> > >> > > [caglar at qgq:~/Projects/lxc/examples] ps aux | grep monitord
>> > >> > > caglar   28404  0.0  0.0   7240   624 pts/54   S+   15:34
>> > >> > > 0:00 tail -f /var/lib/lxc/lxc-monitord.log
>> > >> > > caglar   29037  0.0  0.0   9436   948 pts/0    S+   15:38
>> > >> > > 0:00 grep --color=auto monitord
>> > >> > > [caglar at qgq:~/Projects/lxc/examples]
>> > >> > >
>> > >> > > I'm asking cause I was under the impression that lxc-monitord
>> > >> > > will keep running as long as there is a container. Am I wrong?
>> > >> >
>> > >> > I believe the monitor will get spawned the first time something
>> > >> > needs it (lxc-monitor/lxc-wait) and exit 30s after the last
>> > >> > client disconnects. It'll then be respawned the next time
>> > >> > lxc-monitor or lxc-wait is started again that container.
>> > >>
>> > >> Yep Stéphane, that is correct. Also note that the monitord is per
>> > >> lxcpath, not per container.
>> > >>
>> > >> Çağlar, you may have been slightly confused because if you start a
>> > >> container in daemon mode through the API, the API does an internal
>> > >> lxc_wait() and thus a monitord will get spawned when you first
>> > >> start a container, but will go away ~30 seconds afterwards.
>> > >>
>> > >
>> > >
>> > >
>> > > --
>> > > S.Çağlar Onur <caglar at 10ur.org>
>> > >
>> >
>> >
>> >
>>
>>
>
>
> --
> S.Çağlar Onur <caglar at 10ur.org>
>



-- 
S.Çağlar Onur <caglar at 10ur.org>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linuxcontainers.org/pipermail/lxc-devel/attachments/20130504/91901027/attachment.html>


More information about the lxc-devel mailing list