[lxc-devel] lxc_monitord - monitor exiting
S.Çağlar Onur
caglar at 10ur.org
Sat May 4 17:14:24 UTC 2013
OK reproduced it with C API
26624 lxc-monitord 1367687488.006 NOTICE lxc_monitord - monitoring
lxcpath /var/lib/lxc
26627 lxc-monitord 1367687488.008 INFO lxc_monitord - monitor fifo
/var/lib/lxc/monitor-fifo exists, already running?
26630 lxc-monitord 1367687488.009 INFO lxc_monitord - monitor fifo
/var/lib/lxc/monitor-fifo exists, already running?
26634 lxc-monitord 1367687488.011 INFO lxc_monitord - monitor fifo
/var/lib/lxc/monitor-fifo exists, already running?
26635 lxc-monitord 1367687488.011 INFO lxc_monitord - monitor fifo
/var/lib/lxc/monitor-fifo exists, already running?
26636 lxc-monitord 1367687488.014 INFO lxc_monitord - monitor fifo
/var/lib/lxc/monitor-fifo exists, already running?
26639 lxc-monitord 1367687488.014 INFO lxc_monitord - monitor fifo
/var/lib/lxc/monitor-fifo exists, already running?
26644 lxc-monitord 1367687488.014 INFO lxc_monitord - monitor fifo
/var/lib/lxc/monitor-fifo exists, already running?
26650 lxc-monitord 1367687488.018 INFO lxc_monitord - monitor fifo
/var/lib/lxc/monitor-fifo exists, already running?
26652 lxc-monitord 1367687488.018 INFO lxc_monitord - monitor fifo
/var/lib/lxc/monitor-fifo exists, already running?
26624 lxc-monitord 1367687488.020 INFO lxc_monitord - accepted
client fd:5 clients:1
26624 lxc-monitord 1367687488.020 INFO lxc_monitord - accepted
client fd:9 clients:2
26643 lxc-monitord 1367687488.021 INFO lxc_monitord - monitor fifo
/var/lib/lxc/monitor-fifo exists, already running?
26664 lxc-monitord 1367687488.023 INFO lxc_monitord - monitor fifo
/var/lib/lxc/monitor-fifo exists, already running?
26624 lxc-monitord 1367687488.025 INFO lxc_monitord - accepted
client fd:10 clients:3
26667 lxc-monitord 1367687488.026 INFO lxc_monitord - monitor fifo
/var/lib/lxc/monitor-fifo exists, already running?
26624 lxc-monitord 1367687488.028 INFO lxc_monitord - accepted
client fd:11 clients:4
26677 lxc-monitord 1367687488.028 INFO lxc_monitord - monitor fifo
/var/lib/lxc/monitor-fifo exists, already running?
26679 lxc-monitord 1367687488.028 INFO lxc_monitord - monitor fifo
/var/lib/lxc/monitor-fifo exists, already running?
26624 lxc-monitord 1367687488.028 INFO lxc_monitord - accepted
client fd:12 clients:5
26680 lxc-monitord 1367687488.031 INFO lxc_monitord - monitor fifo
/var/lib/lxc/monitor-fifo exists, already running?
26624 lxc-monitord 1367687488.031 INFO lxc_monitord - accepted
client fd:13 clients:6
26684 lxc-monitord 1367687488.032 INFO lxc_monitord - monitor fifo
/var/lib/lxc/monitor-fifo exists, already running?
26676 lxc-monitord 1367687488.033 INFO lxc_monitord - monitor fifo
/var/lib/lxc/monitor-fifo exists, already running?
26624 lxc-monitord 1367687488.034 INFO lxc_monitord - accepted
client fd:14 clients:7
26624 lxc-monitord 1367687488.034 INFO lxc_monitord - accepted
client fd:15 clients:8
26683 lxc-monitord 1367687488.036 INFO lxc_monitord - monitor fifo
/var/lib/lxc/monitor-fifo exists, already running?
26624 lxc-monitord 1367687488.037 INFO lxc_monitord - accepted
client fd:16 clients:9
26690 lxc-monitord 1367687488.038 INFO lxc_monitord - monitor fifo
/var/lib/lxc/monitor-fifo exists, already running?
26624 lxc-monitord 1367687488.039 INFO lxc_monitord - accepted
client fd:17 clients:10
26624 lxc-monitord 1367687488.748 INFO lxc_monitord - removing fd 15
26624 lxc-monitord 1367687489.239 INFO lxc_monitord - removing fd 17
26624 lxc-monitord 1367687489.395 INFO lxc_monitord - removing fd 16
26624 lxc-monitord 1367687519.425 NOTICE lxc_monitord - # of clients: 7
code sits @ http://10ur.org/lxc/
On Sat, May 4, 2013 at 12:22 PM, S.Çağlar Onur <caglar at 10ur.org> wrote:
> Hi Dwight,
>
> I run same commands and this time it ended up with 6 clients (output ->
> http://10ur.org/monitor.txt). I'll try to reproduce this with c/python.;.
>
> On Sat, May 4, 2013 at 11:15 AM, S.Çağlar Onur <caglar at 10ur.org> wrote:
>
>> Hi Dwight,
>>
>> On Sat, May 4, 2013 at 9:16 AM, Dwight Engen <dwight.engen at oracle.com>wrote:
>>
>>> Hi Çağlar,
>>>
>>> I'm confused by your output, it certainly looks like something isn't
>>> right. Do you have a theory as to why monitord thinks it still has 9
>>> clients?
>>
>>
>> Nope, I don't. I'll try to debug more and also try to reproduce this with
>> python or c api to rule out go.
>>
>>
>>> On Sat, 4 May 2013 00:01:45 -0400
>>> S.Çağlar Onur <caglar at 10ur.org> wrote:
>>>
>>> > Hi all,
>>> >
>>> > I think I understand why I'm confused before while chasing another
>>> > bug. This is what I'm seeing right now.
>>> >
>>> > * I patched lxc_monitord.c with following
>>> >
>>> > diff --git a/src/lxc/lxc_monitord.c b/src/lxc/lxc_monitord.c
>>> > index e76af71..59f1e9d 100644
>>> > --- a/src/lxc/lxc_monitord.c
>>> > +++ b/src/lxc/lxc_monitord.c
>>> > @@ -373,6 +373,7 @@ int main(int argc, char *argv[])
>>> > }
>>> >
>>> > if (lxc_monitord_create(&mon)) {
>>> > + NOTICE("create failed");
>>> > goto out;
>>> > }
>>> >
>>> > @@ -398,6 +399,7 @@ int main(int argc, char *argv[])
>>> > NOTICE("no clients for 30 seconds, exiting");
>>> > break;
>>> > }
>>> > + NOTICE("clients %d", mon.clientfds_cnt);
>>> > }
>>> >
>>> > lxc_mainloop_close(&mon.descr);
>>> >
>>> > * I started 10 containers using go bindings
>>> >
>>> > [caglar at qgq:~/Project/lxc/examples] sudo ./concurrent_start
>>> > Starting the container (3)...
>>> > Starting the container (2)...
>>> > Starting the container (4)...
>>> > Starting the container (0)...
>>> > Starting the container (1)...
>>> > Starting the container (8)...
>>> > Starting the container (7)...
>>> > Starting the container (6)...
>>> > Starting the container (5)...
>>> > Starting the container (9)...
>>> >
>>> > * Then started to stop them 1 by 1 using lxc-stop
>>> >
>>> > [caglar at qgq:~/Project/lxc/examples] sudo lxc-stop -n 0
>>> > [caglar at qgq:~/Project/lxc/examples] sudo ./list
>>> > 0 (STOPPED)
>>> > 1 (RUNNING)
>>> > 2 (RUNNING)
>>> > 3 (RUNNING)
>>> > 4 (RUNNING)
>>> > 5 (RUNNING)
>>> > 6 (RUNNING)
>>> > 7 (RUNNING)
>>> > 8 (RUNNING)
>>> > 9 (RUNNING)
>>>
>>> I assume you stopped 1-8 here?
>>
>>
>> Yes, sorry for pasting the half of the output.
>>
>>
>>> > [caglar at qgq:~/Project/lxc/examples] date && sudo ./list
>>> > Fri May 3 23:57:14 EDT 2013
>>> > 0 (STOPPED)
>>> > 1 (STOPPED)
>>> > 2 (STOPPED)
>>> > 3 (STOPPED)
>>> > 4 (STOPPED)
>>> > 5 (STOPPED)
>>> > 6 (STOPPED)
>>> > 7 (STOPPED)
>>> > 8 (STOPPED)
>>> > 9 (RUNNING)
>>> > bleach (STOPPED)
>>> >
>>> > * lxc-monitord is still around after ~10min
>>>
>>> Looks like its not going away because it thinks there are 9 clients
>>> still. My guess is somehow its not getting notified of the client
>>> closes (or they're still around?). The following patch should provide a
>>> bit more info in the log:
>>>
>>
>> As long as I see they are not around (I use lxc-stop to stop them).
>>
>>
>>> diff --git a/src/lxc/lxc_monitord.c b/src/lxc/lxc_monitord.c
>>> index e76af71..537a2b3 100644
>>> --- a/src/lxc/lxc_monitord.c
>>> +++ b/src/lxc/lxc_monitord.c
>>> @@ -114,6 +114,7 @@ static int lxc_monitord_fifo_delete(struct
>>> lxc_monitor *mon)
>>> static void lxc_monitord_sockfd_remove(struct lxc_monitor *mon, int fd)
>>> {
>>> int i;
>>>
>>> + INFO("removing fd %d\n", fd);
>>> if (lxc_mainloop_del_handler(&mon->descr, fd))
>>> CRIT("fd:%d not found in mainloop", fd);
>>> close(fd);
>>> @@ -343,7 +344,7 @@ int main(int argc, char *argv[])
>>> if (ret < 0 || ret >= sizeof(logpath))
>>> return EXIT_FAILURE;
>>>
>>> - ret = lxc_log_init(NULL, logpath, "NOTICE", "lxc-monitord", 0,
>>> lxcpath);
>>> + ret = lxc_log_init(NULL, logpath, "INFO", "lxc-monitord", 0,
>>> lxcpath);
>>> if (ret)
>>> return ret;
>>>
>>>
>>>
>>> > [caglar at qgq:~/Project/lxc/examples] ps aux |
>>> > grep /usr/bin/lxc-monitord caglar 1170 0.0 0.0 13580 940
>>> > pts/3 S+ 23:57 0:00 grep --color=auto /usr/bin/lxc-monitord
>>> > root 29997 0.0 0.0 15000 744 ? Ss 23:47 0:00
>>> > /usr/bin/lxc-monitord /var/lib/lxc 5
>>> > [caglar at qgq:~/Project/lxc/examples] date
>>> > Fri May 3 23:57:52 EDT 2013
>>> >
>>> > * And lastly here is what lxc-monitord.log shows
>>> >
>>> > [caglar at qgq:~/Project/lxc(clone)] tail
>>> > -f /var/lib/lxc/lxc-monitord.log lxc-monitord 1367639242.631 NOTICE
>>> > lxc_monitord - monitoring lxcpath /var/lib/lxc
>>> > lxc-monitord 1367639242.633 NOTICE lxc_monitord - create failed
>>> > lxc-monitord 1367639242.633 NOTICE lxc_monitord - create failed
>>> > lxc-monitord 1367639242.636 NOTICE lxc_monitord - create failed
>>> > lxc-monitord 1367639242.639 NOTICE lxc_monitord - create failed
>>> > lxc-monitord 1367639242.643 NOTICE lxc_monitord - create failed
>>> > lxc-monitord 1367639242.643 NOTICE lxc_monitord - create failed
>>> > lxc-monitord 1367639242.651 NOTICE lxc_monitord - create failed
>>> > lxc-monitord 1367639242.654 NOTICE lxc_monitord - create failed
>>> > lxc-monitord 1367639242.665 NOTICE lxc_monitord - create failed
>>> > lxc-monitord 1367639242.678 NOTICE lxc_monitord - create failed
>>> > lxc-monitord 1367639242.681 NOTICE lxc_monitord - create failed
>>> > lxc-monitord 1367639242.681 NOTICE lxc_monitord - create failed
>>> > lxc-monitord 1367639242.682 NOTICE lxc_monitord - create failed
>>> > lxc-monitord 1367639242.707 NOTICE lxc_monitord - create failed
>>> > lxc-monitord 1367639242.710 NOTICE lxc_monitord - create failed
>>> > lxc-monitord 1367639242.710 NOTICE lxc_monitord - create failed
>>> > lxc-monitord 1367639242.722 NOTICE lxc_monitord - create failed
>>> > lxc-monitord 1367639242.733 NOTICE lxc_monitord - create failed
>>> > lxc-monitord 1367639242.831 NOTICE lxc_monitord - create failed
>>> > lxc-monitord 1367639274.071 NOTICE lxc_monitord - clients 9
>>> > lxc-monitord 1367639323.928 NOTICE lxc_monitord - clients 9
>>> > lxc-monitord 1367639372.862 NOTICE lxc_monitord - clients 9
>>> > lxc-monitord 1367639444.107 NOTICE lxc_monitord - clients 9
>>> > lxc-monitord 1367639474.130 NOTICE lxc_monitord - clients 9
>>> > lxc-monitord 1367639504.133 NOTICE lxc_monitord - clients 9
>>> > lxc-monitord 1367639534.161 NOTICE lxc_monitord - clients 9
>>> > lxc-monitord 1367639564.190 NOTICE lxc_monitord - clients 9
>>> > lxc-monitord 1367639594.209 NOTICE lxc_monitord - clients 9
>>> > lxc-monitord 1367639624.223 NOTICE lxc_monitord - clients 9
>>> > lxc-monitord 1367639654.256 NOTICE lxc_monitord - clients 9
>>> > lxc-monitord 1367639684.287 NOTICE lxc_monitord - clients 9
>>> > lxc-monitord 1367639714.317 NOTICE lxc_monitord - clients 9
>>> > lxc-monitord 1367639744.347 NOTICE lxc_monitord - clients 9
>>> > lxc-monitord 1367639774.370 NOTICE lxc_monitord - clients 9
>>> > lxc-monitord 1367639804.396 NOTICE lxc_monitord - clients 9
>>> > lxc-monitord 1367639834.426 NOTICE lxc_monitord - clients 9
>>> > lxc-monitord 1367639864.456 NOTICE lxc_monitord - clients 9
>>> > lxc-monitord 1367639894.486 NOTICE lxc_monitord - clients 9
>>>
>>> You might want to consider patching the log stuff to print out pids, I
>>> found that helpful while working on this:
>>>
>>
>> Will patch both monitord.c and log.c and report back. Thanks!
>>
>>
>>> diff --git a/src/lxc/log.c b/src/lxc/log.c
>>> index d49a544..98581c1 100644
>>> --- a/src/lxc/log.c
>>> +++ b/src/lxc/log.c
>>> @@ -58,7 +58,7 @@ static int log_append_stderr(const struct
>>> lxc_log_appender *appender,
>>> if (event->priority < LXC_LOG_PRIORITY_ERROR)
>>> return 0;
>>>
>>> - fprintf(stderr, "%s: ", log_prefix);
>>> + fprintf(stderr, "%-5d %s: ", getpid(), log_prefix);
>>> vfprintf(stderr, event->fmt, *event->vap);
>>> fprintf(stderr, "\n");
>>> return 0;
>>> @@ -75,7 +75,8 @@ static int log_append_logfile(const struct
>>> lxc_log_appender *appender,
>>> return 0;
>>>
>>> n = snprintf(buffer, sizeof(buffer),
>>> - "%15s %10ld.%03ld %-8s %s - ",
>>> + "%-5d %15s %10ld.%03ld %-8s %s - ",
>>> + getpid(),
>>> log_prefix,
>>> event->timestamp.tv_sec,
>>> event->timestamp.tv_usec / 1000,
>>>
>>>
>>> > On Fri, Apr 26, 2013 at 4:52 PM, S.Çağlar Onur <caglar at 10ur.org>
>>> > wrote:
>>> >
>>> > > Yeah, I think you all correct and I'm just confused - probably
>>> > > direct effect of lack of caffeine. And no, it's not complicating
>>> > > something for me, it's working great. I just want to make sure that
>>> > > I'm wrong :)
>>> > >
>>> > >
>>> > > On Fri, Apr 26, 2013 at 4:37 PM, Dwight Engen
>>> > > <dwight.engen at oracle.com>wrote:
>>> > >
>>> > >> On Fri, 26 Apr 2013 22:07:22 +0200
>>> > >> Stéphane Graber <stgraber at ubuntu.com> wrote:
>>> > >>
>>> > >> > On 04/26/2013 09:42 PM, S.Çağlar Onur wrote:
>>> > >> > > Hey Dwight,
>>> > >> > >
>>> > >> > > I'm observing following behavior with staging tree and just
>>> > >> > > wanted to make sure that what I'm seeing is the expected;
>>> > >> > >
>>> > >> > > * Initially nothing runs
>>> > >> > >
>>> > >> > > [caglar at qgq:~/Projects/lxc/examples] sudo ./list
>>> > >> > > bankai (STOPPED)
>>> > >> > > bleach (STOPPED)
>>> > >> > > zangetsu (STOPPED)
>>> > >> > >
>>> > >> > > * I start one container using the API
>>> > >> > >
>>> > >> > > [caglar at qgq:~/Projects/lxc/examples] sudo ./start -name
>>> > >> > > zangetsu Starting the container...
>>> > >> > >
>>> > >> > > [caglar at qgq:~/Projects/lxc/examples] sudo ./list
>>> > >> > > bankai (STOPPED)
>>> > >> > > bleach (STOPPED)
>>> > >> > > zangetsu (RUNNING)
>>> > >> > >
>>> > >> > > * monitord starts as expected but exits after 30 seconds later
>>> > >> > > (although container is still running);
>>> > >> > >
>>> > >> > > [caglar at qgq:~/Projects/lxc-upstream(staging)] tail -f
>>> > >> > > /var/lib/lxc/lxc-monitord.log
>>> > >> > > lxc-monitord 1367004858.616 NOTICE lxc_monitord -
>>> > >> > > monitoring lxcpath /var/lib/lxc
>>> > >> > > lxc-monitord 1367004888.677 NOTICE lxc_monitord - no
>>> > >> > > clients for 30 seconds, exiting
>>> > >> > > lxc-monitord 1367004888.677 NOTICE lxc_monitord - monitor
>>> > >> > > exiting
>>> > >> > >
>>> > >> > > [caglar at qgq:~/Projects/lxc/examples] sudo ./list
>>> > >> > > bankai (STOPPED)
>>> > >> > > bleach (STOPPED)
>>> > >> > > zangetsu (RUNNING)
>>> > >> > >
>>> > >> > > [caglar at qgq:~/Projects/lxc/examples] ps aux | grep monitord
>>> > >> > > caglar 28404 0.0 0.0 7240 624 pts/54 S+ 15:34
>>> > >> > > 0:00 tail -f /var/lib/lxc/lxc-monitord.log
>>> > >> > > caglar 29037 0.0 0.0 9436 948 pts/0 S+ 15:38
>>> > >> > > 0:00 grep --color=auto monitord
>>> > >> > > [caglar at qgq:~/Projects/lxc/examples]
>>> > >> > >
>>> > >> > > I'm asking cause I was under the impression that lxc-monitord
>>> > >> > > will keep running as long as there is a container. Am I wrong?
>>> > >> >
>>> > >> > I believe the monitor will get spawned the first time something
>>> > >> > needs it (lxc-monitor/lxc-wait) and exit 30s after the last
>>> > >> > client disconnects. It'll then be respawned the next time
>>> > >> > lxc-monitor or lxc-wait is started again that container.
>>> > >>
>>> > >> Yep Stéphane, that is correct. Also note that the monitord is per
>>> > >> lxcpath, not per container.
>>> > >>
>>> > >> Çağlar, you may have been slightly confused because if you start a
>>> > >> container in daemon mode through the API, the API does an internal
>>> > >> lxc_wait() and thus a monitord will get spawned when you first
>>> > >> start a container, but will go away ~30 seconds afterwards.
>>> > >>
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > S.Çağlar Onur <caglar at 10ur.org>
>>> > >
>>> >
>>> >
>>> >
>>>
>>>
>>
>>
>> --
>> S.Çağlar Onur <caglar at 10ur.org>
>>
>
>
>
> --
> S.Çağlar Onur <caglar at 10ur.org>
>
--
S.Çağlar Onur <caglar at 10ur.org>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linuxcontainers.org/pipermail/lxc-devel/attachments/20130504/620f4cc1/attachment.html>
More information about the lxc-devel
mailing list