[lxc-devel] lxc-1.0.0 reboot error
Stéphane Graber
stgraber at ubuntu.com
Wed Feb 26 20:02:13 UTC 2014
On Wed, Feb 26, 2014 at 09:19:27PM +0400, Vitaly Lavrov wrote:
> On 26.02.2014 19:41, Serge Hallyn wrote:
> >Quoting Vitaly Lavrov (vel21ripn at gmail.com):
> >>On 25.02.2014 22:54, Serge Hallyn wrote:
> >>>Quoting Vitaly Lavrov (vel21ripn at gmail.com):
> >>>>On 23.02.2014 03:36, Stéphane Graber wrote:
> >>>>>Hi,
> >>>>>
> >>>>>Thanks for your patch.
> >>>>>
> >>>>>Can I just ask you to sign it off? (Signed-off-by: Name <email>)
> >>>>Hi!
> >>>>
> >>>>I found the source of the problem with rebooting the container, but I do not know how best to fix it.
> >>>>There is a race condition between the shutdown of the old container and the creation of the network interfaces
> >>>>in the new container. Inserting usleep(100000) before lxc_delete_network() solves the reboot problem,
> >>>>but that is a bad way to fix it.
> >>>>
> >>>>How can we wait until the container has finished shutting down?
> >>>
> >>>How exactly are you doing the test? just script
> >>>
> >>> lxc-start;
> >>> lxc-stop;
> >>> lxc-start;
> >>lxc-stop -rn container
> >
> >A-ha! Thanks. Yes, this is a bug in our reboot handling in
> >lxcapi_start(). I can reproduce it trivially with lxc-stop -r
> >on any container with lxc.network.type = phys.
>
> "lxc.network.type = phys" has another bug
>
> *** glibc detected *** lxc-start: realloc(): invalid pointer: 0x0948eed0 ***
> ======= Backtrace: =========
> /lib/libc.so.6(+0x7710b)[0xb756d10b]
> /lib/libc.so.6(realloc+0x2c5)[0xb75720b5]
> /usr/lib/liblxc.so.1(__lxc_start+0x5d2)[0xb76c0c12]
> /usr/lib/liblxc.so.1(lxc_start+0x4c)[0xb76c15ac]
> /usr/lib/liblxc.so.1(+0x42a2c)[0xb76eaa2c]
> lxc-start(main+0x267)[0x8048e07]
> /lib/libc.so.6(__libc_start_main+0xf5)[0xb750f5a5]
> lxc-start[0x8049245]
> ======= Memory map: ========
>
> src/lxc/start.c:753 save_phys_nics()
> -----------------------------------------------------------------------
> conf->saved_nics = realloc(conf->saved_nics,
> (conf->num_savednics+1)*sizeof(struct saved_nic));
> -----------------------------------------------------------------------
>
> The patch is simple.
>
> --- src/lxc/conf.c.orig 2014-02-26 13:21:40.263953511 +0400
> +++ src/lxc/conf.c 2014-02-26 20:39:46.710074311 +0400
> @@ -2606,6 +2606,7 @@ void lxc_rename_phys_nics_on_shutdown(st
> }
> conf->num_savednics = 0;
> free(conf->saved_nics);
> + conf->saved_nics = NULL;
> }
>
> static char *default_rootfs_mount = LXCROOTFSMOUNT;
> @@ -4119,8 +4120,8 @@ static void lxc_clear_saved_nics(struct
> return;
> for (i=0; i < conf->num_savednics; i++)
> free(conf->saved_nics[i].orig_name);
> - conf->saved_nics = 0;
> free(conf->saved_nics);
> + conf->saved_nics = NULL;
> }
>
> void lxc_conf_free(struct lxc_conf *conf)
> --
That patch looks reasonable to me. Can you send it separately to the
mailing list, with a commit message and a Signed-off-by line, so I can
apply that one to master right away?
Thanks!
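
For reference, the failure mode behind the realloc() abort can be shown
with a small standalone sketch. It assumes simplified stand-ins for the
liblxc structures (struct nic_conf, save_nic() and clear_saved_nics() are
hypothetical names, not the real liblxc code). The point is only that the
pointer has to be NULL after free(), otherwise the next realloc() on it is
handed a stale pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the liblxc structures; only the fields needed here. */
struct saved_nic {
	int ifindex;
	char *orig_name;
};

struct nic_conf {
	struct saved_nic *saved_nics;
	int num_savednics;
};

/* Append one entry, growing the array with realloc() the way the quoted code does. */
int save_nic(struct nic_conf *conf, int ifindex, const char *name)
{
	struct saved_nic *tmp;
	char *copy;
	size_t len;

	assert(conf != NULL && name != NULL);

	len = strlen(name) + 1;
	copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, name, len);

	tmp = realloc(conf->saved_nics,
		      (conf->num_savednics + 1) * sizeof(*tmp));
	if (!tmp) {
		free(copy);
		return -1;
	}

	conf->saved_nics = tmp;
	conf->saved_nics[conf->num_savednics].ifindex = ifindex;
	conf->saved_nics[conf->num_savednics].orig_name = copy;
	conf->num_savednics++;
	return 0;
}

/* Release all entries and leave conf ready for the next save_nic() call. */
void clear_saved_nics(struct nic_conf *conf)
{
	int i;

	assert(conf != NULL);

	for (i = 0; i < conf->num_savednics; i++)
		free(conf->saved_nics[i].orig_name);
	free(conf->saved_nics);
	conf->saved_nics = NULL;	/* realloc(NULL, n) behaves like malloc(n) */
	conf->num_savednics = 0;
}
```

Calling save_nic(), then clear_saved_nics(), then save_nic() again mimics
the reboot path. Without the NULL assignment the second save_nic() would
pass an already freed pointer to realloc().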
>
>
> But there is a more difficult problem.
>
> Function lxc_rename_phys_nics_on_shutdown() does not always work as it should.
>
> ------
> lxc-start 1393409939.368 INFO lxc_conf - running to reset 1 nic names
> lxc-start 1393409939.368 WARN lxc_conf - resetting nic 3 to eth2 failed: No such device
> ------
>
> I added a wait loop and debug printing, and this is what I got:
> -----
> lxc-start 1393433485.531 INFO lxc_conf - running to reset 1 nic names
> lxc-start 1393433485.532 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 1ms
> lxc-start 1393433485.533 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 2ms
> lxc-start 1393433485.534 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 3ms
> lxc-start 1393433485.536 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 4ms
> lxc-start 1393433485.537 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 5ms
> lxc-start 1393433485.538 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 6ms
> lxc-start 1393433485.539 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 7ms
> lxc-start 1393433485.540 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 8ms
> lxc-start 1393433485.541 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 9ms
> lxc-start 1393433485.542 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 10ms
> lxc-start 1393433485.543 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 11ms
> lxc-start 1393433485.544 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 12ms
> lxc-start 1393433485.545 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 13ms
> lxc-start 1393433485.562 INFO lxc_conf - resetting nic 3 to eth2, delay 14ms
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> If you wait 12-20 ms, renaming the network interface works.
> The same problem occurs with vlan interfaces.
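
A bounded wait loop of the kind described above might look roughly like
the sketch below. This is an illustration of the experiment, not a
proposed fix. retry_while_enodev() and fake_rename() are hypothetical
names, and the callback is assumed to return 0 on success or a negative
errno value. The real rename call is left out and replaced by a test
double.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical callback type: returns 0 on success or a negative errno value. */
typedef int (*rename_fn)(void *arg);

/*
 * Call op(arg) until it stops failing with -ENODEV or max_tries is reached,
 * sleeping delay_ms between attempts. Returns the last result of op(),
 * or -EINVAL if max_tries is not positive.
 */
int retry_while_enodev(rename_fn op, void *arg, int max_tries, long delay_ms)
{
	struct timespec ts;
	int ret = -EINVAL;
	int i;

	assert(op != NULL);

	ts.tv_sec = delay_ms / 1000;
	ts.tv_nsec = (delay_ms % 1000) * 1000000L;

	for (i = 0; i < max_tries; i++) {
		ret = op(arg);
		if (ret != -ENODEV)
			break;	/* success, or an error that waiting will not fix */
		if (i + 1 < max_tries)
			nanosleep(&ts, NULL);
	}
	return ret;
}

/* Test double: fails with -ENODEV until the counter reaches zero. */
int fake_rename(void *arg)
{
	int *remaining_failures = arg;

	if (*remaining_failures > 0) {
		(*remaining_failures)--;
		return -ENODEV;
	}
	return 0;
}
```

With a 1 ms delay and a limit of 20 attempts, a device that shows up after
13 failed attempts would be renamed on the 14th call. The limit keeps the
loop from hanging if the device never appears.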
>
>
> >_______________________________________________
> >lxc-devel mailing list
> >lxc-devel at lists.linuxcontainers.org
> >http://lists.linuxcontainers.org/listinfo/lxc-devel
> >
--
Stéphane Graber
Ubuntu developer
http://www.ubuntu.com