[lxc-devel] lxc-1.0.0 reboot error

Stéphane Graber stgraber at ubuntu.com
Wed Feb 26 20:02:13 UTC 2014


On Wed, Feb 26, 2014 at 09:19:27PM +0400, Vitaly Lavrov wrote:
> On 26.02.2014 19:41, Serge Hallyn wrote:
> >Quoting Vitaly Lavrov (vel21ripn at gmail.com):
> >>On 25.02.2014 22:54, Serge Hallyn wrote:
> >>>Quoting Vitaly Lavrov (vel21ripn at gmail.com):
> >>>>On 23.02.2014 03:36, Stéphane Graber wrote:
> >>>>>Hi,
> >>>>>
> >>>>>Thanks for your patch.
> >>>>>
> >>>>>Can I just ask you to sign it off? (Signed-off-by: Name <email>)
> >>>>Hi!
> >>>>
> >>>>I found the source of the problem with rebooting the container, but I do not know how best to fix it.
> >>>>There is a race condition between the teardown of the old container and the creation of the network
> >>>>interfaces in the new one. Inserting usleep(100000) before lxc_delete_network() makes the reboot work,
> >>>>but that is a bad fix.
> >>>>
> >>>>How can we wait until the old container has completely exited?
> >>>
> >>>How exactly are you doing the test?  just script
> >>>
> >>>	lxc-start;
> >>>	lxc-stop;
> >>>	lxc-start;
> >>lxc-stop -rn container
> >
> >A-ha!  Thanks.  Yes, this is a bug in our reboot handling in
> >lxcapi_start().  I can reproduce it trivially with lxc-stop -r
> >on any container with lxc.network.type = phys.
> 
> "lxc.network.type = phys" also triggers another bug:
> 
> *** glibc detected *** lxc-start: realloc(): invalid pointer: 0x0948eed0 ***
> ======= Backtrace: =========
> /lib/libc.so.6(+0x7710b)[0xb756d10b]
> /lib/libc.so.6(realloc+0x2c5)[0xb75720b5]
> /usr/lib/liblxc.so.1(__lxc_start+0x5d2)[0xb76c0c12]
> /usr/lib/liblxc.so.1(lxc_start+0x4c)[0xb76c15ac]
> /usr/lib/liblxc.so.1(+0x42a2c)[0xb76eaa2c]
> lxc-start(main+0x267)[0x8048e07]
> /lib/libc.so.6(__libc_start_main+0xf5)[0xb750f5a5]
> lxc-start[0x8049245]
> ======= Memory map: ========
> 
> src/lxc/start.c:753 save_phys_nics()
> -----------------------------------------------------------------------
> 	conf->saved_nics = realloc(conf->saved_nics,
> 		(conf->num_savednics+1)*sizeof(struct saved_nic));
> -----------------------------------------------------------------------
> 
> The patch is simple.
> 
> --- src/lxc/conf.c.orig 2014-02-26 13:21:40.263953511 +0400
> +++ src/lxc/conf.c      2014-02-26 20:39:46.710074311 +0400
> @@ -2606,6 +2606,7 @@ void lxc_rename_phys_nics_on_shutdown(st
>         }
>         conf->num_savednics = 0;
>         free(conf->saved_nics);
> +       conf->saved_nics = NULL;
>  }
> 
>  static char *default_rootfs_mount = LXCROOTFSMOUNT;
> @@ -4119,8 +4120,8 @@ static void lxc_clear_saved_nics(struct
>                 return;
>         for (i=0; i < conf->num_savednics; i++)
>                 free(conf->saved_nics[i].orig_name);
> -       conf->saved_nics = 0;
>         free(conf->saved_nics);
> +       conf->saved_nics = NULL;
>  }
> 
>  void lxc_conf_free(struct lxc_conf *conf)
> --

That patch looks reasonable to me. Can you send it separately to the
mailing list, with a commit message and a Signed-off-by line, so I can
apply that one to master right away?

Thanks!
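
For anyone following the thread, the crash above looks like a stale pointer: the shutdown path frees conf->saved_nics but leaves the old address in place, so the realloc() in save_phys_nics() on the next start operates on freed memory. In the second hunk, the old code zeroed the pointer before free(), so the free() was a no-op and the array leaked. Below is a minimal standalone sketch of the pattern the patch establishes. The struct and function names (nic_conf, save_nic, clear_saved_nics) are simplified stand-ins made up for illustration, not the real lxc definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the lxc structures. */
struct saved_nic {
	int ifindex;
	char *orig_name;
};

struct nic_conf {
	struct saved_nic *saved_nics;
	int num_savednics;
};

/* Grow the array by one entry, in the same way save_phys_nics() uses realloc(). */
static int save_nic(struct nic_conf *conf, int ifindex)
{
	struct saved_nic *tmp;

	assert(conf != NULL);
	tmp = realloc(conf->saved_nics,
		      (conf->num_savednics + 1) * sizeof(struct saved_nic));
	if (!tmp)
		return -1;	/* the old array is still valid on failure */

	conf->saved_nics = tmp;
	conf->saved_nics[conf->num_savednics].ifindex = ifindex;
	conf->saved_nics[conf->num_savednics].orig_name = NULL;
	conf->num_savednics++;
	return 0;
}

/* Free first, then reset the pointer, so that a later realloc() is safe. */
static void clear_saved_nics(struct nic_conf *conf)
{
	int i;

	assert(conf != NULL);
	for (i = 0; i < conf->num_savednics; i++)
		free(conf->saved_nics[i].orig_name);
	free(conf->saved_nics);
	conf->saved_nics = NULL;	/* realloc(NULL, n) behaves like malloc(n) */
	conf->num_savednics = 0;
}
```

With the pointer reset, a save, clear, save sequence (which is roughly what a reboot does) allocates a fresh array the second time instead of handing a freed address to realloc().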

> 
> 
> But there is a more difficult problem.
> 
> Function lxc_rename_phys_nics_on_shutdown() does not always work as it should.
> 
> ------
> lxc-start 1393409939.368 INFO     lxc_conf - running to reset 1 nic names
> lxc-start 1393409939.368 WARN     lxc_conf - resetting nic 3 to eth2 failed: No such device
> ------
> 
> I added a wait loop with debug printing, and this is what I got:
> -----
> lxc-start 1393433485.531 INFO     lxc_conf - running to reset 1 nic names
> lxc-start 1393433485.532 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 1ms
> lxc-start 1393433485.533 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 2ms
> lxc-start 1393433485.534 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 3ms
> lxc-start 1393433485.536 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 4ms
> lxc-start 1393433485.537 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 5ms
> lxc-start 1393433485.538 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 6ms
> lxc-start 1393433485.539 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 7ms
> lxc-start 1393433485.540 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 8ms
> lxc-start 1393433485.541 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 9ms
> lxc-start 1393433485.542 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 10ms
> lxc-start 1393433485.543 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 11ms
> lxc-start 1393433485.544 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 12ms
> lxc-start 1393433485.545 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 13ms
> lxc-start 1393433485.562 INFO     lxc_conf - resetting nic 3 to eth2, delay 14ms
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> If you wait 12-20 ms, renaming the network interface works.
> The same problem occurs with vlan interfaces.
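
The wait loop described above could be sketched roughly as follows. This is only an illustration of a bounded retry, not the code that produced the log and not a proposed fix: rename_with_retry, the rename_fn callback type and the fake_rename stub are hypothetical names, and the real rename call in lxc is not shown here. The sketch retries only while the callback reports -ENODEV, sleeps 1 ms between attempts, and gives up after max_tries attempts.

```c
#define _POSIX_C_SOURCE 200809L	/* for nanosleep() */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: returns 0 on success, a negative errno on failure. */
typedef int (*rename_fn)(int ifindex, const char *name, void *data);

/*
 * Call fn until it succeeds, fails with an error other than -ENODEV,
 * or max_tries attempts have been made. Returns the last result of fn.
 */
static int rename_with_retry(rename_fn fn, int ifindex, const char *name,
			     void *data, int max_tries)
{
	struct timespec delay = { 0, 1000000L };	/* 1 ms */
	int ret = -ENODEV;
	int i;

	assert(fn != NULL);
	for (i = 0; i < max_tries; i++) {
		ret = fn(ifindex, name, data);
		if (ret != -ENODEV)
			break;		/* success, or an error not worth retrying */
		nanosleep(&delay, NULL);
	}
	return ret;
}

/* Stub for experimenting: fails with -ENODEV until *data counts down to 0. */
static int fake_rename(int ifindex, const char *name, void *data)
{
	int *remaining = data;

	(void)ifindex;
	(void)name;
	if (*remaining > 0) {
		(*remaining)--;
		return -ENODEV;
	}
	return 0;
}
```

A loop like this only hides the race. It shows that the interface comes back to the host namespace a few milliseconds after the container exits, but the cleaner fix is to wait for that event rather than poll for it.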
> 
> 
> >_______________________________________________
> >lxc-devel mailing list
> >lxc-devel at lists.linuxcontainers.org
> >http://lists.linuxcontainers.org/listinfo/lxc-devel
> >

-- 
Stéphane Graber
Ubuntu developer
http://www.ubuntu.com