[lxc-devel] lxc-1.0.0 reboot error

Vitaly Lavrov vel21ripn at gmail.com
Wed Feb 26 17:19:27 UTC 2014


On 26.02.2014 19:41, Serge Hallyn wrote:
> Quoting Vitaly Lavrov (vel21ripn at gmail.com):
>> On 25.02.2014 22:54, Serge Hallyn wrote:
>>> Quoting Vitaly Lavrov (vel21ripn at gmail.com):
>>>> On 23.02.2014 03:36, Stéphane Graber wrote:
>>>>> Hi,
>>>>>
>>>>> Thanks for your patch.
>>>>>
>>>>> Can I just ask you to sign it off? (Signed-off-by: Name <email>)
>>>> Hi!
>>>>
>>>> I found the source of the problem with rebooting the container, but I do not know how best to fix it.
>>>> There is a race condition between the shutdown of the old container and the creation of the network interfaces
>>>> in the new container. Inserting usleep(100000) before lxc_delete_network() solves the reboot problem,
>>>> but it's a bad approach.
>>>>
>>>> How can we wait until the container has fully stopped?
>>>
>>> How exactly are you doing the test?  just script
>>>
>>> 	lxc-start;
>>> 	lxc-stop;
>>> 	lxc-start;
>> lxc-stop -rn container
>
> A-ha!  Thanks.  Yes, this is a bug in our reboot handling in
> lxcapi_start().  I can reproduce it trivially with lxc-stop -r
> on any container with lxc.network.type = phys.

"lxc.network.type = phys" also triggers another bug:

*** glibc detected *** lxc-start: realloc(): invalid pointer: 0x0948eed0 ***
======= Backtrace: =========
/lib/libc.so.6(+0x7710b)[0xb756d10b]
/lib/libc.so.6(realloc+0x2c5)[0xb75720b5]
/usr/lib/liblxc.so.1(__lxc_start+0x5d2)[0xb76c0c12]
/usr/lib/liblxc.so.1(lxc_start+0x4c)[0xb76c15ac]
/usr/lib/liblxc.so.1(+0x42a2c)[0xb76eaa2c]
lxc-start(main+0x267)[0x8048e07]
/lib/libc.so.6(__libc_start_main+0xf5)[0xb750f5a5]
lxc-start[0x8049245]
======= Memory map: ========

src/lxc/start.c:753 save_phys_nics()
-----------------------------------------------------------------------
	conf->saved_nics = realloc(conf->saved_nics,
		(conf->num_savednics+1)*sizeof(struct saved_nic));
-----------------------------------------------------------------------

The patch is simple.

--- src/lxc/conf.c.orig 2014-02-26 13:21:40.263953511 +0400
+++ src/lxc/conf.c      2014-02-26 20:39:46.710074311 +0400
@@ -2606,6 +2606,7 @@ void lxc_rename_phys_nics_on_shutdown(st
         }
         conf->num_savednics = 0;
         free(conf->saved_nics);
+       conf->saved_nics = NULL;
  }

  static char *default_rootfs_mount = LXCROOTFSMOUNT;
@@ -4119,8 +4120,8 @@ static void lxc_clear_saved_nics(struct
                 return;
         for (i=0; i < conf->num_savednics; i++)
                 free(conf->saved_nics[i].orig_name);
-       conf->saved_nics = 0;
         free(conf->saved_nics);
+       conf->saved_nics = NULL;
  }

  void lxc_conf_free(struct lxc_conf *conf)
--
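
To illustrate why resetting the pointer matters, here is a minimal standalone
sketch. It is not LXC code: struct demo_conf, demo_save_nic() and
demo_clear_nics() are hypothetical names, and only the fields saved_nics,
num_savednics and orig_name are taken from the snippets above. The point is
that realloc() accepts NULL and then behaves like malloc(), so a cleared
config can be filled again on the next start. If the stale pointer is left
in place, the next realloc() is handed freed memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the LXC structures (illustrative only). */
struct saved_nic {
	int ifindex;
	char *orig_name;
};

struct demo_conf {
	struct saved_nic *saved_nics;
	int num_savednics;
};

/* Append one entry, mirroring the realloc() call in save_phys_nics(). */
static int demo_save_nic(struct demo_conf *conf, int ifindex, const char *name)
{
	struct saved_nic *tmp;
	size_t len;
	char *copy;

	assert(conf != NULL && name != NULL);
	tmp = realloc(conf->saved_nics,
		      (conf->num_savednics + 1) * sizeof(struct saved_nic));
	if (!tmp)
		return -1;
	conf->saved_nics = tmp;

	len = strlen(name) + 1;
	copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, name, len);

	tmp[conf->num_savednics].ifindex = ifindex;
	tmp[conf->num_savednics].orig_name = copy;
	conf->num_savednics++;
	return 0;
}

/* Release all entries and leave the struct ready for reuse. */
static void demo_clear_nics(struct demo_conf *conf)
{
	int i;

	assert(conf != NULL);
	for (i = 0; i < conf->num_savednics; i++)
		free(conf->saved_nics[i].orig_name);
	free(conf->saved_nics);
	/* Without this reset the next realloc() receives a dangling pointer. */
	conf->saved_nics = NULL;
	conf->num_savednics = 0;
}
```

Calling demo_save_nic(), demo_clear_nics() and demo_save_nic() again on the
same struct corresponds to the start, stop, start sequence of a reboot.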


But there is a more difficult problem.

The function lxc_rename_phys_nics_on_shutdown() does not always work as it should:

------
lxc-start 1393409939.368 INFO     lxc_conf - running to reset 1 nic names
lxc-start 1393409939.368 WARN     lxc_conf - resetting nic 3 to eth2 failed: No such device
------

I added a wait loop and debug printing, and this is what I got:
-----
lxc-start 1393433485.531 INFO     lxc_conf - running to reset 1 nic names
lxc-start 1393433485.532 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 1ms
lxc-start 1393433485.533 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 2ms
lxc-start 1393433485.534 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 3ms
lxc-start 1393433485.536 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 4ms
lxc-start 1393433485.537 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 5ms
lxc-start 1393433485.538 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 6ms
lxc-start 1393433485.539 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 7ms
lxc-start 1393433485.540 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 8ms
lxc-start 1393433485.541 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 9ms
lxc-start 1393433485.542 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 10ms
lxc-start 1393433485.543 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 11ms
lxc-start 1393433485.544 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 12ms
lxc-start 1393433485.545 WARN     lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 13ms
lxc-start 1393433485.562 INFO     lxc_conf - resetting nic 3 to eth2, delay 14ms
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you wait 12-20 ms, renaming the network interface works.
The same problem occurs with a vlan interface.
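
The wait loop I used for debugging looks roughly like the sketch below. It is
only an illustration under assumptions, not a proposed fix: rename_with_retry(),
rename_fn, sleep_ms() and fake_rename() are hypothetical names, and the real
rename call would have to be wrapped in a callback of this shape. The loop
retries only on "No such device" (-ENODEV) and gives up after a bounded
number of attempts, so it cannot hang forever if the interface never comes
back.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: returns 0 on success or a negative errno value. */
typedef int (*rename_fn)(int ifindex, const char *newname, void *ctx);

/* Sleep for the given number of milliseconds, resuming after signals. */
static void sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

	while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
		;
}

/*
 * Call do_rename() until it stops failing with -ENODEV or max_tries
 * attempts have been made, sleeping delay_ms between attempts.
 * Returns the last result; the attempt count is stored in *tries_out.
 */
static int rename_with_retry(rename_fn do_rename, void *ctx, int ifindex,
			     const char *newname, int max_tries, long delay_ms,
			     int *tries_out)
{
	int ret = -ENODEV;
	int tries = 0;

	assert(do_rename != NULL);
	while (tries < max_tries) {
		ret = do_rename(ifindex, newname, ctx);
		tries++;
		if (ret != -ENODEV)
			break;
		if (tries < max_tries)
			sleep_ms(delay_ms);
	}
	if (tries_out)
		*tries_out = tries;
	return ret;
}

/* Demo stub: fails with -ENODEV until the counter in ctx reaches zero. */
static int fake_rename(int ifindex, const char *newname, void *ctx)
{
	int *remaining = ctx;

	(void)ifindex;
	(void)newname;
	if (*remaining > 0) {
		(*remaining)--;
		return -ENODEV;
	}
	return 0;
}
```

With the stub set to fail 13 times and a limit of 20 attempts, the call
succeeds on the 14th try, which matches the pattern in the log above. A fixed
delay is still a workaround; waiting for the kernel to finish moving the
device back would be the cleaner solution.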

