[lxc-devel] lxc-1.0.0 reboot error
Vitaly Lavrov
vel21ripn at gmail.com
Wed Feb 26 17:19:27 UTC 2014
On 26.02.2014 19:41, Serge Hallyn wrote:
> Quoting Vitaly Lavrov (vel21ripn at gmail.com):
>> On 25.02.2014 22:54, Serge Hallyn wrote:
>>> Quoting Vitaly Lavrov (vel21ripn at gmail.com):
>>>> On 23.02.2014 03:36, Stéphane Graber wrote:
>>>>> Hi,
>>>>>
>>>>> Thanks for your patch.
>>>>>
>>>>> Can I just ask you to sign it off? (Signed-off-by: Name <email>)
>>>> Hi!
>>>>
>>>> I found the source of the container reboot problem, but I do not know how best to fix it.
>>>> There is a race condition between the shutdown of the old container and the creation of the network interfaces
>>>> in the new container. Inserting usleep(100000) before lxc_delete_network() solves the reboot problem,
>>>> but it is a bad approach.
>>>>
>>>> How can we wait until the old container has fully exited?
>>>
>>> How exactly are you doing the test? just script
>>>
>>> lxc-start;
>>> lxc-stop;
>>> lxc-start;
>> lxc-stop -rn container
>
> A-ha! Thanks. Yes, this is a bug in our reboot handling in
> lxcapi_start(). I can reproduce it trivially with lxc-stop -r
> on any container with lxc.network.type = phys.
"lxc.network.type = phys" has another bug:
*** glibc detected *** lxc-start: realloc(): invalid pointer: 0x0948eed0 ***
======= Backtrace: =========
/lib/libc.so.6(+0x7710b)[0xb756d10b]
/lib/libc.so.6(realloc+0x2c5)[0xb75720b5]
/usr/lib/liblxc.so.1(__lxc_start+0x5d2)[0xb76c0c12]
/usr/lib/liblxc.so.1(lxc_start+0x4c)[0xb76c15ac]
/usr/lib/liblxc.so.1(+0x42a2c)[0xb76eaa2c]
lxc-start(main+0x267)[0x8048e07]
/lib/libc.so.6(__libc_start_main+0xf5)[0xb750f5a5]
lxc-start[0x8049245]
======= Memory map: ========
src/lxc/start.c:753 save_phys_nics()
-----------------------------------------------------------------------
conf->saved_nics = realloc(conf->saved_nics,
(conf->num_savednics+1)*sizeof(struct saved_nic));
-----------------------------------------------------------------------
The patch is simple.
--- src/lxc/conf.c.orig 2014-02-26 13:21:40.263953511 +0400
+++ src/lxc/conf.c 2014-02-26 20:39:46.710074311 +0400
@@ -2606,6 +2606,7 @@ void lxc_rename_phys_nics_on_shutdown(st
}
conf->num_savednics = 0;
free(conf->saved_nics);
+ conf->saved_nics = NULL;
}
static char *default_rootfs_mount = LXCROOTFSMOUNT;
@@ -4119,8 +4120,8 @@ static void lxc_clear_saved_nics(struct
return;
for (i=0; i < conf->num_savednics; i++)
free(conf->saved_nics[i].orig_name);
- conf->saved_nics = 0;
free(conf->saved_nics);
+ conf->saved_nics = NULL;
}
void lxc_conf_free(struct lxc_conf *conf)
--
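To illustrate why resetting the pointer matters, here is a minimal standalone sketch. It is not the LXC code: struct nic_conf, save_nic() and clear_saved_nics() are hypothetical stand-ins for the fields and functions touched by the patch. It relies on the standard guarantee that realloc(NULL, n) behaves like malloc(n), so a cleared list can be grown again on the next start. Without the reset, the second save_nic() would pass an already freed pointer to realloc(), which is the kind of misuse glibc reports above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified versions of the structures in the patch. */
struct saved_nic {
	int ifindex;
	char *orig_name;
};

struct nic_conf {
	struct saved_nic *saved_nics;
	int num_savednics;
};

/* Grow the list by one entry; realloc(NULL, n) acts like malloc(n). */
static int save_nic(struct nic_conf *conf, int ifindex, const char *name)
{
	struct saved_nic *tmp;

	tmp = realloc(conf->saved_nics,
		      (conf->num_savednics + 1) * sizeof(*tmp));
	if (!tmp)
		return -1;
	conf->saved_nics = tmp;

	tmp[conf->num_savednics].ifindex = ifindex;
	tmp[conf->num_savednics].orig_name = strdup(name);
	if (!tmp[conf->num_savednics].orig_name)
		return -1;

	conf->num_savednics++;
	return 0;
}

/* Free every entry, then free the array BEFORE dropping the pointer. */
static void clear_saved_nics(struct nic_conf *conf)
{
	int i;

	for (i = 0; i < conf->num_savednics; i++)
		free(conf->saved_nics[i].orig_name);
	free(conf->saved_nics);
	conf->saved_nics = NULL;	/* makes a later realloc() safe */
	conf->num_savednics = 0;
}
```

Calling save_nic(), clear_saved_nics(), then save_nic() again on the same struct mimics a reboot cycle: the second call starts from a NULL pointer and allocates a fresh array.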
But there is a more difficult problem.
The function lxc_rename_phys_nics_on_shutdown() does not always work as it should:
------
lxc-start 1393409939.368 INFO lxc_conf - running to reset 1 nic names
lxc-start 1393409939.368 WARN lxc_conf - resetting nic 3 to eth2 failed: No such device
------
I added a wait loop with debug printing, and this is what I got:
-----
lxc-start 1393433485.531 INFO lxc_conf - running to reset 1 nic names
lxc-start 1393433485.532 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 1ms
lxc-start 1393433485.533 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 2ms
lxc-start 1393433485.534 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 3ms
lxc-start 1393433485.536 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 4ms
lxc-start 1393433485.537 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 5ms
lxc-start 1393433485.538 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 6ms
lxc-start 1393433485.539 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 7ms
lxc-start 1393433485.540 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 8ms
lxc-start 1393433485.541 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 9ms
lxc-start 1393433485.542 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 10ms
lxc-start 1393433485.543 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 11ms
lxc-start 1393433485.544 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 12ms
lxc-start 1393433485.545 WARN lxc_conf - resetting nic 3 to eth2 failed: 'No such device', delay 13ms
lxc-start 1393433485.562 INFO lxc_conf - resetting nic 3 to eth2, delay 14ms
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you wait 12-20 ms, renaming the network interface works.
The same problem occurs with vlan interfaces.
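The wait loop itself is not part of the patch above, so the following is only a rough sketch of the idea, not the code that produced the log. retry_rename() and the rename_fn callback type are hypothetical names; in LXC the callback would wrap whatever call performs the rename. The loop retries in 1 ms steps while the callback returns -ENODEV and gives up after max_ms. fake_rename() is a test stand-in that fails a fixed number of times before succeeding.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: returns 0 on success or a negative errno. */
typedef int (*rename_fn)(int ifindex, const char *name, void *ctx);

/*
 * Retry fn while it reports -ENODEV, sleeping 1 ms between attempts,
 * for at most max_ms sleeps. Stores the number of sleeps in *waited_ms.
 */
static int retry_rename(rename_fn fn, int ifindex, const char *name,
			void *ctx, int max_ms, int *waited_ms)
{
	struct timespec one_ms = { 0, 1000000L };
	int delay = 0;
	int ret;

	for (;;) {
		ret = fn(ifindex, name, ctx);
		if (ret != -ENODEV || delay >= max_ms)
			break;
		nanosleep(&one_ms, NULL);
		delay++;
	}

	if (waited_ms)
		*waited_ms = delay;
	return ret;
}

/* Stand-in for the real rename: fails *ctx times, then succeeds. */
static int fake_rename(int ifindex, const char *name, void *ctx)
{
	int *failures_left = ctx;

	(void)ifindex;
	(void)name;
	if (*failures_left > 0) {
		(*failures_left)--;
		return -ENODEV;
	}
	return 0;
}
```

A bounded retry like this only papers over the race. It does not explain why the device is briefly missing from the host namespace after the container exits.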