[Lxc-users] cannot start any more any container?! (partially solved)
    Jérôme Petazzoni 
    jerome.petazzoni at dotcloud.com
       
    Tue Dec 13 01:30:27 UTC 2011
    
    
  
Hi,
(I'm sorry if this message is not properly attached to its original 
thread; I found it on a mailing list archive while investigating a very 
similar bug, and joined the mailing list afterwards!)
We are experiencing symptoms very similar to those described by Ulli 
Horlacher:
- after a while, containers won't start anymore
- lxc-start then remains stuck in "uninterruptible sleep" state and is 
unkillable
- the only way we found so far to solve the problem is to reboot the machine
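For anyone who wants to check for the same symptom, here is a minimal sketch of how the stuck processes could be listed. It assumes a Linux /proc filesystem, where the third field of /proc/<pid>/stat is the process state and "D" means uninterruptible sleep. The helper names (parse_stat, stuck_processes, read_all_stats) are made up for illustration and are not part of lxc.

```python
import glob


def parse_stat(stat_content):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST closing parenthesis.
    head, _, tail = stat_content.rpartition(")")
    pid_str, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_str), comm, state


def stuck_processes(stat_contents, name="lxc-start"):
    """Return the PIDs of processes called `name` that are in state 'D'."""
    stuck = []
    for content in stat_contents:
        pid, comm, state = parse_stat(content)
        if comm == name and state == "D":
            stuck.append(pid)
    return stuck


def read_all_stats():
    """Yield the contents of every /proc/<pid>/stat that can be read."""
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                yield f.read()
        except OSError:
            continue  # the process exited while we were scanning


if __name__ == "__main__":
    print(stuck_processes(read_all_stats()))
```

Run as a script, this prints the PIDs of any lxc-start processes currently in uninterruptible sleep. An empty list only means that none are stuck at that moment.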
Daniel Lezcano said:
> The problem you are describing is not related to LXC but to the network
> namespace where a dangling reference in the kernel with ipv6 locks the
> network devices. When the kernel hits this bug, any process creating a
> network device or deleting one will be stuck in an uninterruptible state.
>
> If you are able to start a container with an ipv6 address
> (lxc.network.ipv6=xxx), stop it, and start it again 10 seconds later
> then that means the bug is solved in the kernel.
We do not have any reference to ipv6 in the container configuration files.
Does this mean that we should be immune to the bug?
Or is the ipv6 trick just a way to reproduce the bug with 100% accuracy?
> The key point is what Serge said, if you have this message in your console:
>
> "kernel: unregister_netdevice: waiting for ... to become free"
>
> then this is a kernel bug.
dmesg does not show that message, unfortunately.
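For reference, this is roughly how the kernel log could be searched for that message. It is only a sketch: the pattern is built from the message quoted above, with the elided part assumed to be a single device name, and find_netdev_waits is a hypothetical helper name. Reading the log through the dmesg command may require root privileges on some systems.

```python
import re
import subprocess

# Pattern derived from the quoted kernel message; the "..." in the quote is
# assumed here to be one whitespace-free token (the device name).
PATTERN = re.compile(r"unregister_netdevice: waiting for (\S+) to become free")


def find_netdev_waits(log_text):
    """Return the device names mentioned in matching kernel log lines."""
    return [m.group(1) for m in PATTERN.finditer(log_text)]


if __name__ == "__main__":
    # Capture the kernel ring buffer as text and scan it.
    log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    print(find_netdev_waits(log))
```

An empty result matches what we see here: the hang occurs without the message ever being logged.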
> If you still have this problem with 2.6.38, please let me know,
... And we are running 2.6.38.
I think that the problem never appeared with lxc 0.7.4; it looks like it 
started to occur with 0.7.5 (but this is only happening randomly, so we 
can't be 100% sure).
Any advice or idea will be more than welcome!
Best regards,
    
    