[Lxc-users] unstoppable container

Wed Sep 1 05:43:50 UTC 2010

Serge E. Hallyn wrote, On 2010. 08. 31. 15:26:
> Quoting Papp Tamás (tompos at martos.bme.hu):
>   
>> Serge E. Hallyn wrote, On 2010. 08. 31. 4:06:
>>     
>>> Quoting Daniel Lezcano (daniel.lezcano at free.fr):
>>>       
>>>> On 08/31/2010 12:23 AM, Serge E. Hallyn wrote:
>>>>         
>>>>> Quoting Daniel Lezcano (daniel.lezcano at free.fr):
>>>>>           
>>>>>> On 08/30/2010 02:36 PM, Serge E. Hallyn wrote:
>>>>>>             
>>>>>>> Quoting Papp Tamás (tompos at martos.bme.hu):
>>>>>>>               
>>>>>>>> Daniel Lezcano wrote, On 2010. 08. 30. 13:08:
>>>>>>>>                 
>>>>>>>>> Usually, there is a mechanism used in lxc to kill -9 the process 1 of
>>>>>>>>> the container (which wipes out all the processes of the containers)
>>>>>>>>> when lxc-start dies.
>>>>>>>>>                   
>>>>>>>> It should wipe them out, but in my case it was unsuccessful, even when I
>>>>>>>> killed the init process by hand.
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> So if you still have the processes running inside the container but
>>>>>>>>> lxc-start is dead, then:
>>>>>>>>>  * you are using a 2.6.32 kernel which is buggy (this mechanism is
>>>>>>>>> broken).
>>>>>>>>>                   
>>>>>>>> Ubuntu 10.04, so that's exactly the point: the kernel is 2.6.32.
>>>>>>>>
>>>>>>>>
>>>>>>>> Could you point me (or the Ubuntu guy on the list) to a URL that
>>>>>>>> describes the problem, or perhaps to the kernel patch? If possible,
>>>>>>>> maybe the Ubuntu kernel maintainers would fix the official Ubuntu kernel.
>>>>>>>>                 
>>>>>>> Daniel,
>>>>>>>
>>>>>>> which patch are you talking about?  (presumably a patch against
>>>>>>> zap_pid_ns_processes()?)  If it's keeping containers from properly
>>>>>>> shutting down, we may be able to SRU a small enough patch, but if
>>>>>>> it involves a whole Oleg rewrite then maybe not :)
>>>>>>>               
>>>>>> I am referring to these ones:
>>>>>>
>>>>>> http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=13aa9a6b0f2371d2ce0de57c2ede62ab7a787157
>>>>>> http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=dd34200adc01c5217ef09b55905b5c2312d65535
>>>>>> http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=dd34200adc01c5217ef09b55905b5c2312d65535
>>>>>>             
>>>>> (Note: the second and third are identical. Did you mean to paste 2 or 3 links?)
>>>>>           
>>>> 3 links; the third was this one.
>>>>
>>>> http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=614c517d7c00af1b26ded20646b329397d6f51a1
>>>>         
>>> Ah, thanks.
>>>
>>> I had a feeling the second one depended on defining si_fromuser in all
>>> lowercase, but for some reason git wasn't showing that one to me easily.
>>>
>>>       
>>>>>> Are they small enough for an SRU?
>>>>>>             
>>>>> The first one looks trivial enough.  I'd be afraid the second one would be
>>>>> considered to have deep and subtle regression potential.  But, we can
>>>>> always try.  I'm not on the kernel team so am not likely to have any say
>>>>> on it myself :)
>>>>>           
>>>> Shall we ask the kernel-team@ mailing list directly? Or do we
>>>> have to file an SRU first?
>>>>         
>>> Actually, first step would be for Papp to open a bug against both
>>> lxc and the kernel.  Papp, do you mind doing that?
>>>
>>> Without a bug, an SRU ain't gonna fly.
>>>       
>> Sure, I can do this. What exactly should I write in the report, and
>> which email address should I send it to?
>>
>> - kernel version (2.6.32.x)
>> - system (Ubuntu)
>>     
>
> and that it's an up-to-date lucid.
>
>   
>> - the container was unstoppable(?) even when there were no processes
>> - the way I succeeded
>> - ...and?
>>     
>
> A recipe to reproduce the bug.  It has to be reproducible.  Then
> I'll run the recipe and when I see the failure, I'll confirm the
> bug (which a separate second person needs to do).
>
>   

Today I will try to reproduce it.

tamas
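For reference, here is a minimal sketch of the first half of the mechanism Daniel describes: the container's process 1 being sent SIGKILL when lxc-start dies. It assumes (my assumption, not stated in this thread) that the signal is armed with Linux's `prctl(PR_SET_PDEATHSIG, SIGKILL)` in the child. It does not create a PID namespace (that needs root), so it does not show the second half, where killing process 1 wipes out the rest of the container; the thread reports that this part is what breaks on 2.6.32. The helper names `prctl`, `_run_starter` and `pdeathsig_demo` are hypothetical. The sketch also uses `PR_SET_CHILD_SUBREAPER`, which needs Linux 3.4 or later, so it cannot itself reproduce the 2.6.32 behaviour.

```python
import ctypes
import os
import signal

# Option values from <linux/prctl.h> (Linux only).
PR_SET_PDEATHSIG = 1
PR_SET_CHILD_SUBREAPER = 36  # needs Linux >= 3.4

_libc = ctypes.CDLL(None, use_errno=True)


def prctl(option, arg):
    """Thin wrapper around prctl(2); raises OSError on failure."""
    zero = ctypes.c_ulong(0)
    if _libc.prctl(option, ctypes.c_ulong(arg), zero, zero, zero) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def _run_starter(ready_r, ready_w):
    """Child process standing in for lxc-start (hypothetical). Never returns."""
    try:
        if os.fork() == 0:
            # Grandchild standing in for the container's process 1.
            try:
                # Ask the kernel to SIGKILL us when our parent dies.
                prctl(PR_SET_PDEATHSIG, int(signal.SIGKILL))
                os.write(ready_w, b"x")  # tell the parent we are armed
                while True:
                    signal.pause()
            finally:
                os._exit(1)  # only reached on error
        os.close(ready_w)
        os.read(ready_r, 1)  # wait until the death signal is armed
    finally:
        os._exit(0)  # "lxc-start dies"


def pdeathsig_demo():
    """Return the signal that killed the stand-in init, or None."""
    # Become a subreaper so the orphaned grandchild is re-parented to us
    # and we can collect its exit status.
    prctl(PR_SET_CHILD_SUBREAPER, 1)
    try:
        ready_r, ready_w = os.pipe()
        starter = os.fork()
        if starter == 0:
            _run_starter(ready_r, ready_w)
        os.close(ready_r)
        os.close(ready_w)
        init_status = None
        for _ in range(2):  # reap the starter and the stand-in init
            pid, status = os.waitpid(-1, 0)
            if pid != starter:
                init_status = status
        if init_status is not None and os.WIFSIGNALED(init_status):
            return os.WTERMSIG(init_status)
        return None
    finally:
        prctl(PR_SET_CHILD_SUBREAPER, 0)


if __name__ == "__main__":
    print(pdeathsig_demo())
```

On a recent kernel the stand-in init should be reported as killed by SIGKILL. The pipe makes the fake lxc-start wait until the death signal is armed, so its exit cannot race ahead of the `prctl` call.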