[Lxc-users] unstoppable container

Daniel Lezcano daniel.lezcano at free.fr
Tue Aug 31 13:25:30 UTC 2010


On 08/31/2010 12:07 PM, Papp Tamás wrote:
>
> Serge E. Hallyn wrote, On 2010. 08. 31. 4:06:
>> Quoting Daniel Lezcano (daniel.lezcano at free.fr):
>>> On 08/31/2010 12:23 AM, Serge E. Hallyn wrote:
>>>> Quoting Daniel Lezcano (daniel.lezcano at free.fr):
>>>>> On 08/30/2010 02:36 PM, Serge E. Hallyn wrote:
>>>>>> Quoting Papp Tamás (tompos at martos.bme.hu):
>>>>>>> Daniel Lezcano wrote, On 2010. 08. 30. 13:08:
>>>>>>>> Usually, there is a mechanism used in lxc to kill -9 the process 1 of
>>>>>>>> the container (which wipes out all the processes of the containers)
>>>>>>>> when lxc-start dies.
>>>>>>> It should wipe them out, but in my case it was unsuccessful, even
>>>>>>> when I killed the init process by hand.
>>>>>>>
>>>>>>>> So if you still have the processes running inside the container but
>>>>>>>> lxc-start is dead, then:
>>>>>>>>   * you are using a 2.6.32 kernel which is buggy (this mechanism is
>>>>>>>>     broken).
>>>>>>> Ubuntu 10.04, so it's exactly the point, the kernel is 2.6.32 .
>>>>>>>
>>>>>>>
>>>>>>> Could you point me (or the Ubuntu guy on the list) to a URL that
>>>>>>> describes the problem, or maybe to the kernel patch? If possible,
>>>>>>> maybe the Ubuntu kernel maintainers would fix the official Ubuntu
>>>>>>> kernel.
>>>>>> Daniel,
>>>>>>
>>>>>> which patch are you talking about?  (presumably a patch against
>>>>>> zap_pid_ns_processes()?)  If it's keeping containers from properly
>>>>>> shutting down, we may be able to SRU a small enough patch, but if
>>>>>> it involves a whole Oleg rewrite then maybe not :)
>>>>> I am referring to these ones:
>>>>>
>>>>> http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=13aa9a6b0f2371d2ce0de57c2ede62ab7a787157 
>>>>>
>>>>> http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=dd34200adc01c5217ef09b55905b5c2312d65535 
>>>>>
>>>>> http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=dd34200adc01c5217ef09b55905b5c2312d65535 
>>>>>
>>>> (note, second and third are identical - did you mean to paste 2 or
>>>> 3 links?)
>>> 3 links; the third was this one.
>>>
>>> http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=614c517d7c00af1b26ded20646b329397d6f51a1 
>>>
>>
>> Ah, thanks.
>>
>> I had a feeling the second one depended on defining si_fromuser in all
>> lowercase, but for some reason git wasn't showing that one to me easily.
>>
>>>>> Are they small enough for an SRU?
>>>> The first one looks trivial enough.  I'd be afraid the second one would
>>>> be considered to have deep and subtle regression potential.  But, we can
>>>> always try.  I'm not on the kernel team so am not likely to have any say
>>>> on it myself :)
>>> Shall we ask the kernel-team@ mailing list directly? Or do we
>>> have to do an SRU first?
>>
>> Actually, first step would be for Papp to open a bug against both
>> lxc and the kernel.  Papp, do you mind doing that?
>>
>> Without a bug, an SRU ain't gonna fly.
>
> Sure I can do this. What should I write in the report exactly and what 
> is the correct email address I write to?
>
> - kernel version (2.6.32.x)
> - system (Ubuntu)
> - container was unstoppable(?) even if there were no processes
> - the way I was successful

IMO, we should keep it simple, as we cannot yet reproduce the bug you had.

"The container's processes are not killed when the parent of the
container's init process dies.
This mechanism relies on prctl(PR_SET_PDEATHSIG, ...), which works fine
on all kernel versions except 2.6.32.
The bug was reported and fixed.

https://lists.linux-foundation.org/pipermail/containers/2009-October/021052.html

Please note, there is a simple test program that exposes the bug.

Is it possible to backport this fix to 2.6.32?"

Well something like that :)
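
For readers unfamiliar with prctl(PR_SET_PDEATHSIG, ...), here is a minimal
sketch of the parent-death signal on its own, assuming Linux and Python 3.
It is not the test program mentioned above and it does not set up a
container or a PID namespace, so it should not be read as a reproducer for
the 2.6.32 problem. The function name child_dies_with_parent and the two
pipes are hypothetical, chosen only for this illustration: a "middle"
process stands in for lxc-start, and its child stands in for the
container's init.

```python
import ctypes
import os
import select
import signal
import time

PR_SET_PDEATHSIG = 1  # option value from <linux/prctl.h>


def child_dies_with_parent(timeout=5.0):
    """Return True if a child that asked for SIGKILL on parent death
    disappears once its parent exits (Linux only)."""
    libc = ctypes.CDLL(None, use_errno=True)
    life_r, life_w = os.pipe()  # EOF on life_r means the child is gone
    middle = os.fork()
    if middle == 0:
        # Middle process: stand-in for lxc-start.
        try:
            os.close(life_r)
            ready_r, ready_w = os.pipe()
            if os.fork() == 0:
                # Child: stand-in for the container's init.
                try:
                    os.close(ready_r)
                    sig = ctypes.c_ulong(int(signal.SIGKILL))
                    if libc.prctl(PR_SET_PDEATHSIG, sig) != 0:
                        os.write(life_w, b"!")  # report that arming failed
                        os._exit(1)
                    os.write(ready_w, b"x")  # tell the parent we are armed
                    os.close(ready_w)
                    time.sleep(2 * timeout)  # bounded, so nothing lingers
                    os._exit(0)
                finally:
                    os._exit(1)
            os.close(ready_w)
            os.close(life_w)
            os.read(ready_r, 1)  # wait until the child has called prctl
            os._exit(0)  # the parent dies here
        finally:
            os._exit(1)
    os.close(life_w)
    os.waitpid(middle, 0)
    readable, _, _ = select.select([life_r], [], [], timeout)
    gone = bool(readable) and os.read(life_r, 1) == b""
    os.close(life_r)
    return gone
```

The child only signals readiness after prctl() has returned, so the parent
cannot exit before the death signal is armed. The watcher cannot waitpid()
on a grandchild, which is why the sketch detects the child's death through
end-of-file on a pipe instead.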

> - ...and?

I think you have to create a Launchpad profile and open a bug there.




