[Lxc-users] concurrent aptitude/dpkg runs in separate containers --> bork bork bork

Daniel Lezcano daniel.lezcano at free.fr
Thu Feb 3 21:46:00 UTC 2011


On 02/03/2011 07:08 AM, Trent W. Buck wrote:
> twb at cybersource.com.au (Trent W. Buck)
> writes:
>
>> I'm being a bit more patient than last time, and I think they ARE
>> proceeding, just REALLY slowly.  Meanwhile aptitude consumes 100% of a
>> core busy-waiting for a response from dpkg :-/
>>
>> They look like this:
>>
>>      $ ssh omega cat /proc/7713/stack
>>      Warning: Permanently added 'omega,192.168.155.22' (RSA) to the list of known hosts.
>>      [<ffffffff811669b7>] sync_inodes_sb+0x87/0xb0
>>      [<ffffffff8116b292>] __sync_filesystem+0x82/0x90
>>      [<ffffffff8116b379>] sync_filesystems+0xd9/0x130
>>      [<ffffffff8116b431>] sys_sync+0x21/0x40
>>      [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
>>      [<ffffffffffffffff>] 0xffffffffffffffff
>>
>>      $ ssh omega cat /proc/5619/stack
>>      Warning: Permanently added 'omega,192.168.155.22' (RSA) to the list of known hosts.
>>      [<ffffffff81222865>] jbd2_log_wait_commit+0xc5/0x150
>>      [<ffffffff811d7a2c>] ext4_sync_file+0x13c/0x2e0
>>      [<ffffffff8116b051>] vfs_fsync_range+0xa1/0xe0
>>      [<ffffffff8116b0fd>] vfs_fsync+0x1d/0x20
>>      [<ffffffff8116b13e>] do_fsync+0x3e/0x60
>>      [<ffffffff8116b190>] sys_fsync+0x10/0x20
>>      [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
>>      [<ffffffffffffffff>] 0xffffffffffffffff
> And here's one that is well and truly wedged:
>
>      root@omega:~# cat /proc/31430/stack
>      [<ffffffff811669b7>] sync_inodes_sb+0x87/0xb0
>      [<ffffffff8116b292>] __sync_filesystem+0x82/0x90
>      [<ffffffff8116b379>] sync_filesystems+0xd9/0x130
>      [<ffffffff8116b431>] sys_sync+0x21/0x40
>      [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
>      [<ffffffffffffffff>] 0xffffffffffffffff
>
> In that case, even kill -SEGV'ing upstart won't stop it.  I got that
> with only a single dpkg run (i.e. no concurrency), after switching the
> container's rootfs from ext4 to ext3, and forcing dpkg[0] to be upgraded
> before anything else.  Sigh...
>
> I'm THIS CLOSE to giving up and wrapping apt-get in libeatmydata.
>
> [0] I did this because I noticed that lucid's dpkg still suffers from
>
>        http://bugs.debian.org/578635
>        http://bugs.debian.org/605009
>        https://launchpad.net/bugs/570805
>
>      But lucid-updates & lucid-security both contain a version that
>      CLAIMS to address the first of those.

Ouch!


Assuming you are running an Ubuntu kernel on your host, I think it is
compiled with DETECT_HUNG_TASK, which prints a kernel stack trace when
a task stays stuck in the 'D' (uninterruptible sleep) state for too
long. Do you have any such stack traces in your logs?
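
For reference, here is a minimal sketch of how one might look for those
reports. It assumes the usual hung-task message format ("INFO: task
NAME:PID blocked for more than N seconds."); find_hung_tasks is a
hypothetical helper name, and the log and config paths in the comments
are assumptions that may differ on your host.

```shell
#!/bin/sh
# Sketch: find hung-task reports in a kernel log.
# find_hung_tasks is a hypothetical helper, not part of any existing tool.
find_hung_tasks() {
    # The hung-task detector logs lines such as:
    #   INFO: task dpkg:31430 blocked for more than 120 seconds.
    pattern='INFO: task .+ blocked for more than [0-9]+ seconds'
    if [ "$#" -gt 0 ]; then
        grep -E "$pattern" "$1"    # read the named log file
    else
        grep -E "$pattern"         # read standard input
    fi
}

# Possible uses on the host (paths are assumptions):
#   dmesg | find_hung_tasks
#   find_hung_tasks /var/log/kern.log
#   cat /proc/sys/kernel/hung_task_timeout_secs   # 0 disables the check
#   grep DETECT_HUNG_TASK "/boot/config-$(uname -r)"
```

Each matching line should be followed in the log by the stack trace of
the blocked task, which is the part worth posting here.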






