[Lxc-users] concurrent aptitude/dpkg runs in separate containers --> bork bork bork
Daniel Lezcano
daniel.lezcano at free.fr
Thu Feb 3 21:46:00 UTC 2011
On 02/03/2011 07:08 AM, Trent W. Buck wrote:
> twb at cybersource.com.au (Trent W. Buck)
> writes:
>
>> I'm being a bit more patient than last time, and I think they ARE
>> proceeding, just REALLY slowly. Meanwhile aptitude consumes 100% of a
>> core busy-waiting for a response from dpkg :-/
>>
>> They look like this:
>>
>> $ ssh omega cat /proc/7713/stack
>> Warning: Permanently added 'omega,192.168.155.22' (RSA) to the list of known hosts.
>> [<ffffffff811669b7>] sync_inodes_sb+0x87/0xb0
>> [<ffffffff8116b292>] __sync_filesystem+0x82/0x90
>> [<ffffffff8116b379>] sync_filesystems+0xd9/0x130
>> [<ffffffff8116b431>] sys_sync+0x21/0x40
>> [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> $ ssh omega cat /proc/5619/stack
>> Warning: Permanently added 'omega,192.168.155.22' (RSA) to the list of known hosts.
>> [<ffffffff81222865>] jbd2_log_wait_commit+0xc5/0x150
>> [<ffffffff811d7a2c>] ext4_sync_file+0x13c/0x2e0
>> [<ffffffff8116b051>] vfs_fsync_range+0xa1/0xe0
>> [<ffffffff8116b0fd>] vfs_fsync+0x1d/0x20
>> [<ffffffff8116b13e>] do_fsync+0x3e/0x60
>> [<ffffffff8116b190>] sys_fsync+0x10/0x20
>> [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
>> [<ffffffffffffffff>] 0xffffffffffffffff
> And here's one that is well and truly wedged:
>
> root@omega:~# cat /proc/31430/stack
> [<ffffffff811669b7>] sync_inodes_sb+0x87/0xb0
> [<ffffffff8116b292>] __sync_filesystem+0x82/0x90
> [<ffffffff8116b379>] sync_filesystems+0xd9/0x130
> [<ffffffff8116b431>] sys_sync+0x21/0x40
> [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> In that case, even kill -SEGV'ing upstart won't stop it. I got that
> with only a single dpkg run (i.e. no concurrency), after switching the
> container's rootfs from ext4 to ext3, and forcing dpkg[0] to be upgraded
> before anything else. Sigh...
>
> I'm THIS CLOSE to giving up and wrapping apt-get in libeatmydata.
>
> [0] I did this because I noticed that lucid's dpkg still suffers from
>
> http://bugs.debian.org/578635
> http://bugs.debian.org/605009
> https://launchpad.net/bugs/570805
>
> But lucid-updates & lucid-security both contain a version that
> CLAIMS to address the first of those.
Ouch!

Assuming you have an Ubuntu version on your host, I think the kernel is
compiled with CONFIG_DETECT_HUNG_TASK, which prints a kernel stack trace
when a task stays in the 'D' (uninterruptible sleep) state for too long
(more than 120 seconds by default). Do you have such a stack trace in
your logs?
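
In case it helps with searching the logs, here is a minimal sketch that picks hung-task reports out of kernel log text. It assumes the report line has the usual "INFO: task NAME:PID blocked for more than N seconds." wording, which may differ between kernel versions, and the function name find_hung_tasks is only illustrative.

```python
import re

# Assumed wording of the kernel's hung-task report line, for example:
#   INFO: task dpkg:31430 blocked for more than 120 seconds.
# The exact text may vary between kernel versions.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (command, pid, seconds) tuples, one per report found."""
    reports = []
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            reports.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return reports
```

You would feed it the text of dmesg or of the kernel log file (on Ubuntu usually /var/log/kern.log). Each report in the log should be followed by the stack trace of the blocked task, which is the part worth posting here.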