[Lxc-users] Containers slow to start after 1600
Benoit Lourdelet
blourdel at juniper.net
Tue Mar 12 17:31:32 UTC 2013
Hello Serge,
I am running on a 256GB RAM host, with plenty of free memory.
I issued echo t > /proc/sysrq-trigger while containers were taking 30s to start; it gave the following. Nothing caught my attention.
This block is repeated for each running container:
[46825.718046] rt_rq[31]:/lxc/lwb2002
[46825.718048] .rt_nr_running : 0
[46825.718050] .rt_throttled : 0
[46825.718052] .rt_time : 0.000000
[46825.718053] .rt_runtime : 0.000000
then :
[46825.718056]
[46825.718056] rt_rq[31]:/lxc
[46825.718059] .rt_nr_running : 0
[46825.718060] .rt_throttled : 0
[46825.718062] .rt_time : 0.000000
[46825.718064] .rt_runtime : 0.000000
[46825.718069]
[46825.718069] rt_rq[31]:/libvirt/lxc
[46825.718071] .rt_nr_running : 0
[46825.718073] .rt_throttled : 0
[46825.718075] .rt_time : 0.000000
[46825.718077] .rt_runtime : 0.000000
[46825.718080]
[46825.718080] rt_rq[31]:/libvirt/qemu
[46825.718083] .rt_nr_running : 0
[46825.718084] .rt_throttled : 0
[46825.718086] .rt_time : 0.000000
[46825.718088] .rt_runtime : 0.000000
[46825.718091]
[46825.718091] rt_rq[31]:/libvirt
[46825.718093] .rt_nr_running : 0
[46825.718095] .rt_throttled : 0
[46825.718097] .rt_time : 0.000000
[46825.718099] .rt_runtime : 0.000000
[46825.718105]
[46825.718105] rt_rq[31]:/
[46825.718107] .rt_nr_running : 0
[46825.718109] .rt_throttled : 0
[46825.718111] .rt_time : 0.000000
[46825.718113] .rt_runtime : 950.000000
[46825.718115]
[46825.718115] runnable tasks:
[46825.718115] task PID tree-key switches prio exec-runtime sum-exec sum-sleep
[46825.718115] ----------------------------------------------------------------------------------------------------------
[46825.727356]
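For reference, this is roughly how the dump above was captured (the log file name is just an example):

  # run as root while a slow lxc-start is in flight
  echo 1 > /proc/sys/kernel/sysrq    # make sure the sysrq interface is enabled
  dmesg -c > /dev/null               # clear the kernel ring buffer first
  echo t > /proc/sysrq-trigger       # ask the kernel to dump all task states
  dmesg > /tmp/sysrq-t.log           # save the dump for inspection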
regards
Benoit
root@ieng-serv06:/root/scripts# cat /proc/meminfo
MemTotal: 264124804 kB
MemFree: 234107144 kB
Buffers: 3429676 kB
Cached: 1650712 kB
SwapCached: 0 kB
Active: 10496560 kB
Inactive: 3224732 kB
Active(anon): 8695932 kB
Inactive(anon): 84348 kB
Active(file): 1800628 kB
Inactive(file): 3140384 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 136 kB
Writeback: 0 kB
AnonPages: 8640928 kB
Mapped: 17868 kB
Shmem: 139380 kB
Slab: 10287240 kB
SReclaimable: 5977640 kB
SUnreclaim: 4309600 kB
KernelStack: 312000 kB
PageTables: 1989464 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 132062400 kB
Committed_AS: 76627724 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 1117304 kB
VmallocChunk: 34222330512 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 206416 kB
DirectMap2M: 5003264 kB
DirectMap1G: 263192576 kB
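Out of that, the kernel-side allocations stand out: Slab (~10GB), KernelStack (~300MB) and PageTables (~1.9GB). Assuming roughly 1600 containers were running at that point (the divisor below is my estimate), that works out to roughly 7-8MB of kernel memory per container on top of the ~10MB of userspace RAM:

  # rough estimate of kernel memory per container; 1600 is an assumed container count
  awk '/^(Slab|KernelStack|PageTables):/ {sum += $2}
       END {printf "%.1f MB per container\n", sum / 1600 / 1024}' /proc/meminfo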
On 11 Mar 2013, at 18:41, Serge Hallyn wrote:
> Quoting Benoit Lourdelet (blourdel at juniper.net):
>> Hello,
>>
>> I am running LXC 0.8.0 on kernel 3.7.9 and am trying to start more than 1000 small containers: around 10MB of RAM per container.
>>
>> Starting the first 1600 or so happens smoothly - I have a 32 virtual core machine - but then everything gets very slow:
>>
>> up to a minute per container creation. Ultimately the server CPU goes to 100%.
>>
>> I get this error multiple times in the syslog:
>>
>>
>> [ 2402.961711] INFO: task lxc-start:128486 blocked for more than 120 seconds.
>> [ 2402.961717] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 2402.961724] lxc-start D ffffffff8180cc60 0 128486 1 0x00000000
>> [ 2402.961727] ffff883c30359cb0 0000000000000086 ffff883c2ea3c800 ffff883c2f550600
>> [ 2402.961734] ffff883c2d955c00 ffff883c30359fd8 ffff883c30359fd8 ffff883c30359fd8
>> [ 2402.961741] ffff881fd35e5c00 ffff883c2d955c00 ffff883c3533ec10 ffffffff81cac4e0
>> [ 2402.961747] Call Trace:
>> [ 2402.961753] [<ffffffff816dbfc9>] schedule+0x29/0x70
>> [ 2402.961758] [<ffffffff816dc27e>] schedule_preempt_disabled+0xe/0x10
>> [ 2402.961763] [<ffffffff816dadd7>] __mutex_lock_slowpath+0xd7/0x150
>> [ 2402.961768] [<ffffffff8158b911>] ? net_alloc_generic+0x21/0x30
>> [ 2402.961772] [<ffffffff816da9ea>] mutex_lock+0x2a/0x50
>> [ 2402.961777] [<ffffffff8158c044>] copy_net_ns+0x84/0x110
>> [ 2402.961782] [<ffffffff81081f4b>] create_new_namespaces+0xdb/0x180
>> [ 2402.961787] [<ffffffff8108210c>] copy_namespaces+0x8c/0xd0
>> [ 2402.961792] [<ffffffff81055ea0>] copy_process+0x970/0x1550
>> [ 2402.961796] [<ffffffff8119e542>] ? do_filp_open+0x42/0xa0
>> [ 2402.961801] [<ffffffff81056bc9>] do_fork+0xf9/0x340
>> [ 2402.961806] [<ffffffff81199de6>] ? final_putname+0x26/0x50
>> [ 2402.961811] [<ffffffff81199ff9>] ? putname+0x29/0x40
>> [ 2402.961816] [<ffffffff8101d498>] sys_clone+0x28/0x30
>> [ 2402.961819] [<ffffffff816e5c23>] stub_clone+0x13/0x20
>> [ 2402.961823] [<ffffffff816e5919>] ? system_call_fastpath+0x16/0x1b
>
> Interesting. It could of course be some funky cache or hash issue, but
> what does /proc/meminfo show? 10M of RAM per container may be true in
> userspace, but the network stacks etc. are also taking up kernel memory.
>
> I assume the above trace is one container waiting on another to finish
> its netns alloc. If you could get dmesg output from echo t >
> /proc/sysrq-trigger during one of these slow starts, it could show where
> the other is hung.
>
> -serge
>
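Since the trace above shows lxc-start blocked in copy_net_ns, i.e. waiting on network-namespace creation, here is a sketch of a micro-benchmark that would isolate that path from the rest of lxc-start (the bench-$i names and the count of 2000 are just placeholders):

  # hypothetical micro-benchmark: time bare network-namespace creation as the count grows
  for i in $(seq 1 2000); do
      /usr/bin/time -f "netns $i: %e s" ip netns add bench-$i 2>> /tmp/netns-timing.log
  done
  # tear the namespaces back down afterwards
  for i in $(seq 1 2000); do ip netns delete bench-$i; done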