[Lxc-users] Help troubleshooting declining performance / high %SI CPU when running 9 Ubuntu 10.04 LXCs

Ivan Fetch ifetch at du.edu
Wed Oct 12 00:17:42 UTC 2011


Hello,

We've looked more at this system as performance begins to degrade:

During an scp of a file to one of the LXC containers, iostat shows "await" numbers for individual disks hitting 80, but not for sustained periods. Using ps and top to look at CPU, the scp process is using 70% of two CPU cores, and %SI in top fluctuates between 13% and 30%. Other processes, such as top, ps, and sshd, begin to use more CPU than they normally would. For memory, 27G out of 32G is being used to cache I/O, but this seems like a good thing?
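A minimal sketch of how the %SI figure could be logged over time instead of being read off top by eye. It assumes the standard Linux /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal on the aggregate "cpu" line); the function names are made up for illustration.

```python
import time

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the aggregate CPU counters from /proc/stat text as a dict."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":  # "cpu0", "cpu1", ... are skipped
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            # Older kernels may report fewer columns; treat missing ones as 0.
            values += [0] * (len(FIELDS) - len(values))
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Percentage of elapsed CPU time spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def watch(interval=5.0, path="/proc/stat"):
    """Print a timestamped si/sy/wa line every `interval` seconds until interrupted."""
    with open(path) as f:
        previous = parse_cpu_line(f.read())
    while True:
        time.sleep(interval)
        with open(path) as f:
            current = parse_cpu_line(f.read())
        pct = cpu_percentages(previous, current)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print("%s si=%.1f%% sy=%.1f%% wa=%.1f%%"
              % (stamp, pct["softirq"], pct["system"], pct["iowait"]))
        previous = current
```

Calling watch() on the host and redirecting the output to a file would give a record of how %SI climbs in the days after a reboot.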

If I reboot this box, it performs better at first, but then degrades again over 7-15 days until %SI CPU is sustained at 40-60%, and performance is slow enough that shutting down the LXC containers takes 20 minutes per container.
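Since %SI is a single number, one possible next step is to see which kind of software interrupt is actually growing. The sketch below assumes the kernel exposes /proc/softirqs (one row per softirq type such as TIMER, NET_RX, or BLOCK, one column per CPU); the helper names are hypothetical.

```python
import time


def parse_softirqs(text):
    """Sum the per-CPU counters for each softirq type in /proc/softirqs text."""
    totals = {}
    for line in text.splitlines()[1:]:  # the first line is the CPUn header
        name, sep, counts = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        totals[name.strip()] = sum(int(c) for c in counts.split())
    return totals


def softirq_rates(before, after, seconds):
    """Return (type, events per second) pairs, busiest softirq type first."""
    rates = {name: (after[name] - before.get(name, 0)) / seconds for name in after}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


def sample_rates(seconds=10.0, path="/proc/softirqs"):
    """Take two snapshots `seconds` apart and return the per-type rates."""
    with open(path) as f:
        before = parse_softirqs(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_softirqs(f.read())
    return softirq_rates(before, after, seconds)
```

Comparing the output of sample_rates() shortly after a reboot with the output once the box has degraded might show whether the extra softirq time is network (NET_RX/NET_TX), block I/O, or timer related.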

Has anyone seen anything like this?


Thanks,

Ivan.

On Sep 15, 2011, at 9:42 AM, Iliyan Stoyanov wrote:

> Hi Ivan,
> 
> you should probably also monitor with iostat and vmstat. Off the top of my head I can think of at least 3 or 4 reasons why this might be happening. I have similar problems on a simple laptop without LXC containers on it (and don't have them on a server with a bunch of containers on it). In my experience with bad SI, everything always comes back to being RAM related. Also check your filesystem performance. Most filesystems nowadays keep a ton of journalling info in RAM. I know my response is not exactly an answer to your specific question, but I hope it gives you some pointers for better monitoring of the situation.
> 
> BR,
> 
> --ilf
> 
> On Thu, 2011-09-15 at 09:12 -0600, Ivan Fetch wrote:
>> Hello,
>> 
>> I've inherited a Sun 4540 (thumper) machine running 9 LXC containers. During the past few weeks we've been troubleshooting a decline in performance, which ends up in high %SI (software interrupt) CPU usage. I'm hoping someone here can help troubleshoot and narrow down what the real issue is - this one really has me stumped.
>> 
>> This box has 48 disks, with 5 RAID6 arrays combined into a RAID0 using md. Two NICs are bonded together, and a bridge is used for the box's IP and the LXC network interfaces.
>> 
>> Linux is Ubuntu 10.04 with LXC 0.6.3; the containers are also 10.04. Containers run Apache, some custom image processing, Gaussian, and an FTP server...
>> 
>> The box performs well after a reboot, with all containers back online. After ~5 days, we notice that the box is sluggish, and backup jobs (Netbackup) get less than 1Mb/sec over the network. CPU eventually reaches 61% SI. Other processes (I am looking at ps -ax -o pcpu ..... |sort -n) begin taking a much higher percentage of CPU than they should need, I imagine because the high %SI is taking cycles; e.g. I'll briefly see ps or sort or a shell using 6% CPU. Top shows %sy between 5-20, %wa under 5.
>> Memory (32 GB) is mostly used for cache, and there is no swapping.
>> 
>> I know next-to-nothing about tracking down the cause for high %SI CPU usage.
>> 
>> 
>> Thanks for any help looking at this with a clear head,
>> 
>> - Ivan
>> 
>> _______________________________________________
>> Lxc-users mailing list
>> 
>> Lxc-users at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/lxc-users