[Lxc-users] Help troubleshooting declining performance / high %SI CPU when running 9 Ubuntu 10.04 LXCs

Ivan Fetch ifetch at du.edu
Wed Oct 12 02:02:55 UTC 2011


Hi,

The LXC containers share /var (13 TB) with the host OS. This is something I want to change, but I need to wait until the containers are less heavily used.

I will try some load testing without SSH, although even commands like top will use 11% of a CPU core. Thanks for the suggestion!

- Ivan

On Oct 11, 2011, at 7:21 PM, Derek Simkowiak wrote:

> > Has anyone seen anything like this? 
> 
>     I have not seen the performance degradation you describe; however, SSH/SCP does have very poor network I/O performance, due to its small, statically sized internal buffers.  There's an I/O patch here:
> 
> http://www.psc.edu/networking/projects/hpn-ssh/
> 
>     They get massive I/O improvements with that patch.  So, it's possible the issue is SSH and has nothing to do with LXC.
> 
>     I suggest using iperf (network I/O tester) to see if you can reproduce the symptoms without SSH.  Also run some big dd commands inside and outside of the LXC container, for comparison.  In short, trash the disk and network without using SSH and see if you can find a reproducible test case.
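>     A minimal sketch of what I mean (the server IP and test-file path below are placeholders; adjust them for your network):
> 
>         # On the host (or another machine on the network), start an iperf server:
>         iperf -s
> 
>         # From inside one of the containers, push traffic at it for 30 seconds:
>         iperf -c 192.0.2.10 -t 30
> 
>         # Big sequential write, run inside and then outside a container;
>         # oflag=direct bypasses the page cache so the disks do the work:
>         dd if=/dev/zero of=/var/tmp/ddtest bs=1M count=40000 oflag=direct
>         rm /var/tmp/ddtest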
> 
>     Also, does the LXC container have its own partition?  Or does it share the filesystem with the host O.S.?
> 
> 
> Thanks,
> Derek Simkowiak
> http://derek.simkowiak.net
> 
> 
> P.S.> (At this moment I'm getting a 403 error from the HPN-SSH link... but it was working a few days ago.)
> 
> On 10/11/2011 05:17 PM, Ivan Fetch wrote:
>> Hello,
>> 
>> We've looked more at this system as performance begins to degrade:
>> 
>> During an scp of a file to one of the LXC containers, iostat shows "await" numbers for individual disks hitting 80 ms, though not for sustained periods. Using ps and top to look at CPU, the scp process is using 70% of two CPU cores, and %SI in top fluctuates between 13 and 30%. Other processes begin to use more CPU than they normally would, like top, ps, sshd, etc. For memory, 27 GB out of 32 GB is being used to cache I/O, but this seems like a good thing?
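>> 
>> For reference, this is roughly what I am watching it with (a sketch; the 5-second interval is arbitrary):
>> 
>>     iostat -x 5   # extended per-disk stats: await is avg ms per I/O, %util is device busyness
>>     top           # %si in the Cpu(s) line is the softirq time mentioned above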
>> 
>> If I reboot this box it performs well again, but it degrades over the next 7-15 days until %SI CPU is sustained at 40-60%, and performance is slow enough that shutting down the LXC containers takes 20 minutes per container.
>> 
>> Has anyone seen anything like this?
>> 
>> 
>> Thanks,
>> 
>> Ivan.
>> 
>> On Sep 15, 2011, at 9:42 AM, Iliyan Stoyanov wrote:
>> 
>> 
>>> Hi Ivan,
>>> 
>>> you should probably also monitor with iostat and vmstat. Off the top of my head I can think of at least 3 or 4 reasons why this might be happening. I have similar problems with a simple laptop machine without LXC containers on it (and don't have them on a server with a bunch of containers). In my experience, bad %SI always comes back to being RAM related. Also check your filesystem performance; most filesystems nowadays keep a ton of journalling info in RAM. I know my response is not exactly an answer to your specific question, but I hope it gives you some pointers for better monitoring of the situation.
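>>> 
>>> As a starting point, something like (intervals arbitrary):
>>> 
>>>     vmstat 5       # memory and swap, plus system-wide interrupts (in) and context switches (cs)
>>>     iostat -x 5    # per-device await and utilization, to rule the disks in or out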
>>> 
>>> BR,
>>> 
>>> --ilf
>>> 
>>> On Thu, 2011-09-15 at 09:12 -0600, Ivan Fetch wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I've inherited a Sun 4540 (thumper) machine running 9 LXC containers. During the past few weeks we've been troubleshooting a decline in performance, which ends up in high %SI (software interrupt) CPU usage. I'm hoping someone here can help troubleshoot and narrow down what the real issue is - this one really has me stumped.
>>>> 
>>>> This box has 48 disks: five RAID6 arrays striped into a RAID0, using md. Two NICs are bonded together, and a bridge carries both the box's IP and the LXC network interfaces.
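>>>> 
>>>> Roughly how this layout can be inspected (md0 and bond0 are placeholders for the actual device names):
>>>> 
>>>>     cat /proc/mdstat               # md array layout and any resync/rebuild in progress
>>>>     mdadm --detail /dev/md0        # details of the top-level RAID0
>>>>     cat /proc/net/bonding/bond0    # bonding mode and slave link state
>>>>     brctl show                     # the bridge and attached LXC veth interfaces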
>>>> 
>>>> Linux is Ubuntu 10.04, LXC 0.6.3; the containers are also 10.04. The containers run Apache, some custom image processing, Gaussian, and an FTP server...
>>>> 
>>>> The box performs well after a reboot, with all containers back online. After ~5 days, we notice that the box is sluggish, and backup jobs (NetBackup) get less than 1 Mb/sec over the network. CPU eventually reaches 61% SI. Other processes (I am looking at ps -ax -o pcpu ..... | sort -n) begin taking a much higher percentage of CPU than they should need, I imagine because the high %SI is stealing cycles; e.g., I'll briefly see ps or sort or a shell using 6% CPU. Top shows %sy between 5-20, %wa under 5.
>>>> Memory (32 GB) is mostly used for cache, and there is no swapping.
>>>> 
>>>> I know next-to-nothing about tracking down the cause for high %SI CPU usage.
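>>>> So far the only things I have thought to look at are these (a sketch; which counters matter presumably depends on the workload):
>>>> 
>>>>     # Snapshot the softirq counters twice and compare; the fastest-growing
>>>>     # rows (e.g. NET_RX, BLOCK, TIMER) point at the subsystem raising them:
>>>>     cat /proc/softirqs; sleep 10; cat /proc/softirqs
>>>> 
>>>>     # Per-CPU softirq time over time, in the %soft column:
>>>>     mpstat -P ALL 5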
>>>> 
>>>> 
>>>> Thanks for any help looking at this with a clear head,
>>>> 
>>>> - Ivan