Channel: VMware Communities : Popular Discussions - VI: VMware ESX® 3.0

LUN Performance Problem and HBA to fabric design query


Hi All,

 

We're having a few issues with our VMware ESX cluster (3.0.1, with all recent patches applied).

As part of the analysis I came across the following document.

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=1301

 

The bit I'm intrigued by is where it advises connecting each HBA1 to the first fabric switch and each HBA2 to the second fabric switch.

I was unaware of this document until recently.

Are there any technical reasons for this?

 

We're running HP ProLiants, each with two QLogic HBAs, and each HBA connects to a separate fabric switch. The fabric then connects, as per the document, to an EVA8000 and provides 4 access paths.

But in our environment not all of the first HBAs connect to the same fabric switch. As long as zoning etc. is set up and LUN presentation on the EVA is done, it seems to work okay.

 

The problem is that on one particular LUN used for production servers, the performance of every server on that LUN suddenly declines to the point where they become almost completely unresponsive. We're seeing a number of SCSI timeouts, aborts and conflicts reported in the vmkernel log. The trigger seems to be when we initiate some kind of write to this LUN. It's almost as if VMFS locking is taking place, but without any obvious cause: we don't use snapshots and we were not creating any virtual servers, which are the two most obvious triggers for VMFS locking.
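To put numbers on the timeouts, aborts and conflicts, a rough tally of the log lines could look like the sketch below. It is only a keyword count under assumptions: the keywords, the requirement that a line mentions "scsi", and the function name `count_scsi_events` are illustrative choices and not a description of the actual vmkernel log format.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed keywords; the real log wording may differ (e.g. "timed out").
KEYWORDS = ("timeout", "abort", "conflict")
PATTERN = re.compile("|".join(KEYWORDS), re.IGNORECASE)


def count_scsi_events(lines: Iterable[str]) -> Counter:
    """Count lines that mention SCSI together with each keyword."""
    counts: Counter = Counter()
    for line in lines:
        if "scsi" not in line.lower():
            continue  # ignore lines that are not SCSI related
        # Use a set so a keyword is counted at most once per line.
        found = {m.group(0).lower() for m in PATTERN.finditer(line)}
        for keyword in found:
            counts[keyword] += 1
    return counts
```

Feeding it the open log file (for example `count_scsi_events(open(path))`, with `path` pointing at wherever the vmkernel log lives on the host) before and during a write to the LUN would show whether the counts jump when the slowdown starts.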

The LUN is 500GB with about 180GB free.

Deploying a VM from a template to this LUN also causes the issue to occur. We've never actually completed a deployment, as the effect was too severe and it was taking considerably longer than normal. Using a newly created LUN we deployed from a template in 20 minutes, whereas on the problematic one we were only 20% complete after 10 minutes.
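For a sense of scale, the two deployment figures can be compared with a quick extrapolation. This is a sketch that assumes progress is linear, which may well not hold while the LUN is degraded; the helper name `projected_minutes` is just for illustration.

```python
def projected_minutes(elapsed_minutes: float, fraction_done: float) -> float:
    """Linearly extrapolate the total duration from partial progress."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_minutes / fraction_done


baseline = 20.0                            # full deploy to the new LUN, minutes
projected = projected_minutes(10.0, 0.20)  # 20% complete after 10 minutes
slowdown = projected / baseline

print(f"projected: {projected:.0f} min, about {slowdown:.1f}x the baseline")
```

On those figures the deployment to the problem LUN would take roughly 50 minutes, about 2.5 times as long as the deployment to the new LUN.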

As an additional note this LUN is also replicated using HP Continuous Access. It's not the only LUN we replicate and I've not seen issues on the other production LUNs used.

 

I am planning on changing the SAN structure so that all HBA1s are on the same switch, but I'm not really convinced that would change anything. From what I've read, the main issue is path thrashing, but I shouldn't see that on an Active/Active EVA8000 with Fixed as the multi-path option.

 

Any input or thoughts welcome. We do have a call logged, but I thought I'd see if there were any ideas out there.

