We had a bad HBA in one ESX Host flake out and start flooding the Fibre Channel fabric with errors. Over time this began to degrade every Host, and the VMs on those Hosts, that shared the Datastores this Host was accessing. The result was delayed disk access requests throughout the cluster.
From a Host perspective, there were various anomalies in Virtual Center pointing to an issue.
From a Windows guest perspective, we started seeing disk (Event ID 11) and symmpi (Event ID 15) errors in the System event log (attached).
We alert on various things throughout the environment; however, this particular event started at night and wasn't noticed until morning.
Questions:
What are people using to alert them to excessive SCSI reservation conflicts? Is there a commonly used tool that can monitor ESX logs for errors?
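Whatever tool ends up tailing the logs, the alert logic itself is simple: count conflict messages in a sliding time window and fire when the count crosses a threshold, so a single normal conflict doesn't page anyone. The sketch below assumes the conflicts show up as lines containing the phrase "reservation conflict" (case-insensitive) in the vmkernel log, and that the caller supplies a timestamp for each line (parsed from the log prefix or taken when the line is read). Both the pattern and the class name ConflictRateMonitor are hypothetical; adjust the pattern to what your hosts actually log.

```python
import re
from collections import deque
from datetime import datetime, timedelta

# Assumption: conflict messages contain "reservation conflict" (any case).
# Check your own vmkernel log and adjust this pattern to match.
CONFLICT_RE = re.compile(r"reservation conflict", re.IGNORECASE)


class ConflictRateMonitor:
    """Hypothetical helper: flags when more than `threshold` conflict
    lines arrive within `window`."""

    def __init__(self, threshold: int, window: timedelta) -> None:
        self.threshold = threshold
        self.window = window
        self.hits: deque = deque()  # timestamps of matching lines

    def feed(self, timestamp: datetime, line: str) -> bool:
        """Record one log line; return True if the rate threshold is exceeded."""
        if CONFLICT_RE.search(line):
            self.hits.append(timestamp)
        # Drop hits that have fallen out of the window relative to this line.
        while self.hits and timestamp - self.hits[0] > self.window:
            self.hits.popleft()
        return len(self.hits) > self.threshold


if __name__ == "__main__":
    mon = ConflictRateMonitor(threshold=3, window=timedelta(minutes=5))
    start = datetime(2009, 1, 1, 2, 0, 0)
    for i in range(5):
        ts = start + timedelta(seconds=30 * i)
        print(ts.time(), mon.feed(ts, "vmkernel: ... SCSI reservation conflict ..."))
```

Using a rate rather than a single match is the main design choice here: some reservation conflicts are routine on shared VMFS volumes, so what matters is a burst, like the one a flapping HBA would cause.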
If I can't do that, I've considered alerting on the disk 11 and symmpi 15 System event log errors in the guest.
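If the guest-side route is taken, the rule could look roughly like the sketch below. It assumes the event records have already been collected by some agent or export (that part is not shown), and it only decides whether to alert: it keeps the disk/11 and symmpi/15 pairs from the post that fall inside a lookback window and returns them once there are at least min_count. EventRecord and events_to_alert are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List

# Source/Event ID pairs mentioned in the post (source compared lowercase).
WATCHED = {("disk", 11), ("symmpi", 15)}


@dataclass
class EventRecord:
    """Hypothetical minimal view of one System event log entry."""
    source: str
    event_id: int
    time: datetime


def events_to_alert(records: Iterable[EventRecord], now: datetime,
                    lookback: timedelta, min_count: int) -> List[EventRecord]:
    """Return watched events from the lookback window, or [] if there are
    fewer than min_count of them."""
    recent = [r for r in records
              if (r.source.lower(), r.event_id) in WATCHED
              and now - r.time <= lookback]
    return recent if len(recent) >= min_count else []


if __name__ == "__main__":
    now = datetime(2009, 1, 1, 8, 0)
    sample = [EventRecord("Disk", 11, now - timedelta(minutes=1)),
              EventRecord("symmpi", 15, now - timedelta(minutes=2))]
    print(events_to_alert(sample, now, timedelta(minutes=10), min_count=2))
```

Requiring more than one hit in a short window trades a little latency for fewer false alarms from an isolated retry.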
Lastly, events don't appear to be generated reliably on the guest when there is a complete disconnect from the SAN. I'm sure there are some anomalies, but I'm looking for something consistent and relatively immediate. I've been experimenting but haven't found anything on the guest that can be used in this scenario, so I'd also like to find a way to alert on this.
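Since the guest may log nothing useful during a full disconnect, one option is to stop waiting for events and actively probe instead: periodically write and flush a small "canary" file and alert if the write fails or doesn't complete within a deadline. This is a swapped-in technique, not something from the post, and it rests on an assumption: that a lost datastore shows up in the guest as I/O that blocks rather than returning quickly. The function name and file location below are hypothetical; the file should live on a volume backed by the SAN datastore you care about.

```python
import os
import tempfile
import threading


def canary_write_ok(path: str, timeout_s: float) -> bool:
    """Hypothetical probe: write and fsync a small file in a worker thread.
    Returns False if the write raises an error or is still blocked after
    timeout_s seconds (e.g. because the underlying storage is gone)."""
    result = {"ok": False}

    def worker() -> None:
        try:
            with open(path, "wb") as f:
                f.write(os.urandom(512))
                f.flush()
                os.fsync(f.fileno())  # force the data down to the disk
            result["ok"] = True
        except OSError:
            result["ok"] = False

    # Daemon thread, so a write that never returns can't keep the process alive.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)
    # Still alive means the write is hung; otherwise report how it finished.
    return not t.is_alive() and result["ok"]


if __name__ == "__main__":
    # Hypothetical location: point this at a volume on the SAN-backed disk.
    target = os.path.join(tempfile.gettempdir(), "san_canary.bin")
    print("storage responsive:", canary_write_ok(target, timeout_s=10.0))
```

Run from a scheduler every minute or so, this gives a consistent signal. A real version should avoid starting a new probe while a previous one is still hung, and the alert has to be sent over the network, not written to the same disk.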