Channel: VMware Communities : Popular Discussions - VI: VMware ESX® 3.0

some problem with storage (i think)


 

Hello,

 

 

I have a problem (a very serious one, I think). Since I bought the hardware for ESX 3.0.2 (2x HP DL360 G5 + HP MSA1000 with 8x 146GB U320 SCSI 15k disks), everything was almost OK. But in May I needed to extend the storage to create new VMs on this infrastructure.

 

 

That is why I bought four 300GB U320 SCSI 15k disks and added them physically to the MSA1000 (I didn't power anything off). The next step, with the MSA still powered on, was to start the MSA Tools from the CD and create a new RAID 5 array from these 4 disks. That succeeded, so in the VI Client I rescanned both HBA controllers and could then see the new storage available. I created a new VMFS on it (900GB) and started putting VMs there (some new, some moved from the first array, where little space was left).
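As a quick sanity check on the 900GB figure: RAID 5 gives up one disk's worth of space to parity, so four 300GB disks leave three disks' worth usable. A minimal sketch of that arithmetic (the function name is made up for illustration, and nominal disk sizes are assumed):

```python
def raid5_usable_gb(disk_count, disk_size_gb):
    """Usable capacity of a RAID 5 set: one disk's worth of space holds parity."""
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disk_count - 1) * disk_size_gb


# Four nominal 300 GB disks, as added to the MSA1000 in this post.
print(raid5_usable_gb(4, 300))  # 900
```

The size a host actually reports is usually a little lower than the nominal figure, because disk vendors and operating systems count capacity differently.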

 

 

I thought everything was fine until the first problem, which needs some explanation. One day while I was working, I suddenly noticed that all servers were unavailable (when I tried to switch to the console view of one VM, I got a message that the configuration file for this VM was not found, or something similar).

 

 

I then started PuTTY and connected to the host where this VM resides. When I went to /vmfs/volumes/STORAGENAME, I noticed something strange: when I ran the "ls" command, I had to wait about 20-30 seconds for any output. This happened on both ESX hosts. After about an hour the problem went away by itself, and since that day (about 3 weeks ago now) it has not reappeared.
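If the slow listing comes back, it may help to record how long it takes so the times can be matched against the logs. This is only a rough sketch and assumes a Python 3 interpreter on whatever machine can see the directory; on the service console itself, running "time ls" gives the same information. The function name is made up for illustration.

```python
import os
import sys
import time


def time_listing(path):
    """Return (entry count, seconds taken) for listing one directory."""
    start = time.monotonic()
    entries = os.listdir(path)
    elapsed = time.monotonic() - start
    return len(entries), elapsed


if __name__ == "__main__":
    # Directory to probe is taken from the command line; defaults to the
    # current directory when no argument is given.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    count, seconds = time_listing(target)
    print(f"{count} entries in {seconds:.2f} s")
```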

 

 

Of course I opened an incident with VMware, and after the VMware engineers analyzed the logs, I got the following response (I hope someone will be able to help me solve this problem):

 

 

-


 

 

Following our conversation here is the analysis of the scsi reservations:

 

 

> SCSI reservation conflict < failure messages (including number of occurrences)

 

 

1 (Feb 4) - vmhba1:0:1

2 (Jan 23) - vmhba1:0:1

1 (Mar 31) - vmhba1:0:1

2 (May 12) - vmhba1:0:1

66 (May 21) - vmhba1:0:1

191 (May 22) - vmhba1:0:1

394 (May 23) - vmhba1:0:1

156 (May 25) - vmhba1:0:1

55 (May 26) - vmhba1:0:1

54 (May 28) - vmhba1:0:1

 

 

Associated Devices/Volumes:

 

 

vmhba1:0:1 /dev/sda

 

 

> Failed to reserve Volume < failure messages (including number of occurrences)

 

 

---

 

 

Associated Devices/Volumes:

> Affected Disk as per FDISK <

 

 

-


 

 

VMHBA: vmhba1:0:1 DISK: /dev/sda

 

 

Disk /dev/sda: 880.8 GB, 880890798080 bytes

/dev/sda1 1 107095 860240523+ fb Unknown

 

 

As you can see, it is always the same volume, and it is since May that we see this event happening.

 

 

We recommend checking the hardware, since you made some changes on the SAN side at the beginning of May and have been seeing this issue since then.

 

 

I hope this information is enough to resolve your query. If you need more information, please don't hesitate to open a new case with us via phone or email.

 

 

-
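For anyone who wants to reproduce per-day counts like the ones in the analysis above, here is a rough sketch. It rests on two assumptions that are not confirmed by the response: that the log uses syslog-style lines beginning with "Mon DD HH:MM:SS", and that the relevant lines contain the text "reservation conflict". Both the pattern and the search text are easy to change, and the function name is made up for illustration.

```python
import re
import sys
from collections import Counter

# Assumption: syslog-style lines that begin with "Mon DD HH:MM:SS".
TIMESTAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+\d{2}:\d{2}:\d{2}\b")


def conflicts_per_day(lines, needle="reservation conflict"):
    """Count lines containing `needle` (case-insensitive), keyed by 'Mon D'."""
    counts = Counter()
    wanted = needle.lower()
    for line in lines:
        if wanted not in line.lower():
            continue
        match = TIMESTAMP.match(line)
        if match:  # lines without a recognizable timestamp are skipped
            month, day = match.groups()
            counts[f"{month} {int(day)}"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log file to scan is given as the first command-line argument.
    with open(sys.argv[1], errors="replace") as log:
        for day, count in conflicts_per_day(log).items():
            print(f"{count} ({day})")
```

Running it against the log from each host separately would show whether both hosts see the conflicts on the same days.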


 

 

After this I also noticed one more thing. I run the VCB backup at 2:00 AM from an ordinary .bat file that calls the vcbmounter command.

 

 

One day when it started (I noticed this the next morning, when I saw that this VM was POWERED OFF), I found these entries in the event log:

 

 

EventId 15: symmpi :The device, \Device\Scsi\symmpi1, is not ready for access yet.

EventId 11: Disk :The driver detected a controller error on \Device\Harddisk0.

 

 

 

Now I see these entries on this particular VM again, but it did not shut itself down (yesterday's entries in the log are from 0:43, so before the VCB scripts ran).

 

 

Does anyone have any idea what to check, or how to make it work? I have read that some people have similar problems with the HP MSA1000, but it is on the VMware compatibility list, so there should be no problems with it.

Or should I restart the MSA during off-hours to fix it, or take some other action?

I would be very grateful for any ideas on this, because it is critical for our company (30 production VMs are running, so a one-hour outage causes serious financial loss).

 

 

 

 

 

