Summary: When I perform a fullvm VCB backup of anything but a small test vm, my VCB proxy server locks up and requires a hard reboot. While it is locked up, I can still ping the server, but I can no longer access it from the console or through remote desktop. After the reboot, I find that it copied only a portion of the vm. The amount of data copied varies; the most it has ever copied was about 78G of a total 100G snapshot. I then have to go back and manually remove the snapshot that VCB created. This same VCB proxy server works great with file level VCB backups. I am using Backup Exec 11D, and I have a daily backup job that backs up about 95G of file level information through VCB without any problems at all. I only have trouble with the fullvm backups. I have opened tickets with Dell and VMware over the past 2 months and am very frustrated. Surely someone else has experienced similar issues. Anyone?
Details:
ESX Servers: I have 2 Dell PowerEdge 2950 quad core servers with 16GB RAM running ESX 3.5
Virtual Center Server: Dell PowerEdge 650 with 4G RAM running Windows XP and Virtual Center 2.5, pointing to a SQL 2000 Virtual Center database on another server
SAN: iSCSI SAN on a Dell Powervault PVMD3000 with an NX1950 SAN controller
VCB Proxy Server: Dell PowerEdge 2650 dual core processor with 4G RAM running Windows 2003 R2 Standard edition. This server has a mirrored 35G boot drive and a RAID 5 array with a 275G volume used to mount the VCB backups. It is running VCB 1.1 and Backup Exec 11D with the latest patches. This server has 4 network ports: two are the built-in Broadcom ports, of which only one is used, and the other two are on a dual gigabit Intel Pro card, of which only one port is used, for the SAN traffic.
Switches: Dell PowerConnect 5324. SAN traffic is on a dedicated switch.
Other information: I have set up another server that is configured very similarly to my production server but does not have as much room as I need, and on it I have been able to complete a fullvm backup of one of my larger vms. This vm is a file and print server and has a 100G virtual disk and a 10G virtual disk. When I get a successful fullvm backup on this server, the size of the backup is about 100G and it takes about 1 hour and 15 minutes to fully copy from the SAN. However, on my production server, which has the storage room that I need, I can never seem to get a fullvm backup of a larger vm before it locks up. When it locks up, it does not blue screen and there are no telltale entries in any event logs that give me a clue as to what has happened.
This server used to be a SQL server; it was recently repurposed and completely reloaded as the current VCB proxy server. When it was a SQL server, we did have some unexplained lockups from time to time, which leads me to believe that I have some sort of hardware error on this server, but it passes all diagnostics and neither Dell nor VMware can seem to figure out what may be wrong. It functions very well as a Backup Exec server. I have tried replacing the memory, and that did not help. Dell sent me a new hard drive to replace one of the RAID 5 drives and asked me to rebuild the RAID, but that did not help either. I have tried various different drivers, BIOS and firmware versions, etc., with no luck at all.
The only thing that has helped at all is that I recently upgraded the ESX servers from 3.0.2 to 3.5, the Virtual Center from 2.0.2 to 2.5, and VCB from 1.0 to 1.1. Prior to that upgrade, the VCB fullvm backup was only able to copy about 20G at most before locking up; now I can get about 78G copied before it locks up.
If anyone has any ideas about what may be happening, please let me know.
Thanks,
Rod