I'm an advocate of rebuilding ESX servers rather than spending hours of troubleshooting time trying to find a needle in a haystack. Depending on the response time you're getting from VMware, HP, or wherever you get your VMware support, rebuilding starts to look attractive compared to hours on the phone or days of emailing logs back and forth with the support technicians. Of course, rebuilding is only a logical decision if there are no underlying hardware issues at the root of the host failure - an ESX host rebuild won't resolve hardware problems.
I have heard from many people who agree, supporting their opinion with claims that "I can rebuild an ESX server in X number of minutes". I've performed many deployments and had a pretty good idea of my per-box deployment times, but I had never timed myself to put my money where my mouth is. So I decided to run through a build with a stopwatch. Following are my results.
Hardware:
HP Proliant DL580G2
4x P4 XEON 2.5GHz processors (512KB L2, 1MB L3 cache)
12GB RAM
Embedded Smart Array 5i disk controller (Read/Write cache present and enabled)
2x72GB Ultra320 10kRPM drives in a hardware RAID1 mirror
4x Broadcom Gb NICs
Software:
ESX 3.5.0 build 64607 (new and improved 2/20/08 edition)
The stopwatch starts running when the server begins booting from the ESX 3.5.0 CD.
0m: Boot from CD, select keyboard, mouse, manual disk partitioning, time zone selection, etc.
5m: File copy stage begins.
14m: Above file copy stage completed. Reboot.
17m: Server is booted up. Set date/time. Enable root SSH login. Manually create VMkernel vSwitch using the VIC. Run 166 lines of post-install configuration scripts including installation of Proliant Essentials 7.9.1 agents (a sample of that kind of script follows the timeline below).
27m: Scripted configuration stage above complete. Reboot
31m: Server is booted up. Remaining configuration done with the VIC: Add host to Datacenter. Double-check proper licensing. Configure bandwidth throttling on 2 portgroups. Increase virtual switch port count on the VM switch. Configure swISCSI adapter and connect to the iSCSI target. Add host to Vizioncore vRangerPRO. Move host from Datacenter into Cluster.
34m: VIC configuration stage above complete. Reboot.
37m: Server is booted up and ready minus patching.
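To give an idea of what a service console post-install script can contain, here's a minimal sketch. The vSwitch name, uplink NIC, IP addresses, and gateway below are placeholders for illustration only and aren't pulled from my actual 166-line script:

    #!/bin/bash
    # Sample post-install configuration snippet for ESX 3.5 (placeholder values)

    # Allow root logins over SSH (disabled by default on ESX 3.x)
    sed -i 's/^PermitRootLogin no/PermitRootLogin yes/' /etc/ssh/sshd_config
    service sshd restart

    # Build a vSwitch with a VMkernel port for VMotion/iSCSI traffic
    esxcfg-vswitch -a vSwitch1                    # create the virtual switch
    esxcfg-vswitch -L vmnic1 vSwitch1             # attach a physical uplink
    esxcfg-vswitch -A VMkernel vSwitch1           # add a VMkernel port group
    esxcfg-vmknic -a -i 192.168.10.21 -n 255.255.255.0 VMkernel
    esxcfg-route 192.168.10.1                     # default VMkernel gateway

    # Enable the software iSCSI initiator (target details get finished in the VIC)
    esxcfg-swiscsi -e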
At the 37 minute mark, the server is up and ready for action, minus patching. There are currently 9 patches for ESX 3.5.0, which take additional time to apply using VMware Update Manager.
37m: VMware Update Manager remediation applied. Host enters maintenance mode and VMotions off one running VM
38m: Patching begins. 9 patches to install.
44m: 9 patches installed. Reboot.
47m: Server is booted up and thinks about the idle life for 8 minutes before exiting maintenance mode.
55m: Exit maintenance mode. Server is patched and ready for action.
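Update Manager isn't the only route; hosts without it can be patched from the service console with esxupdate instead. A rough outline, assuming the patch bundle has already been downloaded and extracted (the directory name below is a placeholder):

    esxupdate query                   # list patches already installed on the host
    cd /tmp/<extracted patch bundle>  # change into the extracted patch bundle
    esxupdate update                  # install the patch; reboot afterward if required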
Conclusion: On slower hardware with some scripted automation, an ESX server can be completely rebuilt in under an hour. The total build time could be reduced further with newer servers (faster bus speeds and processors), 15kRPM disk spindles, faster VirtualCenter server hardware, expanding the post-install configuration scripts, selectively installing only the needed patches rather than all of them, using a kickstart.cfg script and/or an automated deployment tool such as Altiris or HP RDP to complete the initial installation stage, or capturing a host installation with patches and Proliant Essentials agents already installed.
I'd be interested in hearing about your build times and automation methods.
[i]Jason Boche[/i]
[VMware Communities User Moderator|http://communities.vmware.com/docs/DOC-2444]