Hello all,
We are migrating 4 hosts from ESX 2.5.x to ESX 3.0.1 build 35804.
The hosts are:
1 Dell 6850, 4 x Xeon 3.0 GHz, 16 GB RAM
1 IBM 365, 4 x Xeon 2.2 GHz, 16 GB RAM
1 IBM 445, 8 x Xeon 2.8 GHz, 24 GB RAM
1 IBM 460, 8 x Xeon 3.0 GHz, 32 GB RAM
These hosts run about 45 guests, mostly Red Hat Enterprise Linux 3 at Update 3, 4, 5, or 6.
They are used for Oracle DB and various SAP instances. Some of them have 3.6 GB of RAM and 2 vCPUs (the maximum allowed by ESX 2.5).
All guests were migrated by cloning them as new VMs onto new LUNs, upgrading the virtual hardware, and then upgrading VMware Tools.
Suddenly, after the migration, some of them started to hit unexpected OOM (Out of Memory) errors in the guest, which caused the guest kernel to kill the process involved (typically Oracle). Some of them also became totally unresponsive, forcing us to reset them.
This is very strange, since in over 18 months of production on ESX 2.5.x we never (and I can assure you, never) experienced this type of problem, even when the guests were very heavily loaded (especially the Oracle ones).
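For anyone who wants to check their own guests for the same symptom, something like the following can be run inside a guest to list the OOM kills. This is only a sketch: it assumes kernel messages go to /var/log/messages (the usual Red Hat location) and that the kernel logs the kill with text like "Out of Memory: Killed process"; the function name oom_kills is just a made-up helper.

```shell
#!/bin/sh
# List OOM-killer events recorded in a syslog file.
# Assumptions: default Red Hat log path, and a kernel message containing
# "Out of Memory: Killed process" (capitalisation varies between kernels,
# hence the case-insensitive match).
oom_kills() {
    logfile="${1:-/var/log/messages}"
    grep -i 'out of memory: killed process' "$logfile"
}

# Usage:
#   oom_kills                        # current log
#   oom_kills /var/log/messages.1    # a rotated log
```

Each matching line names the killed process, so it is easy to see whether it is always Oracle.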
Searching the forum, I found this thread:
http://www.vmware.com/community/thread.jspa?threadID=65293
which helped us to mitigate the situation.
For the moment it seems that after issuing this command:
echo flattax > /proc/vmware/sched/mem
and after adjusting the memory reservations, the situation is under control.
It seems that by default ESX is fairly aggressive in swapping out the VM process, so when the guest tries to allocate the memory it needs, the allocation fails even though the guest has enough RAM plus swap to handle it.
Is anyone else experiencing similar trouble?
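To compare notes, the guest's own view of its RAM and swap can be captured around the time of a failure with something like this. Again a sketch only: it assumes /proc/meminfo in the guest has the usual MemTotal, MemFree, SwapTotal and SwapFree lines, and guest_mem is a hypothetical helper name.

```shell
#!/bin/sh
# Print the guest's total/free RAM and swap as reported by the guest kernel.
# Assumption: /proc/meminfo contains "MemTotal:", "MemFree:", "SwapTotal:"
# and "SwapFree:" lines; the optional argument is only there to point the
# function at a saved copy of the file.
guest_mem() {
    meminfo="${1:-/proc/meminfo}"
    grep -E '^(MemTotal|MemFree|SwapTotal|SwapFree):' "$meminfo"
}

# Usage:
#   guest_mem
```

If the guest still shows plenty of free RAM and swap when the OOM kill happens, that points at the host side rather than at the guest being genuinely short of memory.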
Let me know if you need more info.
TIA.
Regards,
Luca