Recently we had a discussion about swapping. As Duncan mentioned in his article “Swapping”, swapped memory might not have an impact on the performance of the virtual machine.
There are scenarios in which pages can be swapped out without causing performance problems. One common scenario is a bootstorm, i.e. the startup of many virtual machines at once. Bootstorms can happen when a host failure occurs and High Availability powers on the virtual machines on other hosts, but they are also frequently encountered in Windows shops after Patch Tuesday, when the operations team needs to stay within a limited maintenance window.
When a virtual machine's guest OS starts, there is a period of time before VMware Tools is loaded and the vmmemctl (balloon) driver is operational. During this period the operating system can access a large portion of its configured memory. Windows systems are notorious for this, as they tend to touch every page up to the end of their configured memory. Unfortunately, page sharing through Transparent Page Sharing (TPS) is also at a minimum at this point, because redundant memory pages are not collapsed immediately when a virtual machine is started. TPS is a VMkernel background process and uses a cycle of 60 minutes (Mem.ShareScanTime) to scan a virtual machine for page sharing opportunities.
During these bootstorms many virtual machines are powered on at the same time, all claiming a lot of memory or, in the case of Windows, even their maximum configured memory. This behavior leads to a spike in memory usage, and without the help of the balloon driver and TPS the ESX host needs to resort to swapping out memory.
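To get a feel for the size of such a spike, here is a back-of-the-envelope sketch in Python. It assumes the worst case described above, where every virtual machine touches all of its configured memory and neither ballooning nor TPS has reclaimed anything yet. The function name and the host and VM sizes are made up for illustration, and the calculation ignores VMkernel and per-VM overhead memory.

```python
def bootstorm_shortfall_mb(configured_mb, host_memory_mb):
    """Worst-case amount of memory (MB) the host cannot back with machine memory.

    Assumes every VM touches all of its configured memory and that
    ballooning and TPS reclaim nothing (the bootstorm worst case).
    """
    demand = sum(configured_mb)
    # Anything above the host's machine memory has to go somewhere else,
    # which during a bootstorm means the swap files.
    return max(0, demand - host_memory_mb)


# Hypothetical host: 64 GB of machine memory, 20 VMs with 4 GB configured each
vms = [4096] * 20
print(bootstorm_shortfall_mb(vms, 65536))  # 16384
```

In this made-up example roughly 16 GB would have to be swapped out until the balloon driver and TPS become effective.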
During startup, Windows will touch every page, and this forces ESX to back all of them with machine memory (physical memory). These pages are filled with useless information, and chances are that they will never be accessed by the virtual machine again. ESX will not proactively swap memory back in to physical memory when the memory pressure disappears. These pages remain swapped out until they are accessed by the virtual machine; at that point ESX swaps them back into memory.
Swapping during the bootstorm will delay the boot process, but these swapped out pages will not cause any performance problems during normal operation.
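The reason these pages are harmless can be illustrated with a toy model in Python. This is only a sketch of the behavior described above, not of how the VMkernel is implemented; the class and method names are invented for the example.

```python
class ToyVM:
    """Toy model of lazy swap-in: pages stay swapped out until the guest touches them."""

    def __init__(self, num_pages):
        self.resident = set(range(num_pages))  # pages backed by machine memory
        self.swapped = set()                   # pages sitting in the swap file
        self.swap_ins = 0                      # accesses that had to wait for swap-in

    def swap_out(self, pages):
        """Host swaps out the given pages under memory pressure."""
        pages = set(pages) & self.resident
        self.resident -= pages
        self.swapped |= pages

    def access(self, page):
        """Guest touches a page; only a swapped-out page costs a swap-in."""
        if page in self.swapped:
            self.swapped.remove(page)
            self.resident.add(page)
            self.swap_ins += 1


vm = ToyVM(8)
vm.swap_out(range(4, 8))      # bootstorm: pages 4..7 are swapped out
for page in [0, 1, 2, 0, 1]:  # normal operation only touches pages 0..2
    vm.access(page)
print(vm.swap_ins, sorted(vm.swapped))  # 0 [4, 5, 6, 7]
```

As long as the guest never touches the swapped-out pages, the swap-in counter stays at zero even though pages are still sitting in the swap file.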
As mentioned in Duncan’s “Swapping” article, there are a few metrics that indicate that a virtual machine is swapping or has swapped before. When encountering swapped memory, check the metrics SWCUR (swap current) and SWTGT (swap target). If a bootstorm occurred, the virtual machine is likely to show a higher value for SWCUR than for SWTGT.
SWTGT indicates the desired amount of memory to be swapped out; ESX determines this from the resource entitlement calculation of the virtual machine. If there is no memory pressure, SWTGT will be equal to 0, but because pages remain in the swap file until they are accessed, SWCUR will indicate the remaining swapped-out pages.
If memory contention does occur, ESX will attempt to make SWCUR equal to SWTGT.
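The reasoning about these two counters can be summarized in a small Python helper. This is a sketch of the interpretation given above, not a VMware API: the function name and the returned labels are made up, and the two values are assumed to have been read manually from the SWCUR and SWTGT columns (in MB).

```python
def interpret_swap(swcur_mb, swtgt_mb):
    """Rough interpretation of a VM's SWCUR and SWTGT values (both in MB)."""
    if swcur_mb == 0 and swtgt_mb == 0:
        return "no swapping"
    if swtgt_mb == 0:
        # No memory pressure right now; the pages in the swap file are
        # leftovers (for example from a bootstorm) that the guest has
        # not touched since.
        return "swapped in the past, no current pressure"
    if swcur_mb < swtgt_mb:
        # ESX wants more memory swapped out than is currently swapped.
        return "under pressure, ESX will swap out more"
    if swcur_mb > swtgt_mb:
        # Target is lower than what is swapped; the surplus stays in the
        # swap file until the guest accesses those pages.
        return "pressure easing, leftover swapped pages"
    return "swapped memory at target"


print(interpret_swap(512, 0))
```

A virtual machine that reports a non-zero SWCUR together with an SWTGT of 0 falls into the second case, which is the typical picture after a bootstorm.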