Recently we had a discussion about swapping. As Duncan mentioned in his article “Swapping”, swapped memory might not have an impact on the performance of the virtual machine.
There are scenarios in which pages can be swapped out without causing performance problems. One common scenario is a bootstorm, i.e. the startup of many virtual machines at once. Bootstorms can happen when a host failure occurs and High Availability powers on the virtual machines on other hosts, but they are also frequently encountered in Windows shops after Patch Tuesday, when the operations team needs to stay within a limited maintenance window.
When a virtual machine guest OS starts, there is a period of time before VMware Tools is loaded and the vmmemctl (balloon) driver is operational. During this period the operating system can access a large portion of its configured memory. Windows systems are notorious for this, as they tend to touch every page up to the end of their configured memory. Unfortunately, page sharing through Transparent Page Sharing (TPS) is also at a minimum at this point: redundant memory pages are not collapsed immediately when a virtual machine is started. TPS is a VMkernel background process and uses a cycle of 60 minutes (Mem.ShareScanTime) to scan a virtual machine for page sharing opportunities.
During these bootstorms many virtual machines are powered on at the same time, all claiming lots of memory or even their maximum configured memory (Windows). This behavior leads to a spike in memory usage, and without the help of the balloon driver and TPS, the ESX host needs to resort to swapping out memory.
During startup, Windows touches every page, which forces ESX to back all of them with machine memory (physical memory). These pages are filled with useless information, and chances are they will never be accessed by the virtual machine again. ESX does not proactively swap memory back into physical memory when the memory pressure disappears. Pages remain swapped out until the virtual machine accesses them; only at that point does ESX swap them back into memory.
Swapping during the bootstorm will delay the boot process, but these swapped out pages will not cause any performance problems during normal operation.
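The swap-in-on-access behavior described above can be illustrated with a toy model. This is only a sketch for intuition, not how the VMkernel is implemented: the class name ToySwapModel and all of its methods are hypothetical, and pages are represented as plain integers.

```python
class ToySwapModel:
    """Toy model (hypothetical, not VMkernel code) of swap-in on access."""

    def __init__(self, num_pages):
        # Every page starts out backed by machine memory.
        self.resident = set(range(num_pages))
        self.swapped = set()
        self.swap_ins = 0

    def swap_out(self, pages):
        # Under memory pressure the host moves pages to the swap file.
        for page in pages:
            if page in self.resident:
                self.resident.remove(page)
                self.swapped.add(page)

    def pressure_relieved(self):
        # Deliberately a no-op: nothing is swapped back in proactively.
        pass

    def touch(self, page):
        # Only a guest access brings a swapped page back into memory.
        if page in self.swapped:
            self.swapped.remove(page)
            self.resident.add(page)
            self.swap_ins += 1


vm = ToySwapModel(num_pages=8)
vm.swap_out(range(4, 8))   # bootstorm: four pages go to the swap file
vm.pressure_relieved()     # pressure disappears, swapped pages stay put
vm.touch(5)                # the guest reads one swapped page
print(len(vm.swapped), vm.swap_ins)
```

In this model, pages that the guest never touches again stay in the swap file indefinitely and never cost a swap-in, which is why they do not hurt performance during normal operation.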
As mentioned in Duncan’s “Swapping” article, there are a few metrics that indicate whether a virtual machine is swapping or has swapped before. When encountering swapped memory, check the metrics SWCUR (swap current) and SWTGT (swap target). If a bootstorm occurred, SWCUR is likely to show a higher value than SWTGT.
SWTGT indicates the desired amount of memory to be swapped out; ESX determines it through the resource entitlement calculation of the virtual machine. If there is no memory pressure, the swap target will be 0, but because pages remain in the swap file until accessed, SWCUR will show the swapped-out pages that remain.
If memory contention does occur, ESX will attempt to make SWCUR equal to SWTGT.
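The way to read these two metrics together can be sketched as a small helper. This is a minimal sketch, assuming you have already collected the SWCUR and SWTGT values (in MB) for a virtual machine by some means; the function name classify_swap_state and the three labels it returns are hypothetical and are not part of esxtop or any VMware API.

```python
def classify_swap_state(swcur_mb, swtgt_mb):
    """Interpret a VM's SWCUR/SWTGT pair (hypothetical helper, values in MB)."""
    if swcur_mb == 0 and swtgt_mb == 0:
        # Nothing in the swap file and no desire to swap.
        return "none"
    if swcur_mb > swtgt_mb:
        # More is swapped than ESX currently wants swapped: leftover pages
        # from earlier pressure (for example a bootstorm) that the guest
        # has not touched since.
        return "residual"
    # The target is non-zero and not yet exceeded: ESX wants this much
    # memory swapped out, so there is memory contention right now.
    return "contention"


# Example values are made up for illustration only.
print(classify_swap_state(0, 0))      # none
print(classify_swap_state(512, 0))    # residual
print(classify_swap_state(128, 256))  # contention
```

A "residual" result matches the bootstorm case described above: SWCUR is higher than SWTGT, and the swapped pages are not a sign of a current problem.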
Re: Swapping
Good discussion of the bootstorm problem.
Bootstorms + oversized vRAM = bogus memory pressure => lots of swapped “data” that is never needed and doesn’t hurt performance.
Most large orgs have VMs with more vRAM than they need. The organizational process of getting vRAM resized is tedious (customer, file this form and the IT department will review and respond within x days, you need management approval, etc.). So customers err on the side of too much. Or, maybe only a few “models” of VM are even offered (1 GB, 2 GB, 4 GB, etc.). The result: many VMs with excess vRAM. In this case, the amount of stuff sitting in swap files (and not being swapped back in) just illustrates how ESX is efficiently using scarce pRAM even though the VMs aren’t properly sized.
Keep your eyes open for a possible future feature: transparent page sharing that immediately shares zero-pages as they’re written. I.e. one pRAM page holds all-zeros. As a vRAM page is accessed and all-zeros are written to it, the hypervisor immediately backs this vRAM page with the existing all-zeros pRAM page. That should drastically reduce the bogus memory pressure due to oversized-vRAM VM bootstorms.
Hi Craig,
Yeah the new feature is awesome, can’t wait to see that one released. Besides reducing the bogus memory pressure, it will also benefit TPS as it can scan more “real” pages during its cycle.
Frank, do you know if it is possible to force the swapped pages to be freed in order to “clear” the stats?
NiTRo, good question.
It is something I’m currently looking into.
Not only to clear the stats, but also for machines that use a local swap file.
Many NFS shops store the swap file on the local storage of the ESX host.
If a virtual machine is vMotioned, the content of the local swap file is copied to the destination ESX host.
Copying useless data between the source and destination host is something I want to avoid; it generates unnecessary overhead in the vMotion process.
Frank, this is exactly what I’m doing because I don’t use reservations.