Memory reclamation, when and how?

After Duncan and I talked through the performance problem presented by @heiner_hardt, we discussed the exact moment the VMkernel decides which reclamation technique to use, and the specific behaviors of those techniques. This article supplements Duncan’s article on Yellow-bricks.com.

Let’s begin with when the VMkernel decides to reclaim memory, and then look at how it reclaims it. Host physical memory is reclaimed based on four “free memory states”, each with a corresponding threshold. Based on these thresholds, the VMkernel chooses which reclamation technique it will use to reclaim memory from virtual machines.

| Free memory state | Threshold | Reclamation technique   |
|-------------------|-----------|-------------------------|
| High              | 6%        | None                    |
| Soft              | 4%        | Ballooning              |
| Hard              | 2%        | Ballooning and Swapping |
| Low               | 1%        | Swapping                |
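The table can be sketched as a simple lookup. Everything below is an illustration of the table, not actual VMkernel code: the names are invented, and it assumes a state is entered when free memory drops to that state's threshold.

```python
# Illustrative sketch of the free-memory-state table above.
# Thresholds are percentages of free host physical memory; the
# function names and structure are assumptions, not VMkernel code.

RECLAMATION = {
    "high": [],                          # no reclamation needed
    "soft": ["ballooning"],
    "hard": ["ballooning", "swapping"],
    "low":  ["swapping"],
}

def memory_state(free_pct):
    """Map the host's free memory percentage to a memory state."""
    if free_pct <= 1.0:
        return "low"
    if free_pct <= 2.0:
        return "hard"
    if free_pct <= 4.0:
        return "soft"
    # Between 4% and 6% the host is formally still "high", although
    # ballooning may already start preemptively, as the article explains.
    return "high"

def techniques(free_pct):
    """Reclamation techniques active in the current state."""
    return RECLAMATION[memory_state(free_pct)]
```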

The high memory state has a threshold of 6%, which means that 6% of the ESX host physical memory minus the service console memory must be free. When the virtual machines use less than 94% of the host physical memory, the VMkernel does not reclaim memory because there is no need to. But when memory usage starts to fall towards the free memory threshold, the VMkernel will try to balloon memory. It selects the virtual machines with the largest amounts of idle memory (detected by the idle memory tax process) and asks those virtual machines to select idle memory pages. To do this, the guest OS needs to swap those pages out, so if the guest is not configured with sufficient swap space, ballooning can become problematic. Linux behaves particularly badly in this situation: when its swap space is full it invokes the OOM (out-of-memory) killer, which starts to kill processes more or less at random.
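As a rough illustration of the threshold arithmetic, with hypothetical host and service console sizes that do not come from the article:

```python
# Hypothetical numbers: a 32 GB host with 400 MB reserved for the
# service console. 6% of the remainder must stay free for the host
# to remain in the high memory state.
host_mem_mb = 32 * 1024          # 32768 MB host memory (hypothetical)
console_mem_mb = 400             # service console memory (hypothetical)
high_threshold_mb = 0.06 * (host_mem_mb - console_mem_mb)
print(round(high_threshold_mb))  # MB that must remain free
```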

Back to the VMkernel: in the High and Soft states, ballooning is favored over swapping. If the ESX server cannot reclaim enough memory by ballooning before it reaches the Hard state, it turns to swapping. Swapping has proven to reclaim a guaranteed amount of memory within a limited amount of time. Unlike the balloon driver, which tries to respect the needs of the virtual machine and lets the guest decide whether and what to swap, the swap mechanism simply picks pages at random from the virtual machine. This hurts the performance of the virtual machine, but it helps the VMkernel survive.

Now the fun thing is that even before the VMkernel detects free memory reaching the soft threshold, it already starts to request pages through the balloon driver (vmmemctl). This is because it takes time for the guest OS to respond to the vmmemctl driver with suitable pages. By starting early, the VMkernel tries to avoid reaching the Soft state or worse. So you can sometimes see ballooning occurring before the Soft state is reached (between 6% and 4% free memory).

One exception is the virtual machine memory limit: if a limit is set on a virtual machine, the VMkernel always tries to balloon or swap pages of that virtual machine once it reaches its limit, even if the ESX host has enough free memory available.
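This exception can be captured as a small predicate. Everything here is an assumption for illustration only: the names are invented, and the 4% soft threshold stands in for the general "host leaves the comfortable zone" condition described above.

```python
def should_reclaim_from_vm(vm_consumed_mb, vm_limit_mb, host_free_pct):
    """Hypothetical sketch: reclaim from a VM that has reached its
    configured memory limit regardless of host state; otherwise only
    once host free memory drops to the soft threshold (4%, per the
    state table above)."""
    if vm_limit_mb is not None and vm_consumed_mb >= vm_limit_mb:
        # Limit reached: balloon/swap even with plenty of free host memory.
        return True
    return host_free_pct <= 4.0
```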

Comments

  1. Anon says

    Thanks, good info. It would also be helpful to add at what host memory percentage TPS starts, as I understand that is different with the 5500 CPUs.

  2. says

    Hi, can you explain your question a bit more? TPS is a process that continuously cycles to investigate which memory pages it can collapse. It is not related to any memory statistic whatsoever.

  3. says

    Another great article Frank, good information to know.

    One quick question regarding the following statement.

    “One exception is the virtual machine memory limit, if a limit is set on the virtual machine, the VMkernel always tries to balloon or swap pages of the virtual machine after reaching its limit”

    Would this apply at the resource group level as well? I.e. if a resource group reaches its limit, does it start triggering ballooning within the virtual machines that are part of that resource group?

  4. YP Chien says

    Great info on memory reclamation.
    In your article, you mentioned:
    “When the virtual machines use less than 94% of the host physical memory, the VMkernel will not reclaim memory because there is no need to, but when the memory usage starts to fall towards the free memory threshold the VMkernel will try to balloon memory.”

    Based on some of our own tests on memory reclamation of ESX we conducted recently, we did find some interesting facts as opposed to the above that we would like to share.

    In our tests, we had around 60% memory overcommit and the ESX was still in the high memory state with abundant free memory left, thanks to TPS. All test VMs were standard-configured Win2003 servers with vmware-tools installed. We also developed our own memory workload simulator so that we could increase memory load at a pre-defined rate and drive free memory down toward the memory state threshold of 6%.
    As free memory started to drop, but well before the 6% threshold, we found that the ESX started memory ballooning while it was still in the high memory state. As free memory continued to drop, the ESX started swapping even though it was still in the high memory state! If anyone is interested in our findings (with esxtop screenshots, timing diagrams, etc.), we would be more than happy to share our memory test lab document.

  5. Jtee says

    Very insightful Frank, thanks! Do you know how the kernel controls the release of ballooned memory after ballooning has occurred? I’m seeing hosts with around 10% consumed/total GB, and 1-3 GB of ballooning across VMs (a few days after a memory crunch on the host).

    It seems that the balloon drivers stay inflated and settle very slowly after hitting the high/soft limits on the host.

  6. says

    Hi,

    That’s very interesting. Did you record or discover the rate of decrease?
    And at what point did the ESX host start to balloon? I would love to see that document!

  7. says

    Hi!
    Thanks for the compliment.
    Very good question. I’m currently preparing an article about ballooning, so I might include this in it.
    I will keep you posted.
    Is there still memory ballooned even when the ESX host is in the high free memory state? Is the amount of free memory near the 6% limit?

  8. YP Chien says

    1. We have developed a memory workload simulator with which we can control the amount of memory load, its rate of increase, the amount of memory activity (to simulate idle memory) and the length of the memory workload. In our test, the ESX started to balloon (on the VM with the least amount of memory activity, as expected) when free memory dropped to around 6.4% in our case. The exact number may vary depending on how fast the rate of decrease is. We can share with you the test setup and the results from esxtop and perfmon log counters that show the timing of balloon and swapping activity. Just send me an email to:
    yp_chien AT kingston.com

    2. As part of our tests, we did find that the memory balloon will not deflate even long after the ESX is back in the high state with plenty of free memory. We found that the memory balloon will only deflate if a. all swapped memory has been backed out to physical memory (which may take a long time) and b. there are memory activities or requests from the VM.

  9. says

    YP Chien,
    I’m very interested in how ESX recovers from ballooning or swapping events and what the options are. Did you ever try VMotion-ing the ballooned or swapped VMs from an overcommitted resource pool to a resource pool with ample resources? If so, did the balloon driver stay inflated and did swapped memory stay swapped? My guess is “yes”.

    I’m trying to fully understand the cons of recovering from ballooning and swapping situations so that we can create the right policy on overcommitment of resources in a test environment. It would be ideal if the balloon driver could pull memory back in as memory pressure is reduced.

  10. YP Chien says

    Hi Sean:

    You mentioned:
    “Did you ever try VMotion-ing the ballooned or swapped VMs from overcomitted resource pool to resource pool with ample resources? If so did balloon driver stay inflated and swapped memory stay swapped? My guess is “yes”.”

    Our tests were deliberately not done on an ESX cluster with DRS, so that we could force memory balloon and swapping activities. With DRS, the VMs would be vmotioned to another ESX, and I did find out that:
    - Ballooned memory will be gone
    - Swapped memory stays swapped and will be backed out to physical memory eventually
    So to answer your question: do take advantage of DRS to balance the workload. The memory balloon will only be deflated if:
    1. all swapped memory has been backed out into physical memory
    2. there are memory requests generated by the affected VM

  11. ML says

    Hello,
    I read KB 1021896. It says:
    “In addition, if memory becomes over-committed on the system, the vmkernel and vmm can break large pages as necessary.”
    It could be interesting to consider this in this blog.

    Does anyone know whether ESX uses ballooning and page breaking simultaneously, or whether it tries to break the pages first and then uses ballooning?

    Regards

  12. PPM says

    Hi,

    We are currently considering virtualising some pretty significant SQL workloads. While the VMware best practices documents for SQL Server inside VMware recommend turning on ballooning, a colleague who attended a deep dive with a Microsoft SQL MVP came back saying the SQL expert strongly suggested that ballooning should always be turned off for SQL workloads. We have 165 SQL instances, some of which will need 5-10000 IOPS, so performance and memory management are critical. Do you have a view on this from experience?

    Thx,
    Paul

  13. jitesh says

    hi frank

    Since ESXi does not have a service console, how does the 6% high memory state of the host work?
    As you wrote: “6% of the ESX host physical memory minus the service console memory must be free.”
    Does ESXi use a different calculation?
    Thanks

    Jitesh

  14. Mahadzar says

    Just want to share: we had an issue during performance testing on our platform where vmmemctl went crazy and caused high CPU usage.
    It seems likely that the main cause was CPU wait caused by the overhead of multiple cores that weren’t actually in use, with vmmemctl being a symptom of this. One recommendation would be to reduce the VM to a single CPU, or at most 2 if the processes running on the VM are capable of hyperthreading.
    But I’m not sure whether that recommendation is valid or not.

  15. Kris says

    Hi,
    I had a similar issue with one of the VMs, but I’m surprised that DRS did not take any action and the host instead ballooned the memory.

    any comments?
    Br,
    Kris

  16. says

    Frank, thanks for this article. Found it while trying to look up the % values after reading the new Transparent Page Sharing whitepaper (WP-2013-01E) by Banerjee, Moltmann, Tati and Venkatasubramanian.
    Do you know if these values of 6%, 4%, 2% and 1% are still the default values for ESXi 5.5?
    Thanks