
Ballooning, Queue Depths and other back pressure features revisited

4 min read

Recently I’ve been involved in a couple of conversations about ballooning, QoS and queue depths. Remarks like “ballooning is bad”, “increase the queue depths” and “use QoS” are just the sound bites that spark the conversation. What I learned from these conversations is that we might have lost track of the original intention of these features.
Hypervisor resource management
Features such as ballooning and queue depths were invented to bridge the gap between resource demand and resource availability. When a system experiences a state where resource demand exceeds the resources it controls, the system has a problem. This is especially true in a system such as a hypervisor, where you cannot control the demand directly. A guest operating system or an application in a virtual machine can demand a lot of resources. The resource management schedulers inside the hypervisor are tasked with fulfilling the demand of that particular machine while at the same time satisfying the resource demand of the other virtual machines. Typically the guest OS resource schedulers are not integrated with the hypervisor resource schedulers, and this can lead to a situation in which the administrator resorts to draconian measures: disabling ballooning, or increasing the queue depth to the digital equivalent of the Mariana Trench.
Sometimes it is taken for granted, but resource management inside the hypervisor is actually a tough challenge to solve. Much research is done on this problem; research papers trying to find an answer to this challenge are published on a monthly basis. Let’s step back, look at it from an engineer’s perspective (or should I say a developer’s?) and see what the problem is and how to solve it in the most elegant way. It’s the architect’s job to understand that this functionality is not a replacement for a proper design. Let’s start by understanding the problem.
Load-shedding or back pressure
When dealing with a situation where resource demand exceeds resource availability you can do two things. Well, if you don’t do anything, you are likely to encounter a system failure that can affect a lot more than just that particular resource or virtual machine. Overall you don’t design for system failure, you want to avoid it, and to do so you can either drop the load or apply some form of back pressure. I think we all agree that dropping load, sometimes referred to as load shedding, is not the most elegant way of dealing with temporary overload; that’s why a lot of effort goes into back pressure features.
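To make the difference concrete, here is a minimal sketch in Python (not tied to any hypervisor or storage API; all names are illustrative): load shedding drops work when a bounded queue is full, while back pressure makes the producer wait for room.

```python
import queue

# A bounded queue models a resource with finite capacity (illustrative only).
requests = queue.Queue(maxsize=8)

def submit_with_load_shedding(item):
    """Drop the request when the queue is full (load shedding)."""
    try:
        requests.put_nowait(item)
        return True
    except queue.Full:
        return False  # the request is simply lost

def submit_with_back_pressure(item, timeout=1.0):
    """Make the producer wait for room (back pressure)."""
    try:
        requests.put(item, timeout=timeout)
        return True
    except queue.Full:
        # Even back pressure has its limits; the caller has to slow down.
        return False
```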
A back pressure feature that everyone is familiar with is the memory balloon driver. Guest OS memory schedulers deal with used and free memory in a way that is transparent to the hypervisor. When the hypervisor is running out of physical machine memory it needs to figure out a way to retrieve memory. By using the balloon driver, the hypervisor asks the guest OS memory scheduler for a list of pages that it doesn’t use or doesn’t deem important. After getting that information, the hypervisor frees up the corresponding physical memory pages so it can satisfy incoming memory requests. Instead of dropping the new incoming workload it applies back pressure in the most elegant way. I don’t know why people still talk about ballooning as bad. The feature is awesome; it’s the architect’s or sysadmin’s job to come up with a plan that avoids back pressure in the system. Again, the back pressure feature is not a substitute for proper design and management.
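Purely as a sketch of that interplay (the class and method names below are hypothetical and do not reflect the actual vSphere or VMware Tools interfaces): the guest picks the pages it can spare, so the hypervisor never has to guess which pages matter.

```python
# Hypothetical sketch of the balloon interaction; GuestOS, Hypervisor and
# balloon_request are illustrative names only.

class GuestOS:
    def __init__(self, free_pages):
        self.free_pages = set(free_pages)  # pages the guest does not need right now

    def balloon_request(self, n_pages):
        """Pin up to n_pages the guest deems least important and report them."""
        pinned = set(list(self.free_pages)[:n_pages])
        self.free_pages -= pinned          # the guest promises not to touch these
        return pinned

class Hypervisor:
    def __init__(self, guests):
        self.guests = guests

    def reclaim(self, pages_needed):
        """Ask each guest, via its balloon driver, which pages it can spare."""
        reclaimed = set()
        for guest in self.guests:
            if len(reclaimed) >= pages_needed:
                break
            reclaimed |= guest.balloon_request(pages_needed - len(reclaimed))
        # The backing machine pages can now serve new memory requests,
        # without the hypervisor swapping guest memory behind the guest's back.
        return reclaimed
```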
But the most misunderstood back pressure feature might be the queue depth. Sometimes I hear people refer to queue depths as a performance enhancement, and this is not true. A queue just allows you to deal with temporary I/O overload. The best way to get a clear understanding of queue depths is the bathroom sink analogy.
The drain is the equivalent of the data path leading to the storage array, the sink itself is the queue sitting on top of the data path / drain, and the faucet represents the virtual machine workloads. Typically you open up the faucet to a level that allows the drain to cope with the flow of water. The same applies to virtual machine workloads and the underlying storage system: you run an amount of workload that is suitable for your storage system. The moment you open up the faucet further, the sink fills up and at some point it overflows, so you need some back pressure mechanism. In the bathroom sink world this typically means the water flowing over into the second sink. In the I/O scheduler world this typically results in a queue full condition.
This bogs down the performance of the virtual machine so much that many admins and architects respond by increasing the queue depth, because it allows them to avoid the queue full state (temporarily). But in essence you have just replaced your bathroom sink with a bigger sink, or, when people go overboard, with the digital equivalent of a full-size bathtub. This bathtub impacts a lot of other workloads, as many I/Os now end up at the top of the queue instead of the deeper part, waiting their turn to go through the drain to the storage system. Result: latency increases in all applications due to an improperly designed system. And remember, when the bathtub overflows you typically have a bigger mess to deal with.
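A back-of-the-envelope sketch (with made-up numbers, purely for illustration) shows why a deeper queue is not a performance enhancement: the drain does not get any faster, so an I/O that arrives at the back of a deeper queue simply waits longer before it reaches the storage system.

```python
# The "drain" (the data path to the array) has a fixed service rate.
# A deeper queue only lets more I/Os line up in front of a new arrival.
service_time_ms = 1.0  # hypothetical time the storage system needs per I/O

def worst_case_latency(queue_depth):
    """Latency of an I/O that arrives when the queue is already full."""
    # It waits for every I/O ahead of it, then gets serviced itself.
    return (queue_depth + 1) * service_time_ms

for depth in (32, 64, 256, 1024):
    print(f"queue depth {depth:4d}: worst-case latency ~{worst_case_latency(depth):6.1f} ms")
```

The queue full condition shows up later, but every workload stuck behind the deep queue pays for it in latency.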
Back pressure features are not a substitute for proper design; therefore think about implementing a bigger drain or, even better, multiple drains. Keep in mind that more bandwidth or more data paths to the same storage system only delays the moment you see the same back pressure problem again, it just occurs at a different level. Typically when a storage controller fills up its cache, it sends a queue full to all the connected systems, so the problem has now evolved from a system-wide problem into a cluster-wide problem. This is one of the big reasons why scale-out storage systems are a great fit in the virtual datacenter: you create drains to different sewer systems, typically in a more plannable manner. If you are looking for more information about this topic, I published a short series on the challenge of traditional storage architectures in virtual datacenters.
Quality of Service faces the same predicament. QoS is a great feature for dealing with temporary overload, but again, it is not a substitute for proper design. Back pressure features are great; they allow the system to deal with resource contention while avoiding system failures. These features are indispensable in the dynamic virtual datacenter of today. When you detect that they are activated on a frequent basis, you must review the virtual datacenter architecture, the current workloads and the future workloads.
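QoS implementations differ per layer and per vendor; purely as one illustration of the general idea, a simple token bucket absorbs a short burst while throttling sustained overload (all names and numbers below are hypothetical, not any specific product's mechanism).

```python
import time

class TokenBucket:
    """Toy rate limiter: it smooths short bursts, but sustained overload
    still has to be solved by design and capacity, not by QoS."""

    def __init__(self, rate_per_s, burst):
        self.rate = rate_per_s        # tokens replenished per second
        self.capacity = burst         # how big a burst we tolerate
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True               # admit the request
        return False                  # throttle: the caller has to wait

# Example: allow on average 100 requests per second, with bursts up to 20.
limiter = TokenBucket(rate_per_s=100, burst=20)
```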
I think overall it all boils down to understanding the workload in your system and having an accurate view of the capabilities of your systems. Proper monitoring and analytics tools are therefore indispensable, not only for daily operations but also for architects who have to maintain a proper service level for their current workloads while architecting an environment that can deal with unknown future workloads.

Frank Denneman is the Chief Technologist for AI at VMware by Broadcom. He is an author of the vSphere host and clustering deep dive series, as well as a podcast host for the Unexplored Territory podcast. You can follow him on Twitter @frankdenneman

3 Replies to “Ballooning, Queue Depths and other back pressure features revisited”

  1. Frank
    Great article, by the way!
    I just wanted to say that I might know the reason why so many clients believe that ballooning is bad…
    Basically the problem is related to the simplistic knowledge they have and to the simplistic information that can initially be found about this topic.
    The following is what I wrote for the VMware community a couple of weeks ago:
    ————-
    I would like to point out that the definition for ballooning is not accurate.
    Below is the definition that can be found in Module 8, slide 9 (vSphere ICM course material):
    “Ballooning mechanism, active when memory is scarce, forces virtual machines to use their own paging areas.”
    The problem with this definition is quite simple: a complex process such as ballooning cannot be summarized in just one line. In this case oversimplifying leads to a wrong statement.
    Technically speaking, ballooning can be used to reclaim memory from the guest OS, and that process does not necessarily force the OS to use its own paging area.
    If the current definition were true, every time ballooning took place it would incur some sort of performance degradation, and that is not correct.
    Operating systems, as well as ESXi, try to use the paging mechanism as a last resort.
    Common operating systems such as Windows and Linux perform many operations before starting to page out memory pages from their own address space to disk.
    Some examples of those operations are:
    Using cache memory (not owned by processes)
    Working Set trimming.
    If these two common operations are exhausted and the operating system is still under memory pressure, then the memory manager will start paging memory to disk. (I’m simplifying the explanation)
    In order to prevent students from getting confused, while not providing an oversimplified definition, I strongly suggest that the ballooning definition be changed to something like this (just a suggestion):
    “Ballooning is a mechanism triggered by the ESXi, in situations of memory contention, to intelligently reclaim memory from virtual machines”
    It is up to the instructor whether to explain what “intelligently” means or not, but he should mention that this mechanism makes use of the intelligence provided by the memory manager of each virtual machine in which the ballooning process is triggered.
    Going a little bit deeper…
    In this situation, the memory manager of each VM will assign the most appropriate set of pages to the balloon driver and, depending on the current memory state of each guest OS, memory pages might come from:
    1) Free page area (if there is enough memory here, there is no need to page out anything, and the balloon driver requirement can be fulfilled)
    2) Cache area (if the free page area runs out of pages, then the cache area will be used, and that could be enough to accomplish the task)
    3) Repurposing of memory pages after working set trimming (if there is nothing left in the free and cache areas, the OS will have to take away pages from different processes, thanks to an aging mechanism that distinguishes hot pages from cold pages. Since these are memory pages from different working sets, the OS needs to save those pages somewhere else before giving them all to the balloon driver, and this could finally lead to paging, i.e. backing up process memory to disk).
    ———
    Back to your post, and to sum up:
    Many clients believe that ballooning is bad because they think it means paging memory out to disk, and that is an oversimplification.
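    A minimal sketch (hypothetical Python with made-up names, not any actual guest OS code) of the reclamation order described above:

```python
# Hypothetical order in which a guest OS memory manager could satisfy a
# balloon request; only the last step can touch the paging file.
def satisfy_balloon_request(n_pages, free, cache, working_sets, paging_file):
    given = []
    # 1) Free page area: nothing has to be paged out.
    while free and len(given) < n_pages:
        given.append(free.pop())
    # 2) Cache area: pages not owned by any process.
    while cache and len(given) < n_pages:
        given.append(cache.pop())
    # 3) Working set trimming: cold pages are saved to disk first, which is
    #    the only step that can cause guest-level paging.
    for ws in working_sets:
        while ws and len(given) < n_pages:
            page = ws.pop()           # coldest page according to the aging mechanism
            paging_file.append(page)  # back up the process memory before reuse
            given.append(page)
    return given
```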

  2. As so well put in the article, proper system design to begin with allows the use of back pressure relief tactics to occur only rarely, as intended.
    A good detailed design specification requires two things: an engineer/architect who knows what they are doing and a solid set of functional requirements from which to derive it.
    A good functional requirements specification requires two things: that same engineer/architect who still knows what they are doing and a solid, thorough business requirements specification. It is not enough just to be very conversant in the technology; a good engineer must be able to work with the business to fully document accurate requirements (load profiles, growth rates, availability, reliability, DR/BC, budget, etc.) then fully document the derived functional requirements then the design specification derived from that.
    How many times have we all seen implementations that are just whatever hardware and software an arbitrary budget would allow? An environment where queues experience back pressure rarely and in small amounts, as intended, must first be both well understood and well documented to begin with. Both excessive use of back pressure techniques and their complete absence (never even getting anywhere near them) are evidence of an IT group either unwilling or unable to do solid design work.
