After writing the article “(Alternative) VM swap file locations Q&A” I received a lot of questions about the destination of the swapped pages, and reading back my article I realize I didn’t do a good job of clarifying that part of the process.
Which network is used for copying swapped pages?
As mentioned in the previous post, the swap file itself is not copied over to the destination host, only the swapped pages themselves. Raphael Schitz (@hypervisor_fr) was the first to ask which network is used to copy over the swapped pages. The answer is the vMotion network.
The reason the vMotion network is used is that the source host, running the active virtual machine, pulls the swapped pages back into memory when migrating the memory pages to the destination host.
Are swapped pages on the source host swapped out on the destination host?
As the pages are copied out of the swap file, they are merged into the stream of in-memory pages going from the source host to the destination host. That means the destination host is not aware which pages originate from the swap file and which come from memory; they are just memory pages that need to be stored and made available to the new virtual machine.
To describe the behavior in a different way: the source host pulls the swapped pages from disk before sending them over, therefore the destination host sees a continuous stream of unmarked, equal memory pages and stores them all in memory.
What if the destination host is experiencing contention?
Well, it’s up to the destination host to decide which pages to swap out to disk. During a vMotion process, the destination VM starts out with a clean slate, meaning that the memory target is not determined by the source host but by the destination host. Memory targets are local memory scheduler metrics and thus not shared. The source host shares the percentage of active pages, but it’s the destination host’s memory scheduler that determines the appropriate swap target for the new virtual machine. It can push memory pages back out to its swap file as needed. Those pages could be the same as the pages swapped out on the old host, but they can also be completely different pages.
What about compressed pages?
For every rule there is an exception, and here the exception is compressed pages. During a vMotion process the destination host maintains the compressed pages by keeping them compressed. This behavior occurs even with an unshared swap migration.
Get notified of these blog postings and more DRS and Storage DRS information by following me on Twitter: @frankdenneman
(Alternative) VM swap file locations Q&A
Lately I have received a couple of questions about swap file placement. As I mentioned in the article “Storage DRS and alternative swap file locations”, it is possible to configure the hosts in a DRS cluster to place the virtual machine swap files on an alternative datastore. Here are the questions I received and my answers:
Question 1: Will placing a swap file on a local datastore increase my vMotion time?
Yes. As the destination ESXi host cannot connect to the local datastore, the file has to be placed on a datastore that is available to the ESXi host running the incoming VM. Therefore the destination host creates a new swap file in its own swap file destination. vMotion time will increase as a new file needs to be created on the local datastore of the destination host and swapped memory pages potentially need to be copied.
Question 2: Is the swap file an empty file during creation or is it zeroed out?
When a swap file is created, it is an empty file equal in size to the virtual machine memory configuration. The file is empty and does not contain any zeros.
Please note that if the virtual machine is configured with a memory reservation, the swap file will be an empty file with the size of (virtual machine memory configuration – VM memory reservation). For example, if a 4GB virtual machine is configured with a 1024MB memory reservation, the size of the swap file will be 3072MB.
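For illustration, the calculation can be expressed as a one-line function. This is just a sketch of the arithmetic described above; the function name and the MB units are my own.

```python
def swap_file_size_mb(configured_memory_mb: int, reservation_mb: int = 0) -> int:
    """Size of the swap file created at power-on:
    configured memory minus the memory reservation."""
    return configured_memory_mb - reservation_mb

# The example from above: a 4GB (4096MB) VM with a 1024MB reservation
print(swap_file_size_mb(4096, 1024))  # prints 3072
```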
Question 3: What happens with the swap file placed on a non-shared datastore during vMotion?
During vMotion, the destination host creates a new swap file in its swap file destination. If the source swap file contains swapped out pages, only those pages are copied over to the destination host.
Question 4: What happens if I have an inconsistent ESXi host configuration of local swap file locations in a DRS cluster?
When selecting the option “Datastore specified by host”, an alternative swap file location has to be configured on each host separately. If one host is not configured with an alternative location, the swap file will be stored in the working directory of the virtual machine. When that virtual machine is moved to another host that is configured with an alternative swap file location, the contents of the swap file are copied over to the specified location, even though the destination host can connect to the swap file in the working directory.
Question 5: What happens if my specified alternative swap file location is full and I want to power-on a virtual machine?
If the alternative datastore does not have enough space, the VMkernel tries to store the VM swap file in the working directory of the virtual machine. You need to ensure enough free space is available in the working directory, otherwise the VM is not allowed to power on.
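A simplified way to picture this placement order is the following sketch. It is not the actual VMkernel logic, just an illustration of the fallback behavior described above, with hypothetical inputs.

```python
def choose_swap_location(swap_size_mb: int, alt_free_mb: int, workdir_free_mb: int) -> str:
    """Illustrative placement order: alternative swap datastore first,
    then the VM working directory; if neither fits, the power-on fails."""
    if alt_free_mb >= swap_size_mb:
        return "alternative swap file datastore"
    if workdir_free_mb >= swap_size_mb:
        return "virtual machine working directory"
    raise RuntimeError("not enough free space for the swap file: power-on fails")

print(choose_swap_location(3072, alt_free_mb=1024, workdir_free_mb=8192))
# -> virtual machine working directory
```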
Question 6: Should I place my swap file on a replicated datastore?
It is recommended to place the swap file on a datastore that has replication disabled. Replication of files increases vMotion time. When moving the contents of a swap file onto a replicated datastore, the swap file and its contents need to be replicated to the replica datastore as well. If synchronous replication is used, each block/page copied from the source datastore to the destination datastore needs to wait until the destination datastore receives an acknowledgement from its replication partner (the replica datastore).
Question 7: Should I place my swap file on a datastore with snapshots enabled?
To save storage space and design for the most efficient use of storage capacity, it is recommended not to place the swap files on a datastore with snapshots enabled. The VMkernel places pages in a swap file when there is memory pressure, either because the host is in an overcommitted state or because the virtual machine requires more memory than its configured memory limit. It only retrieves memory from the swap file if it requires that particular page. The VMkernel will not transfer all the pages out of the swap file when the memory pressure on the host is resolved. It keeps unused swapped-out pages in the swap file, as transferring unused pages is nothing more than creating system overhead. This means that a swapped-out page could stay in the swap file until the virtual machine is powered off. Snapshotting these idle and unused pages on storage could reduce the pool’s capacity available for snapshotting useful data.
Question 8: Should I place my swap file on a thin-provisioned datastore (LUN)?
This is a tricky one, and it all depends on the maturity of your management processes. As long as the thin-provisioned datastore is adequately monitored for utilization and free space, and controls are in place that ensure sufficient free space is available to cope with bursts of memory use, it can be a viable option.
The reason for the hesitation is the impact a thin-provisioned datastore can have on the continuity of the virtual machine.
Placement of swap files by the VMkernel is done at the logical level. The VMkernel determines whether the swap file can be placed on the datastore based on its file size. That means it checks the free space of the datastore as reported by the ESXi host, not by the storage array. However, the datastore could live in a heavily over-provisioned storage pool.
Once the swap file is created, the VMkernel assumes it can store pages in the entire swap file (see question 2 for the swap file size calculation). As the swap file is just an empty file until the VMkernel places pages in it, the swap file itself takes up very little space on the thin-provisioned datastore. This can go on for a long time without anything happening. But what if the consumed reservations, memory overcommit level and workload spikes at the ESXi host layer are not correlated with the available space in the thin-provisioning storage pool? Understand how much space the datastore could possibly consume and calculate the maximum configured size of all existing swap files on the datastore to avoid an out-of-space condition.
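To make that calculation concrete, here is a minimal back-of-the-envelope sketch in Python. The VM inventory and the pool free space figure are made-up inputs you would pull from your own monitoring; nothing here comes from a VMware API.

```python
# Hypothetical inventory of VMs whose swap files live on this datastore
vms_on_datastore = [
    {"name": "vm01", "memory_gb": 4,  "reservation_gb": 1},
    {"name": "vm02", "memory_gb": 8,  "reservation_gb": 0},
    {"name": "vm03", "memory_gb": 16, "reservation_gb": 4},
]

# Worst case: every swap file fills up completely
# (per-VM swap size = configured memory - reservation, see question 2)
worst_case_swap_gb = sum(vm["memory_gb"] - vm["reservation_gb"] for vm in vms_on_datastore)

pool_free_gb = 20  # free space left in the thin-provisioned storage pool (from array monitoring)

print(f"Maximum swap footprint: {worst_case_swap_gb} GB, pool free space: {pool_free_gb} GB")
if worst_case_swap_gb > pool_free_gb:
    print("Risk of an out-of-space condition if the swap files fill up")
```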
(Alternative) VM swap file locations Q&A – part 2
Get notified of these blog postings and more DRS and Storage DRS information by following me on Twitter: @frankdenneman
VMware feature request
During presentations I always stress submitting a feature request if you have an idea how to enhance the product or if you feel you are missing a vital product feature. VMware is very interested in hearing how its products can be enhanced and improved.
Although it’s always good to talk to your local VMware rep or your favorite VMware blogger, feedback submitted that way might not reach the correct person in time. To have the feedback routed to the correct person via the shortest path available, it is best to submit a feature request through the VMware website.
Unfortunately VMware.com doesn’t have an action button for this on the front page, so I thought it would be a good idea to publish a short article with the link included. If you have any feedback, go to the feature request page and submit your comments.
Thanks!
VAAI hw offload and Storage vMotion between two Storage Arrays
Recently I received a question about migrating virtual machines with Storage vMotion between two storage arrays, more specifically whether VAAI is leveraged by Storage vMotion in this process. Unfortunately VAAI is an array-internal feature; the Clone Blocks VAAI primitive that Storage vMotion leverages is only used to copy and migrate data within the same physical array.
Datamovers
How does Storage vMotion work between two arrays? Storage vMotion uses a VMkernel component called the datamover. This component moves the blocks from the source to the destination datastore; to be more precise, it handles the read and write I/O of blocks from the source datastore and to the destination datastore.
The VMkernel used in vSphere 4.1 and up contains different datamovers: the software datamovers FSDM and FS3DM, and a hardware-offloading datamover (FS3DM – hardware offload). The most efficient datamover is the FS3DM hardware offload, followed by FS3DM, and lastly the legacy datamover FSDM. FS3DM operates at kernel level, while FSDM operates at the application level; the shorter the communication path, the faster the operation. In essence, Storage vMotion works its way through the stack of datamovers, trying the most efficient one first before reverting to a less optimal choice. To get an idea of the difference in performance, please read the article “Storage vMotion performance difference” on Yellow-Bricks.com.
Traversing the datamover stack
When a data movement operation is invoked (i.e. Storage vMotion) and the VAAI hardware offload operation is enabled, the datamover will first attempt to use the hardware offload. If the hardware offload operation fails, the datamover reverts to the software datamovers: first FS3DM, then FSDM. As you are migrating between arrays, hardware offloading will fail and the VMkernel selects the software datamover FS3DM. If the block sizes of the datastores are not identical, Storage vMotion has to revert to the FSDM datamover. If you are migrating data between NFS datastores, Storage vMotion immediately reverts to the FSDM datamover.
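The selection order can be summarized in a small sketch. This is my own simplification of the decision logic described above, not actual VMkernel code, and the function and parameter names are purely illustrative.

```python
def select_datamover(vaai_offload_possible: bool, nfs: bool, identical_block_size: bool) -> str:
    """Illustrative selection order: hardware offload first, then FS3DM, then the legacy FSDM."""
    if nfs:
        return "FSDM"                      # NFS datastores go straight to the legacy datamover
    if vaai_offload_possible:
        return "FS3DM - hardware offload"  # clone blocks inside the same array
    if identical_block_size:
        return "FS3DM"                     # software datamover operating at kernel level
    return "FSDM"                          # legacy datamover operating at application level

# Storage vMotion between two arrays: offload fails, VMFS block sizes match
print(select_datamover(vaai_offload_possible=False, nfs=False, identical_block_size=True))
# -> FS3DM
```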
Impact on Storage DRS datastore cluster design
Keep this in mind when designing Storage DRS datastore clusters. Storage DRS does not keep historical data of Storage vMotion lead times, and thus it cannot incorporate these metrics when generating migration recommendations. Although no performance loss will occur within the virtual machine, migrating between arrays can create overhead on the supporting infrastructure. If possible, design your datastore clusters to contain datastores within the same array and use identical block sizes (if VMFS is used).