A lot of the initial attention goes to the write-back function when we demonstrate FVP, and that’s expected, since no other acceleration platform currently available is capable of doing this. However, when we discuss the remote flash access functionality in depth, a lot of eyes light up. Time to do the scale-out move and publish an article about it.
What is remote flash access?
Remote flash access allows a virtual machine to access its current and previous flash footprint inside the flash cluster, regardless of where the virtual machine or its flash footprint resides. This means that if the virtual machine moves from host A to host B, FVP can retrieve the data stored on the local flash device of host A whenever the virtual machine requests it.
The reason is simple: accessing data on another host’s flash device is quicker than going all the way down to the storage array across the shared storage area network. Providing this functionality, however, is far from simple. Building a distributed system that allows virtual machines to move around is one thing; providing this on top of flash resources is another.
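To make the lookup order concrete, here is a minimal Python sketch of a read path that tries the local flash device first, then the flash device of the host that holds the block, and only then the storage array. It is a toy model, not FVP code: the dictionaries standing in for flash devices and the array, the `footprint_index` that records which host holds a block, and the function `read_block` are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical model: each host's flash device is a dict of block id -> data.
FlashDevice = Dict[int, bytes]


def read_block(block_id: int,
               local_host: str,
               flash: Dict[str, FlashDevice],
               footprint_index: Dict[int, str],
               array: Dict[int, bytes]) -> Tuple[bytes, str]:
    """Return (data, source), trying local flash, then remote flash, then the array."""
    # 1. Local flash device: the cheapest path.
    local = flash[local_host]
    if block_id in local:
        return local[block_id], "local"
    # 2. Remote flash: the (hypothetical) index says which host holds the block.
    owner = footprint_index.get(block_id)
    if owner is not None and owner != local_host and block_id in flash[owner]:
        data = flash[owner][block_id]
        # Only blocks that are actually requested join the new local footprint.
        local[block_id] = data
        footprint_index[block_id] = local_host
        return data, "remote:" + owner
    # 3. Fall back to the shared storage array across the SAN.
    data = array[block_id]
    local[block_id] = data
    footprint_index[block_id] = local_host
    return data, "array"
```

In this model a block only travels to the new host when it is read, so anything the VM never requests after the move stays where it is.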
In my article “Basic elements of FVP – Using own platform versus in-place file system” I cover the challenges of dealing with flash. I touch briefly on how to avoid burning through the program and erase cycles of a flash device, keeping the device healthy and allowing for predictable, persistent and stable performance. Because data is written out of place rather than erased and reprogrammed in place, multiple versions of the same data block can be present on a flash device: one valid, the others invalid and stale. Keeping track of this on a local device is a challenge all by itself; now think scale! DRS allows for 32 hosts in a cluster and can move the virtual machine from one host to another.
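The out-of-place behavior can be illustrated with a toy log-structured writer, assuming one page per data block and no garbage collection: an overwrite appends a new page and marks the previous one stale instead of erasing it. `LogStructuredFlash`, `Page` and their methods are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Page:
    block_id: int
    data: bytes
    valid: bool = True


@dataclass
class LogStructuredFlash:
    """Toy out-of-place writer: an overwrite appends a new page, it never erases in place."""
    pages: List[Page] = field(default_factory=list)
    latest: Dict[int, int] = field(default_factory=dict)  # block id -> index of valid page

    def write(self, block_id: int, data: bytes) -> None:
        old = self.latest.get(block_id)
        if old is not None:
            # The previous version stays on the device but is marked stale.
            self.pages[old].valid = False
        self.pages.append(Page(block_id, data))
        self.latest[block_id] = len(self.pages) - 1

    def read(self, block_id: int) -> bytes:
        # Reads always resolve to the single valid version.
        return self.pages[self.latest[block_id]].data

    def versions(self, block_id: int) -> List[Page]:
        # Every version of the block still physically present, oldest first.
        return [p for p in self.pages if p.block_id == block_id]
```

Writing block 7 twice leaves two pages for it on the device, only the newest of which is valid; multiply that by every block and every host in a 32-host cluster and the bookkeeping problem becomes clear.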
After a move, the virtual machine starts creating a fresh footprint on the local flash device. During its operation, the virtual machine can request a block that it has not yet requested on this host but has requested on other hosts. Which copy does it need to retrieve? The FVP cache coherence protocol knows and provides the freshest copy to the virtual machine while ignoring the stale blocks. As an enthusiast of distributed algorithms, I had to pick my jaw up off the ground after seeing FVP work its magic.
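One simple way to reason about which copy is the freshest is a per-block version number recorded for every copy in a cluster-wide directory. The sketch below assumes exactly that; FVP’s actual cache coherence protocol is not described in this article, so `Copy`, `Directory`, `freshest_copy`, `stale_copies` and `locate` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical directory entry: (host, version) for one copy of a block.
Copy = Tuple[str, int]
# Hypothetical cluster-wide directory: block id -> every known copy.
Directory = Dict[int, List[Copy]]


def freshest_copy(copies: List[Copy]) -> Copy:
    """Pick the copy with the highest version number."""
    if not copies:
        raise KeyError("block has no flash copy; read it from the storage array")
    return max(copies, key=lambda c: c[1])


def stale_copies(copies: List[Copy]) -> List[Copy]:
    """Every copy older than the freshest one is stale and must be ignored."""
    newest = freshest_copy(copies)[1]
    return [c for c in copies if c[1] < newest]


def locate(directory: Directory, block_id: int) -> str:
    """Return the host whose flash device holds the freshest copy of block_id."""
    return freshest_copy(directory.get(block_id, []))[0]


directory: Directory = {42: [("host-a", 3), ("host-c", 5), ("host-b", 4)]}
print(locate(directory, 42))          # host-c
print(stale_copies(directory[42]))    # [('host-a', 3), ('host-b', 4)]
```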
The ability to fetch data on demand, that is, when the application requires it, avoids a lot of overhead. The flash footprint of a virtual machine is typically on the order of tens of gigabytes. We took a screenshot of the flash usage of a VM running SharePoint that had just been migrated by DRS. In total, the VM had a flash footprint of just over 91 GB inside the flash cluster.
After the move, the VM had a little over 1 GB on the local flash device, while the remote flash device contained close to 90 GB. Would it have been necessary to copy this data over to the new local host, or would that have been a waste of resources? Sending this amount of data across the network not only creates network overhead, it also causes flash wear and overhead at the compute level.
To avoid flash wear, data is not discarded when there is no pressure on flash resources. FVP does not delete stale data merely to free up space, and if there is no need to free up space it does not touch valid data either, because erasing a flash block is much more expensive. See “Basic elements of FVP – Using own platform versus in-place file system”. As a result, data can remain valid on the flash device even though the application no longer uses it for its current or future operations. Had that data been copied along with the virtual machine when it migrated to another host, you would have burned through a lot of flash resources. First of all, the data would need to be written to the local flash device, consuming program and erase cycles. Secondly, the pressure created by the data move might force important data of other virtual machines off the local flash device to make room for unimportant data.
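The side effect on other virtual machines can be shown with a toy capacity-bounded cache. This sketch assumes LRU eviction that only kicks in when the device is full; `LocalFlash`, `fill` and the block counts are hypothetical and not FVP’s resource management policy. It compares eagerly copying a migrated VM’s 90 blocks against fetching only the two blocks it actually reads.

```python
from collections import OrderedDict
from typing import Tuple


class LocalFlash:
    """Toy capacity-bounded flash cache with LRU eviction (hypothetical policy)."""

    def __init__(self, capacity_blocks: int) -> None:
        self.capacity = capacity_blocks
        self.blocks: "OrderedDict[Tuple[str, int], None]" = OrderedDict()
        self.writes = 0  # every admitted block costs one program operation

    def admit(self, vm: str, block_id: int) -> None:
        key = (vm, block_id)
        if key in self.blocks:
            self.blocks.move_to_end(key)  # refresh recency, no new write
            return
        # Evict only under capacity pressure; idle data is otherwise left alone.
        while len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # drop the least recently used block
        self.blocks[key] = None
        self.writes += 1

    def resident(self, vm: str) -> int:
        return sum(1 for (owner, _) in self.blocks if owner == vm)


def fill(migrated_blocks: int) -> LocalFlash:
    # Another VM ("db") already has 60 hot blocks on a 100-block device.
    flash = LocalFlash(capacity_blocks=100)
    for b in range(60):
        flash.admit("db", b)
    # Then the migrated VM's blocks arrive, eagerly or on demand.
    for b in range(migrated_blocks):
        flash.admit("sharepoint", b)
    return flash


eager = fill(90)   # copy the whole remote footprint
lazy = fill(2)     # fetch only what is read
print(eager.resident("db"), eager.writes)  # 10 150
print(lazy.resident("db"), lazy.writes)    # 60 62
```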
Network & CPU overhead
Think about the wasted bandwidth and the increase in migration time that comes with it. You might not care if it was moving a single virtual machine once in a while, but DRS evaluates the load balance every 5 minutes. What if the migration was so large it could not finish within this timespan? How would that affect the load balance in the short term and in the long term? What about maintenance mode? I know some organizations that have a strict time window to perform maintenance; with elongated migration times, some of these shops could barely get the host into maintenance mode before the window closed and they had to exit it again. Having to wait a week for the next window to upgrade your host with new software or change a setting, just because you lost time migrating stale data, is not my idea of agility and efficient resource use.
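A back-of-the-envelope calculation shows why this matters. Assuming a 10 Gbit/s vMotion network running at full line rate (optimistic, as it ignores protocol overhead and the memory copy vMotion already performs), copying the 90 GB remote footprint alone takes over a minute, and evacuating a host with a hypothetical 20 such VMs takes almost five times the 5-minute DRS interval. The link speed, VM count and `copy_seconds` helper are assumptions, not measurements.

```python
DRS_INTERVAL_S = 5 * 60  # DRS evaluates the load balance every 5 minutes


def copy_seconds(footprint_gb: float, link_gbit_s: float, efficiency: float = 1.0) -> float:
    """Seconds to push footprint_gb (decimal GB) over a link of link_gbit_s."""
    gigabits = footprint_gb * 8
    return gigabits / (link_gbit_s * efficiency)


one_vm = copy_seconds(90, 10)          # one VM's 90 GB over an assumed 10 Gbit/s link
host_evac = copy_seconds(20 * 90, 10)  # hypothetical host with 20 such VMs

print(f"one VM: {one_vm:.0f} s, host evacuation: {host_evac:.0f} s, "
      f"DRS interval: {DRS_INTERVAL_S} s")
# one VM: 72 s, host evacuation: 1440 s, DRS interval: 300 s
```

Lowering `efficiency` to model real-world throughput only makes these numbers worse.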
And let’s not go into the wasted CPU cycles spent copying the data from one host to another, reducing the CPU resources available to other application workloads.
After 30 minutes we checked the SharePoint server again to see how much data had been transferred. The local flash footprint had increased from 1.15 GB to 2.97 GB. We measured this application during peak production hours, showing that pulling the 90 GB through the infrastructure would have been a waste of resources on both ESXi hosts.
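The measured numbers can be put in perspective with a few lines of arithmetic. The 1.82 GB growth of the local footprint is treated as an upper bound on what was pulled from the remote host, since part of it may have come from the storage array or from new writes; “close to 90 GB” is rounded to 90.0 and decimal units are used throughout.

```python
local_before_gb = 1.15
local_after_gb = 2.97
remote_gb = 90.0          # "close to 90 GB" left on the previous host
window_s = 30 * 60        # the 30-minute observation window

fetched_gb = local_after_gb - local_before_gb    # upper bound on remote fetches
share_of_remote = fetched_gb / remote_gb
avg_mb_s = fetched_gb * 1000 / window_s          # decimal MB per second

print(f"fetched {fetched_gb:.2f} GB = {share_of_remote:.1%} of the remote footprint")
print(f"average pull rate {avg_mb_s:.2f} MB/s")
# fetched 1.82 GB = 2.0% of the remote footprint
# average pull rate 1.01 MB/s
```

Even during peak hours, at most about 2% of the remote footprint was needed on the new host.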
Now you might wonder: what will happen to the 90 GB? Resource management in FVP tracks the use of this data and, over time, lets it go to make room for new data when necessary. Having data sit idle on a flash device is not a bad thing. FVP is designed to group data intelligently, allowing the flash device to perform its management tasks in the most optimal way. When it needs to make room for new data, it removes stale data efficiently, reducing the number of program and erase cycles by avoiding write amplification.
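Reclaiming space with little write amplification comes down to which erase block gets reclaimed first. A common flash-translation-layer technique, greedy victim selection, picks the erase block with the fewest valid pages, because each valid page has to be rewritten elsewhere before the block can be erased. The sketch below illustrates that generic technique; it is not a description of FVP’s internal algorithm, and `EraseBlock`, `pick_victim` and `relocation_cost` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EraseBlock:
    block_no: int
    valid_pages: int
    stale_pages: int


def pick_victim(blocks: List[EraseBlock]) -> EraseBlock:
    """Greedy GC: reclaim the erase block with the fewest valid pages to relocate."""
    return min(blocks, key=lambda b: b.valid_pages)


def relocation_cost(block: EraseBlock) -> int:
    # Every still-valid page must be copied elsewhere before the erase;
    # these extra writes are the write amplification of reclaiming this block.
    return block.valid_pages


blocks = [EraseBlock(0, valid_pages=60, stale_pages=4),
          EraseBlock(1, valid_pages=2, stale_pages=62),
          EraseBlock(2, valid_pages=30, stale_pages=34)]
victim = pick_victim(blocks)
print(victim.block_no, relocation_cost(victim))  # 1 2
```

Grouping data with similar lifetimes helps this choice: erase blocks then tend to become almost entirely stale at once, so reclaiming them relocates very little valid data.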