Hit rate: an elementary metric in your virtual toolbox
This article is part of the series "Evaluating an acceleration platform". In the article “Acceleration layer architecture and I/O request” I explain why certain tests are not usable when conducting performance tests on an infrastructure equipped with an acceleration platform such as FVP.
When accelerating reads, one of the key points is subsequent read access. To accelerate read performance, the data has to be present on the flash device. This can be achieved by storing write data directly on the flash device or by a “false” write. A false write occurs when the requested data is not available on the flash device: the data is retrieved from the datastore and simultaneously copied to the flash device while it is presented to the application.
Once the data is stored on the flash device, any new I/O operation requesting this data can be serviced by the flash device. In case you wonder, Read-After-Write (RAW) operations are pretty common in the application landscape.
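The false-write behavior described above is essentially a read-through cache. A minimal sketch, using hypothetical dict-based stores rather than FVP internals:

```python
class ReadThroughCache:
    """Minimal sketch of the 'false write' behavior.

    On a miss the data is fetched from the (slow) datastore and
    simultaneously copied to the (fast) flash device, so any
    subsequent read of the same block is a cache hit.
    Names and structure are illustrative, not FVP internals.
    """

    def __init__(self, datastore):
        self.datastore = datastore   # backing store: block -> data
        self.flash = {}              # flash device contents
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.flash:      # cache hit: serviced from flash
            self.hits += 1
            return self.flash[block]
        self.misses += 1             # cache miss: fetch from datastore
        data = self.datastore[block]
        self.flash[block] = data     # the "false write" to flash
        return data

    def write(self, block, data):
        # Write data lands on the flash device as well, so a
        # Read-After-Write (RAW) is serviced from flash.
        self.datastore[block] = data
        self.flash[block] = data


cache = ReadThroughCache({"video.mp4": b"frames"})
cache.read("video.mp4")   # miss: fetched from datastore, copied to flash
cache.read("video.mp4")   # hit: serviced from flash
print(cache.hits, cache.misses)  # -> 1 1
```

The second read never touches the datastore, which is exactly why RAW-heavy workloads benefit so much from this scheme.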
When the requested data is present on the flash device, this is considered a cache hit, and the hit rate becomes a very important metric when evaluating or troubleshooting performance. Simply put, hit rates translate into latency. With the introduction of the new tier you now read from and write to multiple targets, each with its own latency characteristics: the latency of the flash device and the latency of the datastore. You need to determine which target is servicing the I/O requests.
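As a back-of-the-envelope model (my simplification, not an FVP formula), the effective read latency can be approximated as a hit-rate-weighted average of the tier latencies:

```python
def effective_latency(hit_rate, flash_ms, datastore_ms):
    """Approximate effective read latency for a given hit rate.

    hit_rate: fraction of reads serviced from flash (0.0 - 1.0).
    Illustrative model only; real-world latency also depends on
    queueing, block size and outstanding I/O.
    """
    return hit_rate * flash_ms + (1.0 - hit_rate) * datastore_ms


# 0% hit rate: every read goes to the datastore
print(effective_latency(0.0, 0.2, 8.0))   # -> 8.0 (ms)
# 100% hit rate: every read is serviced from flash
print(effective_latency(1.0, 0.2, 8.0))   # -> 0.2 (ms)
```

The example latency values (0.2 ms flash, 8 ms datastore) are assumptions for illustration; the point is that the hit rate, not the raw device speed, decides which latency the application actually experiences.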
I think providing this information is crucial when you are running an enterprise-level acceleration platform. VMware vSphere Flash Read Cache provides this information via esxcli commands (check out Duncan’s excellent post); FVP provides it via performance graphs.
Testing the theory
Let’s take a look at the performance graphs. In this test I use an application that provides a video service. In the first test the application fetches a video per user request. As the video is new content for this server, it is fetched from the datastore. This translates into a hit rate of 0.
Reviewing the latency, you will see that the effective latency is close to the datastore latency. The effective latency indicates the latency experienced by the virtual machine.
When another user requests the video again, the data is serviced from the flash device, as FVP performed false writes and copied the data while servicing the application. The hit rate is now 100%.
The interesting part is the total effective latency metric, as it is now closer to the local flash latency than the datastore latency. It no longer matters that the datastore has its ups and downs when it comes to servicing I/O. The application gets low latency and very stable, predictable performance.
Serving I/O from local flash reduces the moving parts as much as possible: no shared storage area network, no external workloads pounding away on the storage controllers or disk spindle pools. But we live in 2013, close to 2014, and VM mobility is ingrained in today’s datacenter operational procedures. FVP fully supports VM mobility by providing a seamless pool of flash resources: I/O can be serviced from a local flash device or a remote flash device.
Key behavior of FVP is that it provides data on request, avoiding wear and overhead. When the application requests data, FVP will service it from a flash device, remote or local. In addition, FVP aims to keep the flash footprint as close to the application as possible, and therefore copies the data to the local flash device when retrieving it from a remote flash device, ensuring that all subsequent reads are serviced from the flash device closest to the application. For more information, please review the article “PernixData FVP Remote Flash Access”.
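The read path after a migration can be sketched as a two-level lookup with promotion to local flash. Again, these are illustrative dict-based stores and hypothetical names, not the actual FVP implementation:

```python
def read_block(block, local_flash, remote_flash, datastore):
    """Sketch of the read path after a VM migration.

    Order of preference: local flash, remote flash, datastore.
    Data found on remote flash is copied to local flash so that all
    subsequent reads are serviced from the device closest to the VM.
    """
    if block in local_flash:
        return local_flash[block], "local flash"
    if block in remote_flash:
        data = remote_flash[block]
        local_flash[block] = data    # copy closer to the application
        return data, "remote flash"
    data = datastore[block]
    local_flash[block] = data        # the usual false write
    return data, "datastore"


# After migration: local flash is empty, the old host's flash is remote.
local, remote = {}, {"b1": b"x"}
datastore = {"b1": b"x", "b2": b"y"}
print(read_block("b1", local, remote, datastore)[1])  # -> remote flash
print(read_block("b1", local, remote, datastore)[1])  # -> local flash
```

The first read traverses the network to the old host; the copy made during that read means the second read is already local.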
The virtual machine is migrated to another host, and at 09:54 AM the same video file is viewed again.
When this data is retrieved, it is serviced from the flash pool, and therefore the performance graph registers it as a cache hit. But there are differences in latency, as fetching data from the remote flash device traverses a longer path.
The effective latency is not following the local flash or the datastore latency, but is actually tracking the network latency. To make this more visible, I selected the custom performance graph and enabled the network flash read and local flash read metrics.
In this graph you can see that the remote flash device, i.e. the flash device on the host the virtual machine was previously running on, is servicing the I/O requests.
The video is played for three minutes instead of the 5 or 6 minutes of the previous tests on the old source host. The user then plays the video file again, this time for the same duration as in the first two tests.
Again the hit rate is 100%, illustrating that I/O is serviced from flash devices. But which flash device? As FVP copies the data to the local flash device, the first three minutes of the video file are now serviced from local flash again. The remote flash metric indicates no activity for the first three minutes, but as the video is played beyond the previous three-minute mark, data is fetched by FVP from the remote flash device again.
How does this translate into latency?
The test was started at 9:59 AM. The effective latency was close to the latency of the local flash device; then, after three minutes, it was close to the remote flash latency.
Please note that in this test I used an all-flash array with barely any load running. In general, the latency of the underlying array will differ far more from the local and remote flash latencies than it does in this test.
Check hit rate before reviewing latency
Latency is one of the most predominant metrics when reviewing or troubleshooting performance. But when dealing with an acceleration platform, it is key that you check the hit rate first. This allows you to quickly determine which layer is servicing the I/O requests. Hit rate should become an elementary metric in your virtual toolbox.