Why accurate IO block size frequency and workload distribution metrics are so valuable

When discussing or comparing storage architectures, IOPS, latency and throughput are the key metrics of differentiation. Unfortunately, the IOPS metric has become the go-to metric for distinguishing an array from its peers. And to be honest, it's an addictive number; it shows the progression of the industry. Where we used to be super excited about a system that did 3,000 IOPS back in the 90s, we now see new SSD devices providing us with 250,000 IOPS today.

The IOPS metric became the universal scoring unit on the performance chart of storage systems. However, we all know that the IOPS metric by itself is a hollow proclamation. In order to properly understand the characteristics of a system, we need to add block sizes to the equation. And I believe that an accurate measurement of the IO block size frequency and the read/write distribution is valuable at any level of the IT organization, from admin to CIO, each of course requiring a different level of detail.
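To illustrate why IOPS alone tells you so little: throughput is simply IOPS multiplied by block size, so the same IOPS number can represent wildly different loads on the array. A minimal sketch of that arithmetic (the 10,000 IOPS figure and the block sizes are arbitrary examples, not measurements):

```python
# Same IOPS, very different throughput: block size is the missing variable.

def throughput_mbps(iops: int, block_size_kb: float) -> float:
    """Throughput in MB/s for a given IOPS rate and block size."""
    return iops * block_size_kb / 1024

for block_kb in (4, 8, 32, 64):
    print(f"10,000 IOPS at {block_kb:>2}K blocks = "
          f"{throughput_mbps(10_000, block_kb):6.1f} MB/s")
```

At 4K blocks, 10,000 IOPS is roughly 39 MB/s; at 64K blocks, the same 10,000 IOPS is 625 MB/s, a sixteen-fold difference the IOPS number alone never reveals.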

Satisfying the SLA
Understanding the workload profile and how it matches the infrastructure is a question that occupies the minds of many different IT teams today. Where it used to be the focus of the storage team, virtualization admins and architects now deal with this as well. Having insight into changing workload profiles allows upper-level management to recognize trends and possibly align their infrastructure and services to cope with this direction. Even more so, development teams start to interact with the infrastructure teams to understand the potential of the infrastructure. This information allows the developer teams to adjust the application to the performance characteristics of the production environment. Today every team is focused on aligning the IT infrastructure with the business requirements. Having an accurate analysis of the characteristics of the current workloads, and insight into how potential workloads will behave in the environment, creates invaluable information for SLA operations.

Having the right data for the customer conversation
One of the biggest concerns CTOs and CIOs have is the inability of their teams to generate the correct data that allows them to have a proper conversation with their customers. This happens, for example, when a high-profile project transitions from the development stage to the production stage. Workload intensity changes: users start to use the new platform, synthetic tests disappear, and more often than not the infrastructure changes as well. Most test/dev environments differ from production-level infrastructure. Furthermore, production environments have to cope with load synchronicity and load correlation generated by other workloads, which typically do not occur in a test/dev environment that runs the high-profile workload in the last stage of development. A classic conversation then occurs in which the capability of the production environment is questioned. At these times you need the data to perform a proper gap analysis of the previous state and the current state of performance. Where does it differ from the previous environment? Having the correct data at your disposal allows you to understand which platform is suitable for your new application, or be conscious of the direction needed to align the infrastructure with this new workload or service.

Procurement process of storage systems
One challenge that administrators and architects typically face is generating a correct representation of the current workload inside their datacenter. This data is key during the procurement of a new storage infrastructure. The storage vendor asks the customer to provide workload characteristics and performance requirements in order to be able to meet the customer's demand. In addition, future growth is factored in to ensure that the performance will remain sufficient throughout the amortization period. I've seen countless cases where a dispute is raised over incorrect sizing of an array because it is underperforming. Typically it comes down to the determination that incorrect data was the culprit behind the incorrect configuration of the storage solution. A costly situation which can be avoided if the environment is properly analyzed.

PernixData Architect
Interestingly enough, once customers understood the benefit of PernixData FVP for their workloads and the ability to create storage platforms that were previously impossible to create, the next question was: which workload should we start with? This was the genesis of PernixData Architect. By accurately providing analytics of the workload and the infrastructure behavior, it creates insights never seen before. By leveraging the context-rich hypervisor, Architect is able to provide insights at every level of the virtual infrastructure. What I absolutely love about Architect is that it's built upon the progressive disclosure design vision, where the system helps maintain focus by providing the essential level of detail. It moves from abstract to specific, such as from cluster level to a specific VM workload. It provides the necessary level of information at the right time. Let's take a closer look at how Architect presents the IO block size frequency and read/write distribution metrics.

Please note that PernixData Architect is a stand-alone product and is able to analyze any storage array without requiring PernixData FVP. The following screenshots show PernixData Architect connected to an all-flash array (AFA).

Cluster view
The screenshot below shows the workload summary at cluster level. It shows the summary of the read/write distribution of all the virtual machines running on that vSphere cluster. The blue bar displays the summary of the I/O block size frequency of all the IOs generated in that vSphere cluster.
When hovering over the bar, a callout is presented with a more detailed view of that particular block size. In this case, 17.2% of all IO blocks are smaller than 4K. Why this is useful information, and why blocks smaller than 4K are detrimental to your virtual machine performance and storage array overhead, will be covered in a future blog article.

01-Cluster-workload summary-hover
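To make the underlying concept concrete, here is a minimal sketch of how such a block size frequency distribution could be derived from a list of raw IO sizes. The sample sizes and bucket boundaries below are made up for illustration; this is not how Architect collects its data.

```python
from collections import Counter

# Hypothetical sample of IO sizes in bytes (illustrative, not real data).
io_sizes = [512, 2048, 4096, 4096, 8192, 8192, 16384, 65536, 4096, 1024]

# Illustrative bucket boundaries; each label covers sizes below its upper bound.
buckets = [("<4K", 4096), ("4K-8K", 8192), ("8K-16K", 16384),
           ("16K-64K", 65536), (">=64K", float("inf"))]

def bucket_of(size: int) -> str:
    """Return the label of the first bucket whose upper bound exceeds size."""
    for label, upper in buckets:
        if size < upper:
            return label
    return buckets[-1][0]

freq = Counter(bucket_of(s) for s in io_sizes)
total = sum(freq.values())
for label, _ in buckets:
    print(f"{label:>7}: {100 * freq.get(label, 0) / total:5.1f}%")
```

Bucketing every IO this way is essentially what the frequency bar visualizes.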

IO Frequency view
When selecting the IO Frequency view, Architect provides a more detailed view. A distinction is made between read and write operations. The different shades of green indicate the frequency of the block size. Hovering reveals a more detailed percentage of the overall I/O operations of that specific type.

02-Cluster-IO-Frequency-Hover
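Conceptually, this view is a two-dimensional frequency table: operation type by block size. A rough sketch of that idea, using a made-up IO trace (again purely illustrative, not Architect's internals):

```python
from collections import Counter

# Hypothetical IO trace as (operation, block size in KB) tuples.
trace = [("read", 4), ("read", 8), ("write", 4), ("read", 64),
         ("write", 8), ("read", 4), ("write", 4), ("read", 32)]

freq = Counter(trace)
total = len(trace)
for op in ("read", "write"):
    for size in sorted({s for o, s in trace if o == op}):
        print(f"{op:>5} {size:>2}K: "
              f"{100 * freq[(op, size)] / total:4.1f}% of all IOs")
```

Each percentage corresponds to one cell in the view.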

Architect allows the user to fine-tune the timeframe as well. Admins can drill down to a specific time frame, while longer periods of time are interesting for generating reports that allow for trend analysis.

03-Timeframe
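The same idea expressed in a few lines: aggregating a raw IO log into coarser time windows turns per-second noise into trend data. A minimal sketch with made-up timestamps, using pandas as one possible tool (Architect does this internally, of course):

```python
import pandas as pd

# Hypothetical per-IO log with timestamps and block sizes (illustrative only).
log = pd.DataFrame({
    "ts": pd.to_datetime(["2016-03-01 09:00:05", "2016-03-01 09:00:40",
                          "2016-03-01 09:05:10", "2016-03-01 09:05:30",
                          "2016-03-01 09:10:02"]),
    "size_kb": [4, 8, 4, 64, 8],
})

# Roll the raw log up into 5-minute windows: IO count and average block size.
per_window = log.set_index("ts").resample("5min").agg(
    io_count=("size_kb", "count"),
    avg_block_kb=("size_kb", "mean"),
)
print(per_window)
```

Shorter windows expose spikes for troubleshooting; longer windows smooth them out and reveal the trend.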

These metrics are available at multiple levels of the virtual infrastructure. This allows for targeted exploration of virtual machine behavior. The following screenshot shows a virtual machine running SQL. The same level of detail is available if the administrator chooses to access it.

04-VM level

Architect provides continuous insight into the workload of the virtual machines and the behavior of the virtual infrastructure. Accurate metrics such as read/write distribution and I/O block size frequency are instrumental to a successful procurement process for storage infrastructure. They allow for proper gap analysis and workload demand analysis to understand the impact on managing and maintaining SLAs. Architect provides detailed performance analysis of the popular storage metrics, as shown in the article Maximizing performance with the correct drivers - Intel PCIe NVMe driver. These metrics allow for further analysis of infrastructure behavior, whether in your current environment or during the proof-of-concept phase of a potential new storage array. Stay tuned for more articles about PernixData Architect.

Frank Denneman
