What grade Flash to pick for a POC and test environment?

Do I need to buy a specific grade SSD for my test environment or can I buy the cheapest SSDs? Do I need to buy enterprise grade SSDs for my POC? They last longer, but why should I bother for a POC? Do we go for Consumer grade or Enterprise grade flash devices? All valid questions that typically arise after a presentation about PernixData FVP, but I can imagine Duncan and Cormac receive the same when talking about VSAN.

Enterprise flash devices are known for their higher endurance, their data protection features and their increased speed compared to consumer grade flash devices. Although these features are very nice to have, they aren’t the most important ones when testing flash performance.

The most interesting features of enterprise flash devices are Wear Levelling (to reduce hot spots), Spare Capacity, Write Amplification Avoidance, Garbage Collection Efficiency and Wear-Out Prediction Management. Together these lead to I/O consistency, and I/O consistency is the Holy Grail for test, POC and production workloads.

Spare capacity
One of the main differentiators of enterprise grade disks is spare capacity. The controller and disk use this spare capacity to reduce write amplification. Write amplification occurs when the drive runs out of pages to write data. In order to write data, the page needs to be in an erased state, meaning that if (stale) data is present in that page, the drive needs to erase it first before writing (fresh) data. The challenge with flash is that the controller can only erase per block, a collection of pages, and a block may contain pages that still hold valid data. That valid data needs to be written somewhere else before the controller can erase the block. This sequence is called write amplification and it is something you want to keep to a minimum.
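To make the cost of that sequence concrete, here is a back-of-the-envelope sketch with hypothetical numbers (the page and block geometry is an assumption for illustration, not tied to any specific drive): rewriting a single page in a nearly full block forces the controller to relocate every still-valid page first.

```python
# Hypothetical illustration of write amplification on a full drive.
# A NAND block holds many pages; to rewrite one page, the controller
# may first have to relocate the still-valid pages in that block.

PAGES_PER_BLOCK = 128      # hypothetical block geometry
VALID_PAGES = 100          # pages in the block that still hold live data

# The host asks to write 1 page, but the only candidate block is full:
host_writes = 1
relocated = VALID_PAGES    # valid pages copied out before the block erase
nand_writes = host_writes + relocated

# Write amplification factor = actual NAND writes / host writes
waf = nand_writes / host_writes
print(f"Write amplification factor: {waf:.0f}x")
```

One host write turned into 101 NAND writes in this worst case, which is exactly the "household chores" scenario described below that spare capacity is meant to avoid.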

To solve this, flash vendors over-provision the device with flash cells. The more technically accurate term is “Reduced LBA access”. For example, the Intel DC S3700 flash disk series comes standard with 25 – 30% more flash capacity. This capacity is assigned to the controller, which uses it to manage background operations such as garbage collection, NAND disturb rules or erase blocks. Now the interesting part is how the controller handles these management operations. Enterprise controllers contain far more advanced algorithms that reduce the wear of blocks by reducing the movement of data, understanding which data is valid and which is stale (TRIM), and quickly and efficiently remapping logical to physical LBAs after moving valid data in order to erase the stale data. Please read this article to learn more about write amplification.
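As a quick sketch of how that over-provisioning figure is usually quoted: spare NAND relative to the user-addressable capacity. The 128/100 split below is an illustrative assumption in the spirit of the S3700 example above, not a published spec.

```python
# Over-provisioning as typically quoted: spare NAND expressed as a
# percentage of the user-addressable capacity.

def overprovisioning_pct(physical_gib: float, user_gib: float) -> float:
    """Spare area as a percentage of user-addressable capacity."""
    return (physical_gib - user_gib) / user_gib * 100

# e.g. 128 GiB of raw NAND exposed to the host as 100 GiB:
print(f"{overprovisioning_pct(128, 100):.0f}% spare capacity")  # 28%
```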

Consumer grade flash
Consumer grade flash devices lack in these areas. Most of them have TRIM support, but how advanced is that algorithm? Most of them can move data around, but how fast and intelligent is the controller? The biggest question, however, is how many spare pages the device has to reduce write amplification. In worst case scenarios, which usually happen when running tests, the disk is saturated while the data keeps on pouring in. A consumer grade device typically has 7% spare capacity, and it will not use all of that space for data movement. Due to the limited space available, the drive will allocate new blocks from its spare area first, eventually using up its spare capacity and ending up doing read-modify-write operations. At that point the controller and the device are fully engaged with household chores instead of providing service to the infrastructure. It’s almost like the disk is playing a sliding puzzle.


Anandtech.com performed similar tests and witnessed similar behaviour. They published their results in the article “Exploring the Relationship Between Spare Area and Performance Consistency in Modern SSDs”, an excellent read and highly recommended. In this test they used the default spare capacity and ran some tests on one of the best consumer grade SSD devices, the Samsung 840 PRO. Even with a single block size (which is an anomaly in real-life workload characteristics) the results are all over the place.

840pro- default spare capacity

Seeing a scatter plot with results ranging between 200 and 100,000 IOPS is not a good base platform for understanding and evaluating a new software platform.

The moment they reduced the user-addressable space (reformatting the file system to use less space) the performance went up and became far more stable. Almost every result is in the 25,000 to 30,000 IOPS range.

840pro-25 spare capacity
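A simple way to quantify the difference between those two plots is the coefficient of variation (standard deviation divided by mean) of the per-second IOPS samples. The samples below are hypothetical, shaped like the two Anandtech charts described above; the metric itself is standard statistics, not something from their article.

```python
# Quantifying I/O consistency: coefficient of variation (stddev / mean)
# of per-second IOPS samples. Lower CoV = more consistent device.
from statistics import mean, stdev

scattered = [200, 95_000, 40_000, 300, 100_000, 2_000, 60_000]  # default spare area
stable    = [25_000, 27_000, 26_500, 28_000, 29_000, 25_500]    # extra spare area

def cov(samples: list[int]) -> float:
    return stdev(samples) / mean(samples)

print(f"default spare area : CoV = {cov(scattered):.2f}")
print(f"extra spare area   : CoV = {cov(stable):.2f}")
```

The stable run lands well under 0.1 while the scattered run is around 1.0, which matches the visual impression of the two charts.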

Please note that both VSAN and FVP manage the flash devices at their own level; you cannot format the disk to create additional spare capacity.

Latency tests show exactly the same. I’ve tested some enterprise and consumer grade disks and the results were interesting to say the least. The consumer grade drive performance charts were not as pretty: the virtual machine running the read test was the only workload hitting the drive, and yet the drive had trouble providing steady response times.


I swapped the consumer grade disk for an enterprise disk and ran the same test again. This time the latency was consistent, providing predictable application response times.
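The difference those two tests show is mostly in the tail: median latency can look similar while the 99th percentile gives the game away. A sketch with hypothetical samples mimicking the behaviour described above (the numbers are invented for illustration, not my actual measurements):

```python
# Comparing latency consistency: median vs tail latency.
# Samples are hypothetical, in milliseconds.
from statistics import quantiles

consumer   = [0.4, 0.5, 0.4, 8.0, 0.6, 12.5, 0.5, 0.4, 6.0, 0.5]  # erratic tail
enterprise = [0.5, 0.5, 0.6, 0.5, 0.6, 0.5, 0.6, 0.5, 0.5, 0.6]   # steady

for name, samples in (("consumer", consumer), ("enterprise", enterprise)):
    cuts = quantiles(samples, n=100)   # 99 percentile cut points
    p50, p99 = cuts[49], cuts[98]
    print(f"{name:10s} p50={p50:.1f} ms  p99={p99:.1f} ms")
```

Both devices can report a sub-millisecond median, but the consumer drive's p99 is an order of magnitude worse, and that tail is what the application actually feels.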


Why you want to use enterprise devices:
When testing and evaluating new software, let alone a new architecture, the last thing you want to do is investigate why performance is so erratic. Is it the software? Is it the disk, the test pattern, or is the application acting weird? You need a stable, consistent and predictable hardware layer that acts as the foundation for the new architecture. You need a stable environment that allows you to baseline the performance of the device, so you can understand the characteristics of the workload, the software performance and the overall benefit of this new platform in your architecture.

Enterprise flash devices provide these abilities, and when doing a price comparison between enterprise and consumer grade devices the difference is not that extreme. In Europe you can get an Intel DC S3700 100GB for 200 euros; Amazon is offering the 200GB model for under 500 US dollars. 100-200GB is more than enough for testing purposes.

vSphere Design Pocketbook v2 – Call for entries deadline extended

This weekend a lot of entries were submitted and we received even more pleas for an extension of the deadline. Therefore we have extended the deadline to Saturday, June 21st.

Call for entries
Please provide your blog post in Word or PDF format. If you use any diagrams in your article, please provide them separately in 300 DPI PNG or high quality JPEG format. (Guidelines for designing diagrams can be found here)

It’s about you, so please provide a short bio of yourself with your blog URL and twitter handle; if preferred, you can send a headshot as well.

You can email the content to Pocketbook@pernixdata.com

Please note that you don’t have to submit an existing blog article. If you want to be included in the book, just submit your thoughts about vSphere design. More info about the book: vSphere Design Pocketbook v2 – the blog edition – Call for entries


More than 3000 copies were handed out last year. Become a part of the platform that allows the community to show their skill set. Submit your article before the 21st of June.

vSphere Design Pocketbook v2 – the blog edition – Call for entries

Last year’s vSphere Design Pocketbook – “Tweet sized Design Consideration for Your Software-Defined Datacenter” was a big hit. Over 3000 copies have been given away since last VMworld and I don’t even know how many copies have been downloaded.

We knew other community members had loads of advice to share and from that idea the vSphere Design Pocketbook was born. And now it is time for a successor!

The Blog edition
The design considerations featured in the first book were in tweet-sized format, limited to 200 characters. This edition expands beyond this limit and allows you to convey your thoughts up to the length of a blog article. You can either select existing content, such as a published article, or create something new.

  • Is there a maximum length? Not exactly; use the words necessary to describe your design consideration efficiently. If necessary, we will ask you to condense your material.
  • Can I use diagrams? Absolutely! Make sure you provide diagrams and screenshots that can be printed: at least 220 DPI, preferably 300 DPI. Looking for guidelines on making great diagrams? Please read this article.
  • Will I be credited? We will use the same format as the first book. Your name, twitter handle and, if available, your blog URL will be listed. In line with most blog sites, you are requested to provide a short bio of three sentences that will be printed alongside the article.
  • Do I need to be a blogger? You are not required to have a blog, nor be a vExpert or VCDX. There are no requirements for submitting your design consideration articles.

We are looking for content in the following categories:

  • Host design
  • Cluster design
  • vCenter design
  • Networking and Security design
  • Storage design
  • Generic design considerations – “Words of Wisdom”.

To avoid saturation, we accept no more than three articles in total per author. For example, you can provide us with three design consideration articles for the Host category, or one article for each of three different categories. Be aware that we would rather see one excellent design consideration article than three mediocre ones.

Call for entries
Please provide your blog post in Word or PDF format. If you use any diagrams in your article, please provide them separately in 300 DPI PNG or high quality JPEG format. (Guidelines for designing diagrams can be found here)

It’s about you, so please provide a short bio of yourself with your blog URL and twitter handle; if preferred, you can send a headshot as well.

You can email the content to Pocketbook@pernixdata.com

The call for entries closes Saturday, June 14th.

This book is free!

PernixData has generously offered to print the book. If your design consideration is included in the book, you will receive a copy. At the VMworld booth, PernixData will have a copy available for everyone who submitted a winning design consideration article. A limited number of books will be available for the community; more details will follow. After VMworld, an e-book version will be made publicly available.

Collateral benefit of an acceleration platform

Yesterday I visited a customer to review their experience implementing FVP. They loved the fast response times and the incredible performance that server flash brings to the table. Placing flash resources in the host, as close to the application as possible, allows you to speed up the workloads you select: reducing the distance between the application and the storage device lowers latency, while the speed of the flash device allows for great performance. But what is interesting is the “collateral benefit” that the FVP architecture provides to the entire architecture.

During the conversation the customer dropped their hero numbers on me. Hero numbers are the historical data points presented by the UI, such as IOPS saved from the datastore and bandwidth saved. We like to call these hero numbers as they indicate the impact on the environment, and they sure were impressive.

In one week’s time FVP accelerated 1.2 billion I/Os in their environment (I/Os saved from the datastore).

1 Billion IOPS

Please note that these are business workloads, not Iometer workload tests: I/Os generated by Oracle and MS SQL databases. Three hours later I received a new screenshot; it had accumulated 24 million more I/Os saved during that time, indicating an average acceleration of 8 million I/Os per hour.


That is 8 million I/Os per hour served by server flash and not hitting the array. In total almost 60TB of data did not traverse the storage area network, allowing other workloads to roam freely through the storage network, such as virtual machines or physical servers connected to the array or SAN. This reduction of I/O results in lower CPU utilization of the storage controllers, freeing up resources for non-accelerated workloads.
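A quick back-of-the-envelope check of those hero numbers (the average I/O size calculation assumes the 60TB maps onto the full 1.2 billion I/Os, which is my simplification, not a figure from the customer's UI):

```python
# Back-of-the-envelope check of the hero numbers quoted above.
ios_saved_week = 1.2e9    # I/Os kept off the datastore in one week
ios_saved_3h   = 24e6     # additional I/Os saved in the next 3 hours
data_saved_tb  = 60       # read data kept off the SAN, approx.

per_hour = ios_saved_3h / 3
print(f"{per_hour / 1e6:.0f} million I/Os per hour")  # 8 million

# Implied average I/O size over the week (assumption: the 60 TB
# corresponds to the full 1.2 billion saved I/Os):
avg_io_kib = data_saved_tb * 1e12 / ios_saved_week / 1024
print(f"~{avg_io_kib:.0f} KiB average I/O size")
```

The roughly 50 KiB average is plausible for database-heavy workloads with large sequential reads mixed in, though the real distribution is of course not uniform.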

60TB is the amount of read I/O that FVP kept off the storage area network and the array. When accelerating both reads and writes, we still send the write data to the array, as FVP is not a persistent storage layer (i.e. it does not provide datastore capabilities). When the virtual machine is in write back mode, FVP tries to destage (write uncommitted data to the storage system) as fast as possible. If the storage system is busy, FVP destages uncommitted data at a rate the primary storage is comfortable receiving. The risk of data loss is averted by storing multiple replicas on other hosts in the FVP cluster, allowing FVP to destage in a more uniform write pattern. Being able to time-release I/Os permits FVP to absorb workload spikes and convert them into a write pattern more aligned with the performance capabilities of the entire storage area network.

I captured a spiky OLTP workload to show this phenomenon. The workload generated 8800 IOPS (the green line). The flash device absorbed these writes and completed the I/O instantly, allowing the user to continue generating results. Although the application exhibits a spiky workload pattern, FVP is not required to mimic this behavior. Data is stored safely on multiple non-volatile devices, therefore the 8800 IOPS are sent to the array at a rate that does not overwhelm it. The purple line indicates the number of IOPS sent to the array; the highest number in this example is 3800 IOPS, 5000 less than the spike produced by the application.

absorbing writes
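The absorb-and-destage behavior in that chart can be sketched as a simple rate limiter: acknowledge the burst on flash immediately, then drain the backlog to the array at a capped rate. The 3800 IOPS cap mirrors the example above; the leaky-bucket model itself is an illustrative assumption, not FVP's actual destaging algorithm.

```python
# Simplified sketch of write-back destaging as a rate limiter: the
# flash layer acknowledges writes instantly and drains uncommitted
# data to the array no faster than the array's comfortable rate.

def destage(incoming_iops: list[int], array_cap: int = 3800):
    """Return (sent_to_array, remaining_backlog) per time step."""
    backlog, timeline = 0, []
    for burst in incoming_iops:
        backlog += burst                 # writes acknowledged on flash
        sent = min(backlog, array_cap)   # drained at the array's pace
        backlog -= sent
        timeline.append((sent, backlog))
    return timeline

# A spiky OLTP pattern: one 8800-IOPS burst, then quiet seconds.
for t, (sent, backlog) in enumerate(destage([8800, 0, 0, 0])):
    print(f"t={t}s  to_array={sent:5d}  uncommitted={backlog:5d}")
```

The array never sees more than 3800 IOPS, yet every write eventually lands on persistent storage, which is the uniform write pattern described above.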

This behavior reduces the continuous stress on the storage area network and the array. It allows customers to get more mileage out of their arrays, as the array now becomes primarily focused on providing capacity and data services. Once enough data points have accumulated over time, they can be used as input for your new array configuration. This generally results in a design comprised of a lower number of spindles, with advantages such as lower cost, a smaller physical footprint and a reduced thermal signature.

Being able to accelerate both read and write operations goes beyond improving that specific workload; it generates an overall improvement for the entire datacenter architecture.

Want to hear more about the customer, their use case and the benefits they are experiencing implementing FVP? Vote for VMworld Session 2583: Case Study: Tata Steel Virtualizes Oracle Database, Gets Better Performance than Physical.

Data acceleration, more than just a pretty flash device

Sometimes I get the question whether it would make sense to place a flash appliance on the network and use this medium to accelerate data: a pool of flash that serves multiple workloads without disrupting any workload when added to the infrastructure. Justin Warren, a Storage Field Day delegate, recently came to the same conclusion I did. In my opinion this construction leads to an inferior point solution that does not allow for full leverage of resources and loses a lot of possibilities to grow into a more evolved architecture providing performance where it’s needed, when it’s needed. Let’s take a closer look at what role software has in the act of accelerating data and why you need software to do this at scale.

Accelerating data is more than adding a faster medium to solve your problem; just adding a raw acceleration medium merely pushes out the moment you hit your new performance ceiling. Virtual datacenters are extremely dynamic. Virtualization isn’t about consolidating workloads onto a smaller number of servers any more; it’s rapidly moving towards aligning IT with business strategies. Virtualization allows companies to respond to new demand on the fly, rapidly deploying environments that cater to the wishes of the customer while remaining in control of distributing the limited amount of resources available.

And in order to do this properly, one needs full control over the resource. To manage and utilize resources as efficiently as possible, you need to be able to control the stack of resources with the same set of controls, the same granularity and preferably within a single pane of glass. Adding additional resources that require a different set of controls and a different method of management and distribution reduces efficiency and usually increases complexity. Minimizing management time AND reducing human touch points is of the essence. By using two distinct systems, one inside the hypervisor kernel and one outside the hypervisor, chances are that they cannot be integrated into a single policy-based management process. That means manual labor, which impacts overall deployment lead times and the agility of the services offered. Think availability of human resources, think level of expertise, and think permissions and access across multiple systems. Automation and policy-based management can help you avoid all these uncertainties and dependencies and control automation in a more orchestrated fashion. More and more signals are coming from within the industry that support an overall openness of APIs and frameworks, but unfortunately the industry is not there yet.

Control, integration and automation rely on a very important element: identity, and in our case VM identity. You cannot distribute resources properly if you don’t know who is requesting the resource. You need to understand who that entity is, what its entitlement to the resource is and what its relative priority is amongst other workloads. When a workload exits the ESXi host, it is usually stripped of all its identity and becomes just a random stream of demand and utilization. Many have tried to solve this by carving up resources and dedicating them directly to a higher entity; for example, disk groups assigned to a particular cluster, or a VM placed on a separate datastore so it gets all the resources available between the host and the datastore. In reality this worked for a short time, hogged resources and created a static entity in an architecture that excels when algorithms are allowed to distribute resources dynamically. In short, it does not scale and typically prohibits a more mature method of IT service delivery.

Therefore it’s key to keep the intelligence as close to the application as possible and harvest the true power of software intelligence. Retaining the identity of workloads allows you to distribute resources whenever they are needed, with the correct priority and availability. By using VM identity you can apply your IT services by creating a set of policies, for example RPO and resource availability, just by selecting the correct availability profile. This is the true power of software! Software can utilize the available resources in the most efficient way. I’ve seen it, for example, with FVP F-squared, where the performance of the flash device increased by using a better, more intelligent way of presenting the workload of the VMs to the flash resource: better hardware performance by leveraging VM identity, control of resources and analytics, all done in the same domain of control.

You can find the power of software in other industries as well. If you have the chance to talk to a software engineer of any MotoGP racing team, ask him what he can do in his controlled environment with software. By understanding the workload for a particular application (the track), they can control the suspension system, throttle control and engine behavior, all based on the position of the bike on the track, setting up the bike in the most optimal way for the upcoming corner. And it’s not just any corner: they understand exactly which corner is coming and what impact it has on the bike, and they adjust accordingly. Whether they are allowed to use this in a race is a different debate, but it demonstrates the true power of software, workload analytics and identity in a controlled system.

That type of analytics and power of resource distribution is exactly what you want for your applications, and the best way to get it is to retain VM identity. Use analytics, distributed resource management and advanced QoS to align the availability of high performance resources with workload demand. Do it in such a way that it requires a minimal number of clicks to configure and manage, and it is my belief that the only place to do this is within the hypervisor kernel: inside the kernel, where multiple schedulers operate in harmony and understand, retain and respect VM identity, while being on top of the resource and as close to the workload as possible.

Adding acceleration resources outside the kernel will not provide you this ability, and you have to wonder what you solve with that particular model. vSphere DRS maintenance mode allows workloads to migrate seamlessly, transparently and non-disruptively to other hosts in the cluster, not impacting the workload in any form or manner, providing you the ability to install acceleration resources without impacting your IT service level. And if you exercise proper IT hygiene, it is recommended (dare I say best practice) to put a host in maintenance mode before connecting any device to it anyway, resulting in the same host and workload migration behavior.