Let Cloudphysics help rid yourself of Heartbleed

Unfortunately the OpenSSL Heartbleed bug (CVE-2014-0160) is present in the ESXi and vCenter 5.5 builds. VMware responded by incorporating a patch that resolves the vulnerability in the OpenSSL 1.0.1 library. For more info about the ESXi 5.5 patch read KB 2076665; VMware issued two releases for vCenter 5.5, see KB 2076692.

Unfortunately some NFS environments experienced connection loss after applying the ESXi 5.5 patch. VMware responded by releasing patch 2077360 and, more recently, vCenter update 1b. The coverage of the NFS problems and the number of ESXi and vCenter releases needed to fix them may have left organizations in the dark about whether they actually patched the Heartbleed vulnerability. Cloudphysics released a free Heartbleed analytic card in their card store that helps identify which hosts in your environment are unprotected.

Check out the recent article by Cloudphysics CTO Irfan Ahmad about their recently released Heartbleed analytic package. I recommend running the card and ridding yourself of this nasty bug.

Stop wasting your Storage Controller CPU cycles

Typically when dealing with storage performance problems, the first questions asked are: what type of disks? What speed? What protocol? However, your problem might be in the first port of call of your storage array: the storage controller!

When reviewing the storage controller configurations of the most popular storage arrays, one thing stood out to me: the absence of CPU specs. The storage controllers of a storage array are just plain servers, equipped with a bunch of I/O ports that establish communication with the back-end disks and provide a front-end interface to communicate with the attached hosts. The storage controllers run proprietary software providing data services and specific storage features, and providing data services and running that software requires CPU power! After digging some more, I discovered that most storage controllers are equipped with two CPUs ranging from quad core to eight core. Sure, there are some exceptions, but let's stick to the most common configurations. This means that the typical enterprise storage array is equipped with 16 to 32 cores in total, as it comes with two storage controllers. 16 to 32 cores, that's it! What are these storage controller CPUs used for? Today’s storage controller activity and responsibility:

  • Setting up and maintaining data paths.
  • Mirror writes for write-back cache between the storage controllers for redundancy and data availability.
  • Data movement and data integrity.
  • Maintaining RAID levels and calculating & writing parity data.
  • Data services such as snapshots and replication.
  • Internal data saving services such as deduplication and compression.
  • Executing multi-tiering algorithms, promoting and demoting data to the appropriate tier level.
  • Running integrated management software providing management and monitoring functionality of the array.

Historically, arrays were designed to provide centralised data storage to a handful of servers. I/O performance was not a pain point, as most arrays easily delivered the requests each single server could make. Then virtualisation hit the storage array. Many average I/O consumers were grouped together on a single server, making that server, as George Crump would call it, a fire-breathing I/O demon. Mobility of virtual machines required an increase in connectivity, such that it was virtually impossible (no pun intended) to manually balance I/O load across the available storage controller I/O ports. The need for performance increased, resulting in a larger number of disks managed by the storage controller, different types of disks and different speeds.

Virtualization-first policies pushed all types of servers and their I/O patterns onto the storage array, introducing the need for new methods of software-defined economics (did someone coin that term?). It became obvious that not every virtual machine requires the fastest resource 24/7, raising interest in multi-tiered solutions. Multi-tiering requires smart algorithms that promote and demote data when it makes sense, providing the best performance to the workload when required while offering the best level of economics to the organisation. Snapshotting, dedup and other internal data saving services raised the need for CPU cycles even more. With the increase in I/O demand and the introduction of new data services, it's not uncommon for a virtualised datacenter to have over-utilised storage controllers.

Rethink current performance architecture
Server-side acceleration platforms increase the performance of virtual machines by leveraging faster resources (flash & memory) that are in closer proximity to the application than the storage array datastore. By keeping the data in the server layer, storage acceleration platforms such as PernixData FVP provide additional benefits to the storage area network and the storage array.

Impact of read acceleration on data movement
Hypervisors are seen as I/O blenders, sending the stream of random I/O of many virtual machines to the I/O ports of the storage controllers. These reads and writes must be processed: writes are committed to disk, and data is retrieved from disk to satisfy the read requests. All these operations consume CPU cycles. When accelerating writes, subsequent reads of that data are serviced from the flash device closest to the application. Typically data is read multiple times, decreasing latency for the application but also unloading – relieving – the architecture from servicing that load. FVP provides metrics that show how many I/Os are saved from the datastore by servicing the data from flash. The screenshot below was taken after 6 weeks of accelerating database workloads. More info about this architecture here.

[Image: 8billionIOPS]

The storage array does not have to service those 8 billion I/Os, and not only that: 520.63 TB did not traverse the storage area network occupying the I/O ports of the storage controllers. That means that other workloads, perhaps virtualised workloads that haven't been accelerated yet, or external workloads using the same array, are no longer affected by that I/O. Less I/O hits the inbound I/O queues on the storage controllers, allowing other I/O to flow more freely into the storage controller; less data has to be retrieved from the disks, and less I/O travels upstream from disk to I/O ports to begin its journey back from the storage controller all the way up to the application. This saves copious amounts of CPU cycles, allowing data services and other internal processes to take advantage of the available CPU cycles and increasing the responsiveness of the array.
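As a back-of-the-envelope check, the two figures in the screenshot imply an average I/O size. A quick Python sketch, using the rounded totals of 8 billion I/Os and 520.63 TB as its only inputs:

```python
# Back-of-the-envelope check on the screenshot figures above:
# roughly 8 billion I/Os and 520.63 TB were kept off the array.
ios_saved = 8e9            # I/Os served from server-side flash (screenshot)
bytes_saved = 520.63e12    # data that never traversed the SAN (screenshot)

avg_io_size_kb = bytes_saved / ios_saved / 1024
print(f"Average size per offloaded I/O: ~{avg_io_size_kb:.0f} KB")
# ~64 KB per I/O, a plausible block size for database workloads
```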

The screenshot was made by one of my favourite customers, but we are running a cool contest to see which applications they accelerated and how many I/Os other customers have saved.

Impact of write acceleration on storage controller write cache (mirroring)
Almost all current storage arrays contain cache structures to speed up both reads and writes. Speeding up writes benefits both the application and the array itself. Writing to NVRAM, where the write cache typically resides, is much faster than writing to (RAID-configured) disk structures, allowing for faster write acknowledgements. As the acknowledgement is provided to the application, the array can “leisurely” structure the writes in the most optimal way to commit the data to the back-end disks.

To avoid the storage controller becoming a single point of failure and to avoid data loss, redundancy is necessary. Some vendors provide journaling and consistency points for redundancy purposes; most vendors mirror writes between the cache areas of both controllers. A mirrored write cache requires coordination between the controllers to ensure data coherency. Typically messaging via the backplane between the controllers is used to ensure correctness. Mirroring data and messaging requires CPU cycles on both controllers.

Unfortunately, even with these NVRAM structures, write problems persist even today. No matter the size or speed of the NVRAM, it's the back-end disks' capability to process writes that gets overwhelmed. Increasing cache sizes at the controller layer just delays the point at which write performance problems begin. Typically this occurs when there is a spike of write I/O. Remember, most ESX environments generate a constant flow of I/O; adding a spike of I/O is usually adding insult to injury for the already strained storage controller. Some controllers reserve a static portion of cache for mirrored writes, forcing the controller to flush the data to disk when that portion begins to fill up. As the I/O keeps pouring in, the write cache has to delay completing the incoming I/O until the current write data is committed to disk, resulting in high latency for the application. Storage controller CPUs can be overwhelmed, as the incoming I/O has to be mirrored between cache structures and coherency has to be guaranteed, wasting precious CPU cycles on (a lot of) messaging between controllers instead of using them for other data services and features.
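To make the forced-flush behaviour concrete, here is a toy Python simulation of a write cache partition that destages to disk at a fixed rate. The partition size, destage rate and burst figures are made up for illustration and do not describe any specific array:

```python
# Toy simulation (not any vendor's actual algorithm): a write cache partition
# that destages to disk at a fixed rate. When a burst arrives faster than the
# disks can absorb it, the partition fills and new writes must wait.
CACHE_MB     = 4096      # hypothetical static mirrored-write partition
DESTAGE_MBPS = 800       # hypothetical sustained back-end disk write rate

def simulate(inflow_per_second):
    cached = 0.0
    for second, inflow in enumerate(inflow_per_second):
        cached = max(0.0, cached + inflow - DESTAGE_MBPS)
        if cached > CACHE_MB:
            stalled = cached - CACHE_MB        # writes that must wait for a flush
            cached = CACHE_MB
            print(f"t={second:3d}s inflow={inflow:5.0f} MB/s -> cache full, {stalled:.0f} MB stalled")
        else:
            print(f"t={second:3d}s inflow={inflow:5.0f} MB/s -> cache at {cached/CACHE_MB:5.1%}")

# Steady 600 MB/s with a 10-second burst of 2500 MB/s on top of it.
workload = [600] * 20 + [2500] * 10 + [600] * 20
simulate(workload)
```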

[Image: absorbing writes]

FVP in write-back mode acknowledges the I/O once the data is written to the flash resources in the FVP cluster. FVP does not replace the datastore, therefore writes still have to be written to the storage array. The process of writing data to the array becomes transparent, as the application has already received the acknowledgement from FVP. This allows FVP to shape write patterns in a way that is more suitable for the array to process. Typically FVP writes the data as fast as possible, but when the array is heavily utilised FVP time-releases the I/Os. This results in a more uniform I/O pattern (datastore writes in the performance graph above). By flattening the spike, i.e. writing the same amount of I/O over a longer period of time, the storage controller can handle the incoming stream much better, avoiding forced cache flushes and the resulting CPU bottlenecks.
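To illustrate the flattening effect, the burst from the previous sketch can be pushed through a simple rate cap. This is only a sketch of the general idea, not FVP's actual destaging algorithm, and the cap value is hypothetical:

```python
# A simple rate-cap illustration of "flattening the spike": the burst is queued
# on server-side flash and released to the array at a capped rate, so the array
# sees a uniform stream instead of a spike. (Not FVP's actual algorithm.)
RELEASE_CAP_MBPS = 900    # hypothetical ceiling for writes sent to the datastore

def flatten(inflow_per_second, cap=RELEASE_CAP_MBPS):
    backlog = 0.0
    released = []
    for inflow in inflow_per_second:
        backlog += inflow
        out = min(backlog, cap)   # never send more than the cap per second
        backlog -= out
        released.append(out)
    while backlog > 0:            # drain whatever is still queued afterwards
        out = min(backlog, cap)
        backlog -= out
        released.append(out)
    return released

workload = [600] * 20 + [2500] * 10 + [600] * 20
smoothed = flatten(workload)
print("peak seen by the array:", max(smoothed), "MB/s over", len(smoothed), "seconds")
```

The same amount of data reaches the array, just spread over more seconds at a lower peak rate.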

FVP allows you to accelerate your workload; accelerating reads and writes reduces the number of I/Os hitting the array and smooths the workload pattern. Customers who implemented FVP to accelerate their workloads see significant changes in storage controller utilisation, benefitting the external and non-accelerated workloads in the mix.

PernixData engineering videos: User Experience and User Interface engineers

Recently I visited PernixData HQ and bumped into Bryan Crowe, our resident user experience design engineer. Usually Bryan and I go over the details of FVP usability features and the feedback of customers. This time we decided to record a video and let Bryan explain the role of user experience in product development and the history of user experience design in enterprise software.

Unfortunately I did not have enough time to talk to Shyan, our user interface designer. Luckily Bryan stepped up and interviewed Shyan about FVP user interface design.

Enjoy!

Aspiring analyst trend – my pov

It’s a trend that has been going on for a while now, but with the acquisition of Fusion-io by SanDisk the number of articles with analytical views flared up. There is a lot of talk about what the future of flash will look like and whether other flash mediums, such as Diablo's UltraDimm, will become the prevailing new technology. An interesting thought, and it seems this deal might be a catalyst of some sort, but predicting an industry-level change by merely comparing technical specs might be a bit too shortsighted. There are far more dynamics in play when it comes to making a new technology successful in the market than the technology benefits themselves. Here is my personal view on this matter and trend.

Although the announced new technology may be a better fit for a particular problem, the customer and their preferred technology partner need to be aware of it. This seems easily solved by using the power of social media, and it seems that social media (especially Twitter and the blogosphere) is the center of the earth; unfortunately it's not. There are a lot of virtualization users who do not use social media and the Internet on a daily basis. This limits the possible penetration of new technology in today’s datacenter. How do you spread the word and evangelize that this is the killer solution? To increase reach, startups need the support of technology partners and system integrators to spread the word. Partners generally fill the role of technical advisor, informing the customer of interesting technology that is relevant to their environment and needs.

This dynamic should not be underestimated; however, partners are bombarded with new technology on a daily basis. Their technologists need to keep up with all the new developments and understand their place in the portfolio. Not every partner includes every new and exciting technology in its assortment. As trusted advisors they are required to thoroughly understand the technology and its impact on the various environments that they are servicing. This requires time, and time is the most precious resource many have nowadays. Juggling advising customers, educating technical teams, submitting expenses ;) and learning new tech is a challenge. It is not realistic to expect that each architect, administrator or consultant can spend all the available hours keeping tabs on the newest of the newest, the hottest of the hottest.

On top of that, let’s not ignore the commercial aspect of new technology and the possible displacement of the partner's involvement in vendor programs. Obtaining a diamond, platinum or gold status has financial benefits for the partner: it receives higher discounts and possibly better support. As a partner is a commercial organization, it has to understand the financial ramifications a new product has on their organization, what problem the product will solve at the customer side, and whether there is an equivalent solution in the portfolio that can help them meet the hardware vendor's requirements to stay at or advance to the next level of that particular program. This might be harsh, or seem to some as critique for valuing the commercial aspect above providing the best the industry has to offer. But sometimes it is what it is; we all (need to) earn money, and this money does not fall from the sky, it has to be earned by the companies who employ us. And sometimes a suboptimal choice from a technology perspective still provides enough value for the customer.

There is the “believer & fan” factor as well. This is present at the customer side as well as the partner side. There are a lot of techies who are true believers in a company's offering, technology platform and/or message. Their previous choice of technology might have served them well and proven that “their” hardware vendor solved the problems they faced throughout the years. This track record is hard to beat! Arguably this is also the weakest part, as many leading vendors dropped the ball on many customers, and this could lead to disqualification of the vendor in the next technology acquisition project. As you see, it can go either way.

Where am I going with this? Is there a point to this “rant”? Well, being a frequent visitor of the valley I’m privileged to meet a lot of smart people, from venture capitalists, CEOs and CTOs to engineers and community members. I’m lucky to have access to the full spectrum of industry participants and hear their opinions and views, and one thing they all have in common is that they are all excited by new technology and believe that the development of new technology will bring many wonderful things in life. But most of all, nobody is sure that a particular tech will become the next big thing. When reviewing tech, focusing on the benefits and technical implications makes sense. But when it comes to the business side of things, we need to understand that there are far more complex dynamics at play. These can be commercial, political and emotional. As a technologist, partially schooled economist and, let’s not forget, armchair psychologist, I recognize most sides, but I will leave the debate to people who are more equipped to handle it; usually they tend to be vague merely because they don't have insight into all the variables at play. Does that mean I condone these sorts of articles by my fellow bloggers? I most certainly do not, but what I do want to implore is to be open to all comments and discussion when you publish your views and opinions.

What grade Flash to pick for a POC and test environment?

Do I need to buy a specific grade of SSD for my test environment or can I buy the cheapest SSDs? Do I need to buy enterprise-grade SSDs for my POC? They last longer, but why should I bother for a POC? Do we go for consumer-grade or enterprise-grade flash devices? All valid questions that typically arise after a presentation about PernixData FVP, and I can imagine Duncan and Cormac receive the same when talking about VSAN.

Enterprise flash devices are known for their higher endurance rates, their data protection features and their increased speed compared to consumer-grade flash devices. And although these features are very nice to have, they aren't the most important features when testing flash performance.

The most interesting features of enterprise flash devices are wear levelling (to reduce hot spots), spare capacity, write amplification avoidance, garbage collection efficiency and wear-out prediction management. These lead to I/O consistency. And I/O consistency is the Holy Grail for test, POC and production workloads.

Spare capacity
One of the main differentiators of enterprise grade disks is spare capacity. The controller and disk use this spare capacity to reduce write amplification. Write amplification occurs when the drive runs out of pages to write data to. In order to write data, the page needs to be in an erased state, meaning that if (stale) data is present in that page, the drive needs to erase it first before writing the (fresh) data. The challenge with flash is that the controller can only erase per block, a collection of pages. It might happen that the block contains pages that still hold valid data. That means that this data needs to be written somewhere else before the controller can erase the block of pages. That sequence causes write amplification, and that is something you want to keep to a minimum.

To solve this, flash vendors over-provision the device with flash cells. The more technically accurate term is “reduced LBA access”. For example, the Intel DC S3700 flash disk series comes standard with 25 – 30% more flash capacity. This capacity is assigned to the controller, which uses it to manage background operations such as garbage collection, NAND disturb rules or erase blocks. Now the interesting part is how the controller handles these management operations. Enterprise controllers contain far more advanced algorithms that reduce the wear of blocks by reducing the movement of data, understanding which data is valid and which is stale (TRIM), and remapping logical to physical LBAs quickly and efficiently after moving valid data in order to erase the stale data. Please read this article to learn more about write amplification.
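For reference, both spare capacity and write amplification are simple ratios. A small Python sketch with illustrative capacities (not measurements of any particular drive):

```python
# Rough arithmetic sketch of the two quantities discussed above; the capacities
# are illustrative, not measurements of any specific drive.
def over_provisioning(raw_gb, user_gb):
    """Spare area as a fraction of the user-addressable capacity."""
    return (raw_gb - user_gb) / user_gb

def write_amplification(nand_writes_gb, host_writes_gb):
    """Bytes the NAND actually absorbs per byte the host writes."""
    return nand_writes_gb / host_writes_gb

# Consumer drive: 256 GiB of NAND (~274.9 GB) exposed as 256 GB -> ~7% spare.
print(f"consumer spare capacity:   {over_provisioning(274.9, 256):.0%}")

# Enterprise drive holding back roughly a quarter of its NAND, as described above.
print(f"enterprise spare capacity: {over_provisioning(256, 200):.0%}")

# If garbage collection relocates two valid pages for every page of new data,
# the NAND absorbs 3 GB of writes for every 1 GB written by the host.
print(f"write amplification:       {write_amplification(3, 1):.1f}x")
```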

Consumer grade flash
Consumer grade flash devices fall short in these areas. Most of them have TRIM support, but how advanced is that algorithm? Most of them can move data around, but how fast and intelligent is the controller? The biggest question, however, is how many spare pages it has to reduce write amplification. In worst-case scenarios, and that usually happens when running tests, the disk is saturated and the data keeps pouring in. Typically a consumer grade device has 7% spare capacity and it will not use all of that space for data movement. Due to the limited space available, the drive will allocate new blocks from its spare area first, eventually using up its spare capacity and ending up doing read-modify-write operations. At that point the controller and the device are fully engaged with household chores instead of providing service to the infrastructure. It's almost like the disk is playing a sliding puzzle.

[Image: sliding-puzzles-1]

Anandtech.com performed similar tests and witnessed similar behaviour; they published their results in the article “Exploring the Relationship Between Spare Area and Performance Consistency in Modern SSDs”. An excellent read, highly recommended. In this test they used the default spare capacity and ran some tests with one of the best consumer grade SSD devices, the Samsung 840 PRO. In this test with a single block size (which is an anomaly in real-life workload characteristics) the results are all over the place.

[Image: 840 PRO – default spare capacity]

Seeing a scatter plot with results ranging between 200 and 100,000 IOPS is not a good baseline for understanding and evaluating a new software platform.

The moment they reduced the user-addressable space (reformatting the file system to use less space), performance goes up and is far more stable. Almost every result is in the 25,000 to 30,000 IOPS range.

[Image: 840 PRO – 25% spare capacity]
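The effect of reserving space can be approximated with the same over-provisioning ratio as the earlier sketch; the numbers below are illustrative and not taken from the AnandTech article:

```python
# Leaving part of a consumer drive unpartitioned (and trimmed) effectively
# enlarges its spare area. Figures are illustrative only.
def effective_spare(raw_gb, used_gb):
    return (raw_gb - used_gb) / used_gb

raw = 274.9   # ~256 GiB of NAND on a nominal 256 GB drive
print(f"full drive in use:    {effective_spare(raw, 256):.0%} spare")
print(f"only 75% partitioned: {effective_spare(raw, 192):.0%} spare")
```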

Please note that both VSAN and FVP manage the flash devices at their own level; you cannot format the disk to create additional spare capacity.

Latency tests show exactly the same. I've tested some enterprise disks and consumer grade disks and the results were interesting to say the least. The consumer grade drive's performance charts were not as pretty. The virtual machine running the read test was the only workload hitting the drive, and yet the drive had trouble providing steady response times.

[Image: Consumer-Latency]

I swapped the consumer grade disk for an enterprise disk and ran the same test again; this time the latency was consistent, providing predictable application response times.

[Image: Enterprise-latency]

Why you want to use enterprise devices:
When testing and evaluating new software, or even a new architecture, the last thing you want to do is start an investigation into why performance is so erratic. Is it the software? Is it the disk, the test pattern, or is the application acting weird? You need a stable, consistent and predictable hardware layer that acts as the foundation for the new architecture. You need a stable environment that allows you to baseline the performance of the device, so you can understand the characteristics of the workload, the software performance and the overall benefit of this new platform in your architecture.
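If you want to quantify "consistent", one simple approach is to compare the median latency of your test run against its tail. The helper below is a sketch, and the sample numbers are hypothetical:

```python
# Feed this per-I/O latencies (e.g. exported from a guest-level benchmark) and
# compare the median against the tail: a large p99/p50 ratio means the device
# is erratic and makes a poor baseline for evaluating new software.
def latency_summary(samples_ms):
    s = sorted(samples_ms)
    def pct(p):
        return s[min(len(s) - 1, int(p / 100 * len(s)))]
    return {
        "p50": pct(50),
        "p99": pct(99),
        "max": s[-1],
        "p99/p50": round(pct(99) / pct(50), 1),  # close to 1.0 means consistent
    }

# Hypothetical samples: a drive that mostly answers in ~0.2 ms but occasionally
# stalls for garbage collection shows a large p99/p50 ratio.
consumer   = [0.2] * 950 + [5.0] * 50
enterprise = [0.25] * 990 + [0.4] * 10
print("consumer  :", latency_summary(consumer))
print("enterprise:", latency_summary(enterprise))
```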

Enterprise flash devices provide these abilities, and when doing a price comparison between enterprise and consumer grade the difference is not that extreme. In Europe you can get an Intel DC S3700 100GB for 200 Euros; Amazon is offering the 200 GB version for under 500 US dollars. 100-200GB is more than enough for testing purposes.