What’s new in PernixData FVP 2.0 - Distributed Fault Tolerant Memory

PernixData FVP 2.0 supports multiple types of acceleration resources. In FVP 1.x, various types of flash devices could be leveraged to accelerate virtual machine I/O operations; FVP 2.0 adds support for server-side RAM.

A RAM-bound world

With recent chipset advances and software developments, it is now possible to support up to terabytes of memory in a vSphere host. At VMworld, VMware announced 6TB memory support for vSphere 6.0, and recently announced the same support for vSphere 5.5 Update 2. Intel’s newest processors support up to 1536GB of memory per CPU, allowing a four-way server to easily reach the maximum memory supported by vSphere.

But what do you do with all this memory? As of now, you can use memory provided by the virtual infrastructure to accelerate virtual machine I/O. Other application vendors and Independent Software Vendors (ISVs) are leveraging these massive amounts of memory as well, although their requirements impact IT operations and services.

[Figure: memory pyramid]

It starts at the top of the pyramid: applications can leverage vast amounts of memory to accelerate data access, but the user needs to change the application, and implementing this is typically not considered a walk in the park. ISVs caught on to this trend and did the heavy lifting for their user base; however, you still need to run these specific applications to operationalize memory performance for storage. Distributed Fault Tolerant Memory (DFTM) allows every application in the virtualized datacenter to benefit from incredible storage performance with no operational or management overhead. Think of the introduction of DFTM as similar to the introduction of vSphere High Availability. Before HA, you either had application-level HA capabilities or clustering services such as Veritas or Microsoft Cluster Services. HA provided failover capabilities to every virtual machine and every application the moment you configured a simple vSphere cluster.

Scaling capabilities
DFTM rests on the pillars that FVP provides: a clustered, fault-tolerant platform that scales out performance independently from storage capacity. DFTM allows you to seamlessly hot-add and hot-shrink the RAM resources of the FVP cluster. When more acceleration resources are required, just add more RAM to the FVP cluster. If the RAM is needed for virtual machine memory, or you have other plans for it, just shrink the amount of host RAM provided to the FVP cluster. Host memory now becomes a multipurpose resource, providing either virtual machine compute memory or I/O acceleration for virtual machines; it is up to you to decide which role it performs. When the virtual datacenter needs to run new virtual machines, add new hosts to the cluster and assign a portion of host memory to the FVP cluster to scale out storage performance as well.
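To make this concrete, here is a minimal sketch of host RAM as a multipurpose resource. This is an illustration in Python, not the actual FVP API; the class and method names are my own.

```python
class HostMemory:
    """Hypothetical model of host RAM split between VM memory and FVP."""

    def __init__(self, total_gb):
        self.total_gb = total_gb
        self.accel_pool_gb = 0  # RAM currently donated to the FVP cluster

    def hot_add(self, gb):
        """Grow the acceleration pool while the host keeps running."""
        if self.accel_pool_gb + gb > self.total_gb:
            raise ValueError("not enough host RAM")
        self.accel_pool_gb += gb

    def hot_shrink(self, gb):
        """Return RAM to the hypervisor for virtual machine memory."""
        if gb > self.accel_pool_gb:
            raise ValueError("cannot reclaim more than was donated")
        self.accel_pool_gb -= gb

host = HostMemory(total_gb=512)
host.hot_add(128)    # 128GB now accelerates I/O for the FVP cluster
host.hot_shrink(64)  # reclaim 64GB to power on new virtual machines
```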

Fault tolerant write acceleration
FVP provides the same fault tolerance and data integrity guarantees for RAM as for flash. It can store replicas of write data on the flash or RAM acceleration resources of other hosts in the cluster, and FVP 2.0 adds the ability to align your FVP cluster configuration with your datacenter fault domain topology. For more information, please read “What’s new in PernixData FVP 2.0 – User Defined Fault Domains”.
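As an illustration of the idea (the data layout and function below are assumptions for this sketch, not FVP internals), a replica peer is chosen outside the source host’s fault domain, so a rack or chassis failure never takes out both copies of the data:

```python
def pick_replica_peers(source, hosts, replicas=2):
    """Prefer peer hosts outside the source host's fault domain."""
    return [h for h in hosts
            if h["name"] != source["name"]
            and h["fault_domain"] != source["fault_domain"]][:replicas]

hosts = [
    {"name": "esx01", "fault_domain": "rack-A"},
    {"name": "esx02", "fault_domain": "rack-A"},
    {"name": "esx03", "fault_domain": "rack-B"},
    {"name": "esx04", "fault_domain": "rack-C"},
]
print(pick_replica_peers(hosts[0], hosts))  # -> esx03 (rack-B) and esx04 (rack-C)
```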

[Figure: triple fault domain design]

Clustered solution
FVP provides fault tolerant write acceleration based on clustered technology, including failure handling. If a component, host, or network failure occurs, FVP seamlessly transitions write policies to ensure data availability for new incoming data. It automatically writes uncommitted data present in the FVP cluster to the storage array; the source host does this, or any of the peer hosts if the source host experiences problems. If a failure occurs on a peer host, FVP automatically selects a new peer host to resume write acceleration services while safeguarding new incoming data. All of this happens without any user intervention. For more information, please read “Fault Tolerant Write Acceleration”.
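The flow is roughly the following; this is a simplified, hypothetical sketch of the behavior described above, not FVP’s implementation:

```python
def handle_failure(failed, source, peers, destage):
    """destage(host) writes a host's uncommitted data to the storage array."""
    if failed == source:
        # The source host is gone: a surviving peer still holds the replica
        # and destages the uncommitted data on the source's behalf.
        destage(next(p for p in peers if p["healthy"]))
        return None
    # A peer failed: the source destages its own uncommitted data, then
    # selects a new healthy peer so new incoming writes stay protected.
    destage(source)
    return next(p for p in peers if p["healthy"] and p != failed)

source = {"name": "esx01", "healthy": True}
peers = [{"name": "esx02", "healthy": False},
         {"name": "esx03", "healthy": True}]
new_peer = handle_failure(peers[0], source, peers,
                          destage=lambda h: print("destaging", h["name"]))
print("new replica peer:", new_peer["name"])  # -> esx03
```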

A clustered platform is also necessary to support the vSphere clustering services that virtualized datacenters have leveraged for many years now. Both DRS and HA are fully supported. FVP remote access allows virtual machine mobility: data is accessible to a virtual machine regardless of the host it resides on. For more information, please read “PernixData FVP Remote Flash Access”. During a host failure, FVP ensures all uncommitted data is written to the storage array before allowing HA to restart the virtual machine.

Ease of configuration
Besides the incredible performance benefits, ease of configuration is a very strong point when deciding between flash and RAM as an acceleration resource. Memory is as close to the CPU as possible. No moving parts, no third-party storage controller driver, no specific configuration such as RAID or cache structures. Just install FVP, assign the amount of memory, and you are in business. This reduction of moving parts and the close proximity of RAM to the CPU allow for extremely consistent and predictable performance. The result is incredible bandwidth, low latency, and high I/O performance. The following screenshot is of a SQL DB server; notice the flat lines at the bottom: those are the network and VM observed latencies.

[Screenshot: FVP latency graph of a SQL DB server]

The I/O latency of RAM was 20 microseconds; the network latency of 270 microseconds was clearly the element that “slowed it down”. With some overhead incurred by the kernel, the application experienced a stable and predictable latency of 320 microseconds. I zoomed in to investigate any possible fluctuations, but the VM observed latency remained constant.
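Adding up the components makes the breakdown clear (attributing the unaccounted remainder to kernel and FVP processing overhead is my assumption): 20 µs RAM latency + 270 µs network latency + ~30 µs kernel overhead ≈ 320 µs VM observed latency.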

[Screenshot: zoomed-in view of the latency graph]

Blue line: VM observed latency
Green line: Network latency
Yellow line: RAM latency

The network latency is incurred by writing the data safely to another host in the cluster. Writes are done in a synchronous manner, meaning that the source host needs to receive acknowledgements from both resources before completing the I/O to the application.
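A minimal sketch of this synchronous acknowledgement flow, with the write and replicate calls stubbed out using the latencies measured above (an illustration, not FVP’s implementation):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def write_local(block):
    time.sleep(0.000020)   # ~20 microseconds: server-side RAM
    return "local ack"

def replicate_to_peer(block):
    time.sleep(0.000270)   # ~270 microseconds: network round trip to a peer host
    return "peer ack"

def replicated_write(block):
    with ThreadPoolExecutor(max_workers=2) as pool:
        local = pool.submit(write_local, block)
        remote = pool.submit(replicate_to_peer, block)
        local.result()   # wait for the local copy...
        remote.result()  # ...and for the peer's acknowledgement
    return "I/O complete"  # only now does the application see the write finish

print(replicated_write(b"data block"))
```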

This means that with DFTM you can now virtualize the most resource-intensive applications, with RAM providing fault tolerant storage performance. A great example is SAP HANA. Recently I wrote an article on the storage requirements of SAP HANA environments. Although SAP HANA is an in-memory database platform, it is recommended to use fast storage resources, such as flash, to provide performance for log operations. Logs have to be written outside the volatile memory structure to provide ACID (Atomicity, Consistency, Isolation, Durability) guarantees for the database. By using FVP DFTM, all data (DB and logs) resides in memory at identical performance levels, while fault tolerant write acceleration guarantees the ACID requirements. And thanks to the support for mobility, SAP HANA and similar application landscapes are now free to roam the vSphere cluster, breaking down the last silos in your virtualized datacenter.

The next big thing

Channeling the wise words of Satyam Vaghani: the net effect of this development is that you are able to get predictable and persistent microsecond storage performance. With new developments popping up in the industry every day, it is not strange to wonder when we will hit nanosecond latencies. When the industry is faced with the possibility of these types of speeds, we at PernixData believe that we can absolutely and fundamentally change what applications expect out of storage infrastructure. Applications used to expect storage platforms to provide performance at millisecond levels, and developers used to give up on improving their code because storage platforms were the bottleneck. For the first time ever, storage performance is not the bottleneck, and for the first time ever, extremely fast storage is affordable with FVP and server-side acceleration resources. Even an SMB-class platform can now have a million IOPS and super low latency if it wants to. Now the real question for the next step becomes: if you can make a virtualized datacenter deliver millions of IOPS at microsecond latency levels, what would you do with that power? What new type of application will you develop; what new use cases would be possible with all that power?

We at PernixData believe that if we can change the core assumption around the storage system and the way it performs, then we could see a new revolution in application development and the way applications actually use infrastructure. And we think that revolution is going to be very, very exciting.

Article in Japanese: PernixData 2.0の新機能 ー 分散耐障害性メモリ(Distributed Fault Tolerant Memory - DFTM)

Frank Denneman
