FVP 2.5: Server-side storage intelligence in the post-flash era

Yesterday we (PernixData) released the new version of the FVP Platform. FVP 2.5 continues to improve the feature set of the storage acceleration software, allowing datacenters to build storage architectures that are simply not achievable with traditional storage array architectures.

Server Side Storage Intelligence
FVP integrates into the VMware ESXi hypervisor. This allows FVP to place state-of-the-art acceleration resources as close to the applications as possible. FVP supports both flash and RAM as acceleration resources. Incorporating acceleration resources into the compute layer allows architects to scale out the platform to an unprecedented degree while using the latest and greatest technology.

One of the major challenges for the industry is to remain in lockstep with new developments. Take flash technology as an example: flash devices were recently announced that can easily provide 250,000 real-world IOPS per device. Powerful, state-of-the-art CPUs are required to utilize this power. Luckily, server hardware keeps up with these developments, allowing architects to incorporate the new technology in their designs. The development of storage controllers typically lags behind, utilizing an older generation CPU micro-architecture that is required to process all the I/O while executing data services and management tasks. The two controllers present in most monolithic storage array architectures are tasked with being each other's failover, reducing the ability to utilize the finite CPU resources even further. As you can imagine, this architecture is not the way forward; these systems are simply not the ideal place for large amounts of advanced storage resources. Flash vendors aren't slowing down; the upcoming flash technology expected to be released in late 2015 or early 2016 is just crazy. Looking two to three years ahead, the post-flash era will slowly start with non-volatile DIMMs and phase-change memory.

Unfortunately, these new memory developments are a few years away, but in the meantime datacenters can leverage good old DDR memory to accelerate the workloads in the virtual datacenter. Aligning with the industry trend of in-compute memory, FVP 2.0 introduced Distributed Fault Tolerant Memory.

Distributed Fault Tolerant Memory
Distributed Fault Tolerant Memory (DFTM) leverages the full feature set of the FVP platform, providing the same fault tolerance and data integrity guarantees for RAM as for flash technology. FVP provides the ability to store replicas of write data on the flash or RAM acceleration resources of other hosts in the cluster, and it allows you to align the FVP cluster configuration with the datacenter fault domain topology. DFTM allows you to seamlessly hot-add and hot-shrink the RAM resources of the FVP cluster. When more acceleration resources are required, just add more RAM to the FVP cluster. If RAM is needed for virtual machine memory or you have other plans for that memory, just shrink the amount of host RAM provided to the FVP cluster. Host memory now becomes a multipurpose resource, providing either virtual machine compute memory or I/O acceleration for virtual machines; it's up to you to decide which role it performs. When the virtual datacenter needs to run new virtual machines, add new hosts to the cluster and assign a portion of host memory to the FVP cluster to scale out storage performance as well.
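To make the replica placement idea a bit more concrete, here is a minimal sketch of how write replicas could be spread across hosts in different fault domains. Everything in it (the Host class, place_replicas, the rack names) is hypothetical and not the FVP implementation.

```python
# Illustrative sketch only: a simplified model of fault-domain-aware replica
# placement for write data. Not FVP code or an FVP API.
from dataclasses import dataclass
from typing import List

@dataclass
class Host:
    name: str
    fault_domain: str      # e.g. a rack or chassis identifier
    free_accel_gb: float   # free acceleration capacity on this host

def place_replicas(source: Host, peers: List[Host], copies: int) -> List[Host]:
    """Pick peer hosts for write replicas, preferring other fault domains."""
    # Sort peers so hosts in a *different* fault domain come first,
    # then by the amount of free acceleration capacity.
    candidates = sorted(
        (p for p in peers if p.name != source.name),
        key=lambda p: (p.fault_domain == source.fault_domain, -p.free_accel_gb),
    )
    if len(candidates) < copies:
        raise RuntimeError("not enough peers to satisfy the replica policy")
    return candidates[:copies]

if __name__ == "__main__":
    hosts = [
        Host("esx01", "rack-a", 64.0),
        Host("esx02", "rack-a", 48.0),
        Host("esx03", "rack-b", 32.0),
        Host("esx04", "rack-b", 96.0),
    ]
    targets = place_replicas(hosts[0], hosts, copies=2)
    print([t.name for t in targets])   # replicas land in rack-b first
```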

DFTM-Z
FVP 2.5 introduces DFTM-Z, adaptive memory compression, allowing more of the application's working set to fit into the acceleration layer, thereby increasing performance while offloading the lower layers of the storage architecture even more.

The interesting thing is that compression has always been a technology to expand capacity. The engineering team looked at this technology and figured out a way to use compression for performance instead. As you can imagine, it's a privilege to work with a team of engineers that just looks at the world in a different way.

When we talk about compression, the obvious question is: what about data compression and decompression latencies? Because FVP is a performance platform, DFTM-Z was designed from the ground up to keep this impact to an absolute minimum.

Compression is interesting as the trade-off of higher levels of compression is time and CPU cycles. You can choose to spend more time and CPU resources to obtain a higher level of compression, or you can settle for lower compression by spending fewer CPU cycles and less time. When it comes to storage systems, the choice is easy: compress everything and spend lots of time and CPU cycles. When choosing compression for performance, the choices are completely different, as time is performance. An adaptive compression engine is required that provides great compression without incurring overhead, and that's where the compression engine of FVP shines. It provides high compression ratios at a low cost. Interestingly enough, it is our second-generation compression engine; our first compression engine was used in adaptive network compression with great success.
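The trade-off is easy to see for yourself with a generic compressor. The snippet below uses Python's standard zlib module (not the FVP engine) to show how higher compression levels buy a better ratio at the cost of more CPU time.

```python
# Demonstrates the compression-level trade-off with zlib; purely illustrative.
import time
import zlib

data = (b"virtual machine I/O block " * 4096) * 16   # repetitive sample data

for level in (1, 6, 9):
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(compressed)
    print(f"level {level}: {elapsed * 1000:.2f} ms, ratio {ratio:.1f}:1")
```

On most machines the highest level is noticeably slower for only a modest gain in ratio, which is exactly why time matters when compression is used for performance rather than capacity.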

DFTM-Z is automatically enabled on a host when more than 20 GB of memory is contributed to the FVP cluster. If DFTM-Z is enabled, memory management assigns a region to store the compressed data. The compressed region scales with the total capacity of the host's acceleration resources, optimizing the ratio between the uncompressed and compressed regions.
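As a rough illustration of that rule, here is a tiny sketch. The 20 GB threshold comes from the paragraph above; the 25% split between regions is purely a placeholder, since the actual sizing policy is internal to FVP.

```python
# Sketch of the DFTM-Z enablement rule. The region fraction is a made-up
# placeholder, not the real FVP sizing policy.
DFTM_Z_THRESHOLD_GB = 20
COMPRESSED_REGION_FRACTION = 0.25   # hypothetical

def compressed_region_gb(contributed_gb: float) -> float:
    """Return the compressed region size for a host's DFTM contribution."""
    if contributed_gb <= DFTM_Z_THRESHOLD_GB:
        return 0.0                  # DFTM-Z stays disabled below the threshold
    return contributed_gb * COMPRESSED_REGION_FRACTION

for gb in (16, 32, 64, 128):
    print(f"{gb} GB contributed -> {compressed_region_gb(gb):.1f} GB compressed region")
```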

Compression is done in the background on colder data so that the impact on writes is minimal. When the working set of a virtual machine is larger than its allocated space, DFTM-Z compresses older data and stores it in the compressed region. This allows FVP to keep relevant data in the acceleration layer while avoiding any impact on active write operations. FVP uses a dynamic threshold to manage the acceleration resources and use them optimally.
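A minimal sketch of that idea, assuming an LRU-ordered uncompressed region: when usage crosses a threshold, the coldest chunks are compressed in the background and moved to the compressed region. The class and its fields are illustrative only, not FVP internals.

```python
# Illustrative model of background compression of cold data; not FVP code.
import zlib
from collections import OrderedDict

class AccelTier:
    def __init__(self, uncompressed_limit: int):
        self.uncompressed = OrderedDict()   # chunk_id -> bytes, LRU ordered
        self.compressed = {}                # chunk_id -> compressed bytes
        self.limit = uncompressed_limit     # threshold in chunks (simplified)

    def write(self, chunk_id: str, data: bytes) -> None:
        # New writes go straight into the uncompressed region and become hottest.
        self.uncompressed[chunk_id] = data
        self.uncompressed.move_to_end(chunk_id)

    def background_sweep(self) -> None:
        # Compress the coldest chunks until usage drops below the threshold,
        # so active writes never wait on the compressor.
        while len(self.uncompressed) > self.limit:
            chunk_id, data = self.uncompressed.popitem(last=False)  # coldest
            self.compressed[chunk_id] = zlib.compress(data, 1)

tier = AccelTier(uncompressed_limit=2)
for i in range(4):
    tier.write(f"chunk-{i}", bytes(64 * 1024))
tier.background_sweep()
print(sorted(tier.compressed))   # ['chunk-0', 'chunk-1'] moved to the compressed region
```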

DFTM-Z-1

If an application requests data that is in the compressed region, the data is decompressed and copied back to the uncompressed region. FVP manages data intelligently and focuses on data locality. When compressing data, FVP selects a whole chunk of contiguous memory; when any portion of that chunk is accessed, FVP decompresses the whole chunk. As the application will typically read all of that data due to locality of access, the decompression latency is amortized, keeping the overhead to an absolute minimum. Although there is some latency involved, it does not compare with the alternative, which is falling out of the acceleration footprint and going all the way down to the array and its spindles to retrieve the data. We all know that's not the way forward.
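Continuing the illustrative model from the previous sketch, the read path below shows how an access to any offset of a compressed chunk inflates the entire chunk once and promotes it back, so subsequent reads with good locality are served straight from RAM. Again, the names are hypothetical, not FVP code.

```python
# Illustrative read path: whole-chunk decompression on first access.
import zlib

CHUNK_SIZE = 64 * 1024

uncompressed = {}                                             # chunk_id -> bytes
compressed = {"chunk-7": zlib.compress(b"x" * CHUNK_SIZE, 1)} # cold, compressed chunk

def read(chunk_id: str, offset: int, length: int) -> bytes:
    if chunk_id not in uncompressed and chunk_id in compressed:
        # Any access to any portion of the chunk inflates the whole chunk once
        # and promotes it back to the uncompressed region.
        uncompressed[chunk_id] = zlib.decompress(compressed.pop(chunk_id))
    return uncompressed[chunk_id][offset:offset + length]

first = read("chunk-7", 0, 512)       # pays the one-time decompression cost
second = read("chunk-7", 512, 512)    # served straight from the uncompressed region
print(len(first), len(second), "chunk-7" in compressed)   # 512 512 False
```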

User-Interface
The FVP Cluster dashboard now shows two numbers. Traditionally it showed the capacity; with DFTM-Z it also shows the capacity with memory compression next to it. In the screenshot below, the FVP cluster capacity is 128 GB, which is the real capacity provided by the flash and RAM of the hosts in the cluster combined. The second number is the capacity with memory compression, which in this scenario turns out to be 133.91 GB. This is the real flash capacity plus the memory capacity after compression is enabled: the virtual capacity. The virtual capacity value is derived from the uncompressed region + (compressed region / average compression ratio).
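To show the arithmetic of that formula, here is a worked example with made-up numbers; the actual region split and average compression ratio depend entirely on the workload, and the ratio here is treated as compressed bytes per original byte, which is an assumption.

```python
# Hypothetical inputs: the real split between regions is not shown in the UI.
uncompressed_region_gb = 120.0
compressed_region_gb = 8.0
avg_compression_ratio = 0.575   # compressed bytes per original byte (assumed definition)

virtual_capacity_gb = uncompressed_region_gb + compressed_region_gb / avg_compression_ratio
print(f"virtual capacity: {virtual_capacity_gb:.2f} GB")   # ~133.91 GB with these inputs
```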

User-Interface-DFTM-Z

Please note that the virtual capacity size is not displayed in real time but is refreshed often. The screenshot is of the beta software, showing the virtual capacity with decimal-level accuracy; the soon-to-be-released 2.5 software should display rounded-up numbers.

Frank Denneman

Follow me on Twitter, visit the Facebook fan page of FrankDenneman.nl, or add me to your LinkedIn network.