Part 5 - Solving cache pollution

The series “Virtual Datacenter scaling problems with traditional shared storage” explains in detail why it is challenging to provide consistent performance with an interconnected (storage) stack of uncoordinated components. The primary issue is the lack of a global control plane: there is no common language that allows identifying consumers (virtual machines) and distributing resources based on their priority.

Instead of waiting for a universal language to stitch everything together, one trend emerging in the industry is the software control plane. Intelligence is moving to the perimeter of the architecture, and resources are being commoditized. Moving storage resources directly into the compute layer allows not only greater control, but also more detailed insight into resource consumption and distribution. This model also allows the industry to resolve hard problems such as cache pollution.

Cache pollution
In general, the caches in storage controllers are used for both reads and writes. Write cache helps reduce latency for write operations issued to the storage array.

Read cache helps reduce latency for read operations by storing data. Data access speed improves significantly when data is served from cache compared to fetching it from spindle. A tremendous amount of research has focused on access patterns and locality of reference. Applications tend to access the same data again within a short time span; this is called temporal locality. Due to the way applications write data, it is also likely that data near the data just referenced will be accessed next; this is called spatial locality. Therefore, read caches tend to retain data that has been accessed and to prefetch sequential blocks stored near the requested data, so that future I/O requests can be serviced faster.
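The two locality principles above can be sketched with a tiny simulation. This is a minimal, illustrative model (the `ReadCache` class, its capacity, and the prefetch depth are all assumptions, not how any particular array implements its cache): an LRU cache that keeps recently used blocks (temporal locality) and reads ahead a few neighbouring blocks on a miss (spatial locality).

```python
from collections import OrderedDict

class ReadCache:
    """Minimal LRU read cache that prefetches neighbouring blocks on a
    miss. Purely illustrative; not any vendor's actual implementation."""

    def __init__(self, capacity, prefetch=2):
        self.capacity = capacity
        self.prefetch = prefetch
        self.blocks = OrderedDict()   # block id -> data, in LRU order
        self.hits = self.misses = 0

    def _admit(self, block):
        self.blocks[block] = f"data-{block}"
        self.blocks.move_to_end(block)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # evict least recently used

    def read(self, block):
        if block in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block)    # temporal locality: keep hot data
        else:
            self.misses += 1
            self._admit(block)                # fetch from spindle
            for nxt in range(block + 1, block + 1 + self.prefetch):
                if nxt not in self.blocks:
                    self._admit(nxt)          # read ahead: spatial locality

cache = ReadCache(capacity=8, prefetch=2)
cache.read(100)                   # miss: blocks 100, 101, 102 are admitted
cache.read(101)                   # hit, thanks to the prefetch
print(cache.hits, cache.misses)   # -> 1 1
```

The second read is satisfied from cache even though block 101 was never explicitly requested before, which is exactly the payoff of sequential read-ahead.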

However, the read-ahead operations of some applications are not sequential at the block layer the storage array implements. Additionally, the fairness policies of the storage scheduler typically break large sequential read patterns into smaller sequential patterns, or turn them into a stream of random read I/O, while masking the identity of the virtual machine. Furthermore, the storage controller has to service many different connected systems with a shared cache, applying fairness policies to all of its consumers.

Some background processes impair the ability of the read cache to service front-end processes such as applications. Background processes such as backup jobs and virus scans read substantial portions of the virtual disks sequentially. However, these operations lack temporal locality: they typically never reference the data again during the process.

Caches have a finite amount of capacity, resulting in eviction when incoming data exceeds the cache capacity. The data retrieved by backup jobs or anti-virus scans run at scale often exceeds that capacity. This means that data currently stored in the cache is replaced by data requested by the backup job or the anti-virus scan, polluting the controller cache with non-reusable data. If the data set surpasses the cache capacity, the read, write, evict cycle repeats several times, degrading storage performance for all connected systems during the process. Any data the applications need during these backup jobs or anti-virus scans has to be retrieved from the spindles, increasing latency. Yet no integration exists that controls cache operations on the storage array for the virtual machines running these backup or anti-virus operations.
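The eviction effect described above is easy to demonstrate. The sketch below (a deliberately simplified LRU model; the capacity and block counts are arbitrary assumptions) shows a working set that fits entirely in cache being wiped out by a single sequential backup-style scan of data that is never read again.

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache standing in for a shared controller read cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def read(self, block):
        hit = block in self.blocks
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # evict least recently used
        return hit

cache = LRUCache(capacity=100)
hot = list(range(100))                  # application working set, fits exactly
for b in hot:
    cache.read(b)
assert all(cache.read(b) for b in hot)  # 100% hit rate before the backup

# A backup-style job sequentially scans 10,000 blocks it never reads again...
for b in range(1000, 11000):
    cache.read(b)

# ...and the hot working set has been evicted: the application misses everywhere.
hits = sum(cache.read(b) for b in hot)
print(f"hits after backup scan: {hits}/100")   # -> 0/100
```

One pass of non-reusable data is enough to drop the application's hit rate from 100% to 0%, which is what "cache pollution" means in practice.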

Context-aware environment
New storage architectures move storage resources into the hypervisor. The hypervisor is rich with information and can identify workloads while managing resources with its resource schedulers, turning the hypervisor into a control plane that manages both the resources and the demand. It provides a single construct to automate instructions in a single language, with a model for granular quality of service for applications.

A great example of VM-aware scheduling and automation is the I/O profiling functionality in FVP. I’m sure that VSAN and other hyper-converged platforms have similar approaches to solving this problem. As I work for PernixData, I can go into more detail on how FVP solves cache pollution caused by backup or anti-virus scans.

I/O profiling
When applications are accelerated by FVP, the virtual machine footprint on the acceleration resources could suffer from the same problem. When a read operation occurs, FVP determines whether the data can be serviced by the host or by the storage array. When a cache miss occurs (2), the data is retrieved from the storage array (3). To improve future performance, the data is copied to the local acceleration resources while it is served to the application (4). This process is called a false write.
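The read path described above can be sketched as follows. This is a conceptual model only (the function, its parameters, and the dictionary-based "cache" are illustrative assumptions, not FVP's implementation):

```python
def read(block, local_cache, storage_array, app_buffer):
    """Sketch of the read path: serve from the host's acceleration
    resource when possible; on a miss, fetch from the array, hand the
    data to the application, and 'false write' a copy locally."""
    if block in local_cache:            # serviced by the host
        data = local_cache[block]
    else:                               # (2) cache miss
        data = storage_array[block]     # (3) fetch from the storage array
        local_cache[block] = data       # (4) false write: copy to local resource
    app_buffer.append(data)             #     ...while serving the application
    return data

array = {7: "payload"}
local = {}
buf = []
read(7, local, array, buf)   # miss: fetched from the array, false-written
read(7, local, array, buf)   # hit: served from the host this time
print(local)                 # -> {7: 'payload'}
```

The second read never touches the array, which is the point of the false write: one trip to the backend turns subsequent reads into local hits.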

False Write workflow

During a backup job, FVP replaces the hot application data by false-writing the data retrieved by the backup job; similar behavior occurs during anti-virus scans. The VM-aware nature of FVP allows the administrator to manage the I/O profile intelligently.

FVP 2.5 provides PowerCLI commands that allow an administrator to temporarily suspend read and/or write acceleration for a virtual machine. This allows FVP to retain the frequently accessed data of front-end processes on the acceleration resource while sending writes directly to the storage array. When the PowerCLI command is issued, FVP ignores the data retrieved from the storage array on a cache miss, avoiding overwriting hot data with data required by the backup job or anti-virus scan. When the backup operation is complete, a PowerCLI command is issued that resumes all data acceleration. The article describing the PowerCLI commands will be available soon.

While the virtual machine's false writes or true writes are suspended, FVP continues to service all incoming operations. For example, when Suspend-PrnxReadDataPopulation is active, all read I/O is still satisfied by the acceleration resource whenever the data is present there.
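The effect of suspending read-data population can be modeled in a few lines. This is a hedged sketch, not FVP code: the class, method names, and dictionary "cache" are illustrative assumptions that mirror the behavior described above (hits are still served locally; misses are served but no longer admitted).

```python
class AcceleratedVM:
    """Illustrative model of suspending read-data population for one VM:
    cache hits are still accelerated, but misses no longer overwrite hot
    data with a backup job's one-time reads. Not FVP's implementation."""

    def __init__(self, storage_array):
        self.array = storage_array
        self.local = {}                 # host-side acceleration resource
        self.populate_on_miss = True    # normal mode: false-write on miss

    def suspend_read_population(self):  # analogous to the suspend cmdlet
        self.populate_on_miss = False

    def resume_read_population(self):
        self.populate_on_miss = True

    def read(self, block):
        if block in self.local:
            return self.local[block]    # still accelerated while suspended
        data = self.array[block]        # miss: fetched from the array
        if self.populate_on_miss:
            self.local[block] = data    # false write only in normal mode
        return data

vm = AcceleratedVM({1: "hot", 2: "backup-only"})
vm.read(1)                              # normal mode: hot data is cached
vm.suspend_read_population()
vm.read(2)                              # served to the job, but not admitted
vm.resume_read_population()
print(sorted(vm.local))                 # -> [1]
```

After the suspended window, only the application's hot block remains on the acceleration resource; the backup's one-time read passed through without polluting it.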

Almost all backup solutions provide the ability to configure pre- and post-job commands. By using these commands, you retain frequently accessed data for your applications while reducing the unnecessary wear and tear introduced by backup data on the flash device. By leveraging the context-rich environment and server-side resources, you can create an architecture that intelligently manages resources depending on application processes. It allows you to retain application performance while back-end processes do their job to ensure continuity.

Other parts in this series:
Part 1: Intro
Part 2: Network Topology
Part 3: Data path is not managed as a clustered resource
Part 4: Storage controller architecture

Frank Denneman
