What’s new in PernixData FVP 2.0 – User Defined Fault Domains

With the announcement of FVP 2.0, a lot of the buzz will be around Distributed Fault Tolerant Memory and the support of NFS. This all makes sense of course, since for the first time in history compute memory becomes part of the storage system and you are now able to accelerate file-based storage systems. But one of the new features I’m really excited about is Fault Domains.

In FVP 2.0 you group hosts to reflect your datacenter fault domain topology within FVP and ensure that redundant write data is stored in external fault domains. Let’s take a closer look at this technology, but first review how fault tolerant write acceleration works in the current version.

FVP 1.5 Fault Tolerant Write Acceleration

When accelerating a datastore or virtual machine in FVP 1.x, you can select 0, 1 or 2 replicas of redundant write data. When the option “local plus two network flash devices” is selected, FVP automatically selects two hosts in the cluster that have access to the datastore and have a network connection with the source host. If a failure occurs, such as the source host disconnecting, crashing or its flash device failing, one of the hosts containing the replica write data takes over and sends all uncommitted data from the acceleration layer to the storage system. For more detailed information, please read the article Fault Tolerant Write Acceleration.

Let’s use the example of a four-host vSphere cluster. All four hosts have FVP installed and participate in an FVP cluster. VM A runs on Host 2 and is configured with a write policy of a local copy plus 1 network flash device. In this scenario FVP selected ESXi Host 1 as the peer host, so Host 2 sends the redundant write data (RWD) to Host 1.

[Figure: four-host cluster]

But what if Host 1 and Host 2 are part of the same fault domain? A fault domain is a set of components that share a single point of failure, such as a blade system or a server rack with a single power source. Many organizations treat blade systems and their enclosures as a fault domain: if the backplane of the blade system fails, all servers in it can be disconnected from the network, unable to send data to the storage array or to other connected systems.

[Figure: blade system as a fault domain]

In this scenario neither copy of the uncommitted data can be written to the storage array if the network connection goes down. Or even worse, what if the whole blade system goes down while RAM is used as the acceleration resource?

FVP 2.0 User Defined Fault Domains

Fault Domains allow you to reflect your datacenter topology within FVP. This topology can be used to control where data gets replicated to when running in Write Back mode.

Default Behavior
All hosts in the vSphere cluster are initially placed in the default fault domain. The default fault domain cannot be renamed, removed or given explicit associations. Newly added hosts are automatically placed into this default fault domain. A host can be a member of only one fault domain, which means that all FVP clusters in the vSphere cluster share the same fault domains.

[Figure: fault domain hover-over]

After following the steps described in the article Configuring PernixData FVP 2.0 Fault Domains, two additional fault domains exist: Blade Center 1 and Blade Center 2.

[Figure: fault domains overview]

Replica placement
When configuring acceleration of a datastore or virtual machines, you are now able to control where the data is replicated to when using Write Back acceleration. You do not have to select a specific host or a specific fault domain; just provide the number of replicas and whether they should be placed in the same fault domain or in an external fault domain. FVP load balances the workload across the different vSphere hosts in the cluster, ensuring distribution of network traffic and acceleration resource consumption while safeguarding compliance with the fault domain policies.

[Figure: Add VMs - Commit Writes to]

Be aware that FVP only selects fault domains that belong to the same FVP and vSphere cluster; it will not select any fault domains that belong to a different vSphere cluster. By default the FVP Write Back write policy selects 1 peer host in the same fault domain, but this can easily be adjusted to any other configuration. Just select the required number of replica copies in the appropriate fault domain. Please note that the total number of peer hosts can never exceed two. For example, if two peer hosts in different fault domains are selected, no peer hosts can be selected in the same fault domain.

For extremely risk-averse designs: if more than two fault domains are configured, FVP distributes the replicas across two external fault domains, so the data resides in three different fault domains (local + fault domain 1 + fault domain 2).

[Figure: triple fault domain design]

Error correction

In the scenario where the source host fails, the peer host in the designated fault domain writes the uncommitted data to the storage system. In case of a network connection failure or a peer host failure of any kind, the PernixData Management Server selects a new peer host within the fault domain. This is all done transparently and no user interaction is required.

[Figure: dynamic peer host selection]

Topology alignment with fault domains

Fault domains build upon the strong fault tolerance features present in FVP 1.x and are an excellent way to make your environment more resilient against component, network or host failures. By aligning FVP fault domains with your datacenter topology you can leverage the deterministic placement of redundant write data to either improve resiliency or take advantage of the internal network bandwidth available within blade systems.

Configuring PernixData FVP 2.0 Fault Domains

This article covers the configuration of PernixData FVP 2.0 Fault Domains using the scenario of a four-host vSphere cluster stretched across two blade centers:

[Figure: blade system as a fault domain]

Default Behavior
All hosts in the vSphere cluster are initially placed in the default fault domain. The default fault domain cannot be renamed, removed or given explicit associations. Newly added hosts are automatically placed into this default fault domain. A host can be a member of only one fault domain, which means that all FVP clusters in the vSphere cluster share the same fault domains.

[Figure: fault domain hover-over]

Let’s create two additional fault domains to reflect the blade system topology. I’m using the vSphere Web Client; when using the vSphere Client, navigate to the PernixData tab in your vSphere cluster instead.

1. In the vCenter Inventory, navigate to the PernixData FVP inventory item and select FVP Clusters
2. Click on a FVP cluster in the designated vSphere Cluster
3. Go to Manage, select Fault Domains and click “Add Fault Domain”

[Figure: Add Fault Domain - Blade Center 1]

4. Provide a name and click OK when finished
5. Click on the option “Add Host…”, select the hosts and click OK when finished.

[Figure: Add hosts - Blade Center 1]

6. Repeat steps 3 to 5 to create additional fault domains. Please note that the user interface displays the current fault domain of each host, allowing you to easily determine which hosts should be moved to their own fault domain.

[Figure: current fault domain]

The Fault Domains overview now shows three fault domains: the Default Fault Domain and the fault domains Blade Center 1 and Blade Center 2.

[Figure: fault domains overview]

During the configuration of datastore or VM acceleration, you are now able to control where the data is replicated to when using write back acceleration.

[Figure: Add VMs - Commit Writes to]

The FVP Write Back write policy defaults to 1 peer host in the same fault domain, but this can easily be adjusted to any other configuration. Just select the required number of replica copies in the appropriate fault domain. Please note that the total number of peer hosts can never exceed two. For example, if two peer hosts in different fault domains are selected, no peer hosts can be selected in the same fault domain. In this scenario, the VM is configured with one peer host in the same fault domain and one peer host in a different fault domain.

[Figure: one local and one remote peer host]

For extremely risk-averse designs: if more than two fault domains are configured, FVP distributes the replicas across two external fault domains, so the data resides in three different fault domains (local + fault domain 1 + fault domain 2).

[Figure: triple fault domain design]

For in-depth information about PernixData FVP 2.0, please continue to the article “What’s new in PernixData FVP 2.0 – User Defined Fault Domains”.

PernixData FVP 2.0 released

It gives me great pleasure to announce that PernixData released FVP 2.0 today. Building upon the industry-leading acceleration platform, FVP 2.0 contains the following new features:

Distributed Fault Tolerant Memory: FVP’s fault tolerance makes volatile server memory part of the storage platform for the first time ever.

NFS support: With FVP 2.0 you can now accelerate application workloads while any type of storage system provides the storage capacity, whether it is block-based or file-based (NFS).

Adaptive network compression: FVP provides its own lightweight network protocol to send redundant write data between the source and peer hosts in the FVP cluster. In 2.0, adaptive network compression analyzes the write data in real time and, if the benefit exceeds the cost, compresses the data to reduce latency and consumed bandwidth.

User defined fault domains: Fault Domains allow you to reflect your datacenter topology within FVP. This topology can be used to control where data gets replicated to when running in Write Back mode.

For the official GA release notes please follow this link.

Starting today, I will cover the new features in depth in the “What’s new in PernixData FVP 2.0” series.

Part 1: User Defined Fault Domains
Part 2: Distributed Fault Tolerant Memory
Part 3: Adaptive network compression

Database workload characteristics and their impact on storage architecture design – part 3 – Ancillary structures for tuning databases

Welcome to part 3 of the Database workload characteristics series. Databases are considered to be one of the biggest I/O consumers in the virtual infrastructure. Database operations and database design are a study unto themselves, but I thought it might be interesting to take a small peek underneath the surface of database design land. I turned to our resident database expert Bala Narasimhan, PernixData’s Director of Products, to provide some insights about database designs and their I/O preferences.

Previous instalments of the series:

  1. Part 1 – Database Structures
  2. Part 2 – Data pipelines

Question 3: You’ve talked about ancillary structures for tuning databases, what are they and what role does FVP play here?

It goes without saying that database performance, whether OLTP or data warehousing, is critical for the business. As a result, DBAs use ancillary structures to enhance database performance. Examples of such ancillary structures include indexes and Materialized Views (MVs). MVs are called Indexed Views in SQL Server.

Index
An index is an ancillary structure that allows a table’s data to be accessed in more than one sort order. This helps with quick lookups and with operations, such as merge joins, that require the data to be sorted. Imagine a table with many columns in it. This table can be sorted in only one way on disk. For example, consider the Customer table shown below:

CREATE TABLE Customer (
    CustID int,
    Name Char(20),
    Zipcode int,
    PRIMARY KEY (CustID));

The customer ID column, CustID, is the primary key in this table. This means that all customers can be uniquely identified by their CustID value. The table will most probably be sorted on this column.

Now imagine you run a query that wants to find out the number of customers in ZIP code 95051. Since the table is not sorted on ZIP code, you will need to scan every single row in the table, check whether its ZIP code value is 95051 and add up the number of such rows. This can be extremely expensive. Instead, what you can do is build an index on the Zipcode column. This index is sorted on ZIP code, allowing a potentially much faster lookup.
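
As a minimal sketch (the index name is hypothetical and the exact syntax varies slightly per database engine), the index and the resulting lookup could look like this:

-- Build an index on the Zipcode column so the lookup no longer requires a full table scan.
CREATE INDEX Customer_Zipcode_idx ON Customer (Zipcode);

-- The count query can now be answered via the sorted index instead of scanning every row.
SELECT COUNT(*) FROM Customer WHERE Zipcode = 95051;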

Materialized View
A Materialized View (MV) is different from an index because an MV is a database object that contains the results of a query. If you know that a query will be run repeatedly then you can simply cache the results of that query in an MV and return the results as opposed to running the query itself each time. Example syntax to create an MV is as follows:

CREATE MATERIALIZED VIEW FOO AS SELECT * FROM BAZ WHERE BAZ.id = '11';

In the SQL statement above, the materialized view FOO stores the results of the query SELECT * FROM BAZ WHERE BAZ.id = '11'. So, when someone executes that query you can simply return the rows in FOO instead, because the results of the query are already saved in FOO. Now, this example is very simplistic, but you can imagine that the query can be arbitrarily complex and storing its results in an MV can therefore be hugely beneficial.
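
In practice that means the expensive query against BAZ can be answered with a trivial read of the MV, something along these lines:

-- Same result set as SELECT * FROM BAZ WHERE BAZ.id = '11', but served from the precomputed MV.
SELECT * FROM FOO;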

Based on this explanation, one thing is apparent: neither indexes nor MVs are ephemeral structures. Both need to be persisted on disk just like the tables in a database are. This means they consume disk space, but more importantly it means that accessing them potentially requires a lot of I/O. Good storage performance is therefore key to making these ancillary structures do their job.

These ancillary structures also come with a number of limitations. Firstly, they consume a lot of disk space. Sometimes they consume as much space as the underlying table, so they become more of an overhead than a benefit. Secondly, especially in the case of the MV, refresh rates make a huge difference. What do I mean by this?

Consider my example MV above. Let’s say that every day I load new rows into the table BAZ and some of these rows have the value '11' for the column id. In other words, there are new rows being added to BAZ every day where BAZ.id = '11'. Once these new rows are added, the MV FOO has become stale because it no longer stores the correct rows. So, each time a new row is inserted into BAZ where BAZ.id = '11', not only must we do a write into the base table BAZ, but we must also refresh the MV FOO that sits on it. One I/O therefore ends up becoming multiple I/Os! And if someone tries to query the MV FOO while it is being refreshed, you have all sorts of storage performance problems.
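
To make that write amplification concrete, here is a hedged sketch using PostgreSQL-style syntax; the payload column is hypothetical and the exact refresh statement differs per database engine:

-- One logical write into the base table...
INSERT INTO BAZ (id, payload) VALUES ('11', 'new row');

-- ...turns into additional I/O to bring the now-stale MV back in sync.
REFRESH MATERIALIZED VIEW FOO;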

Note that both of these ancillary structures are great if you know what queries you are going to run; if so, you can create the required indexes and MVs up front. If, however, you run a query that cannot leverage these structures, you get severe performance problems. And the truth of the matter is that it is seldom the case that you know all the queries you will run up front. So, ancillary structures can only take you so far.

How can FVP help?
  • When using server-side resources such as flash and RAM, not only will writes to the underlying base tables go faster in Write Back mode, but refreshes of the MVs sitting on top of those base tables will go much faster too. This means better query performance, higher concurrency and better scalability.
  • FVP allows you to run ad-hoc queries at high speed. Even if you cannot leverage the existing indexes or MVs for your query, accesses to the underlying base tables will be much faster because FVP uses server-side flash and RAM for those base tables.
  • The above point means you do not need to create as many indexes or MVs as you used to. This results in tremendous savings both from a storage capacity perspective and from the operational perspective of managing and running the database.

Part 4 coming soon!

Database workload characteristics and their impact on storage architecture design – part 2 – Data pipelines

Welcome to part 2 of the Database workload characteristics series. Databases are considered to be one of the biggest I/O consumers in the virtual infrastructure. Database operations and database design are a study unto themselves, but I thought it might be interesting to take a small peek underneath the surface of database design land. I turned to our resident database expert Bala Narasimhan, PernixData’s Director of Products, to provide some insights about database designs and their I/O preferences.

Question 2: You mentioned data pipelines in your previous podcast, what do you mean by this?

What I meant by data pipeline is the process by which data flows through the enterprise. Data is not a static entity in the enterprise; it flows through the enterprise continuously and at various points is used for different things. As mentioned in part 1 of this series, data usually enters the pipeline via OLTP databases, and this can be from numerous sources. For example, retailers may have Point of Sale (POS) databases that record all transactions (purchases, returns etc.). Similarly, manufacturers may have sensors that continuously send data about the health of their machines to an OLTP database. It is very important that this data enters the system as fast as possible. In addition, these databases must be highly available, support high concurrency and deliver consistent performance. Low latency transactions are the name of the game in this part of the pipeline.

At some point, the business may be interested in analyzing this data to make better decisions. For example, a product manager at the retailer may want to analyze the Point of Sale data to better understand what products are selling at each store and why. In order to do this, he will need to run reports and analytics on the data. But as we discussed earlier, these reports and analytics are usually throughput bound and ad-hoc in nature. If we run these reports and analytics on the same OLTP database that is ingesting the low latency Point of Sale transactions then this will impact the performance of the OLTP database. Since OLTP databases are usually customer facing and interactive, a performance impact can have severe negative outcomes for the business.

As a result, what enterprises usually do is Extract the data from the OLTP database, Transform the data into a new shape and Load it into another database, usually a data warehouse. This is known as the ETL process. In order to do the ETL, customers use a solution such as Informatica (3) or Hadoop (4) between the OLTP database and the data warehouse. Sometimes customers will simply pull in all the data of the OLTP database (a read-intensive, larger block size, throughput-sensitive query) and then do the ETL inside the data warehouse itself. Transforming the data into a different shape requires reading the data, modifying it, and writing the data into new tables. You’ve most probably heard of nightly loads into the data warehouse; this process is what is being referred to!

As we discussed before, OLTP databases may have a normalized schema and the data warehouse may have a more denormalized schema such as a Star schema. As a result, you can’t simply do a nightly load of the data directly from the OLTP database into the data warehouse as is. Instead you have to Extract the data from the OLTP database, Transform it from a normalized schema to a Star schema and then Load it into the data warehouse. This is the data pipeline. Here is an image that explains this:

[Figure: the ETL data pipeline from OLTP database to data warehouse]
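
As an illustrative sketch only (PostgreSQL-style date arithmetic; all table and column names are hypothetical), the nightly transform-and-load step from a normalized OLTP schema into a star-schema fact table might look like this:

-- Extract yesterday's OLTP orders, transform them by resolving dimension keys,
-- and load the reshaped rows into the warehouse fact table.
INSERT INTO Sales_Fact (date_key, store_key, product_key, amount)
SELECT d.date_key, s.store_key, p.product_key, o.amount
FROM Orders o
JOIN Date_Dim d ON d.calendar_date = o.order_date
JOIN Store_Dim s ON s.store_id = o.store_id
JOIN Product_Dim p ON p.product_id = o.product_id
WHERE o.order_date = CURRENT_DATE - 1;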

In addition, there can also be continuous small feeds of data into the data warehouse by trickle loading small subsets of data, such as the most recent or freshest data. By using the freshest data in your data warehouse you make sure that the reports you run and the analytics you do are not stale, and therefore enable the most accurate decisions.

As mentioned earlier, the ETL process and the data warehouse are typically throughput bound. Server side flash and RAM can play a huge role here because the ETL process and the data warehouse can now leverage the throughput capabilities of these server side resources.

Using PernixData FVP

Some specific, key benefits of using FVP with the data pipeline include:

  • OLTP databases can leverage the low-latency characteristics of server-side flash and RAM. This means more transactions per second and higher levels of concurrency, all while providing protection against data loss via FVP’s Write Back replication capabilities.
  • Trickle loads of data into the data warehouse get tremendously faster in Write Back mode because new rows are added to the table as soon as they touch the server-side flash or RAM.
  • The reports and analytics may execute joins, aggregations, sorts and so on. These require rapid access to large volumes of data and can also generate large intermediate results. High read and write throughput are therefore beneficial, and having this done on the server, right next to the database, helps performance tremendously. Again, Write Back is a huge win.
  • Analytics can be ad-hoc, and any tuning the DBA has done may not help. Having the base tables on flash via FVP can help performance tremendously for ad-hoc queries.
  • Analytics workloads tend to create and leverage temporary tables within the database. Using server-side resources enhances both read and write accesses to these temporary tables (see the sketch after this list).
  • In addition, there is also a huge operational benefit. We can now virtualize the entire data pipeline (OLTP databases, ETL, data warehouse, data marts etc.) because we are able to provide high and consistent performance via server-side resources and FVP. This brings together the best of both worlds: leverage the operational benefits of a virtualization platform, such as vSphere HA, DRS and vMotion, and standardize the entire data pipeline on it without sacrificing performance at all.
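
As a hedged illustration of the temporary table point above (PostgreSQL-style syntax; the names are hypothetical and reuse the Sales_Fact table from the ETL sketch earlier in this post), an analytics query might stage an intermediate result like this:

-- Intermediate result staged in a temporary table; both the write that creates it
-- and the subsequent reads benefit from server-side flash/RAM acceleration.
CREATE TEMPORARY TABLE Top_Stores AS
SELECT store_key, SUM(amount) AS total_sales
FROM Sales_Fact
GROUP BY store_key
ORDER BY total_sales DESC
LIMIT 100;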

Other parts of this series:

  1. Part 1 – Database Structures
  2. Part 3 – Ancillary structures for tuning databases