With the announcement of FVP 2.0, a lot of the buzz will be around distributed fault tolerant memory and the support of NFS. This makes sense of course, since for the first time compute memory becomes part of the storage system and you are now able to accelerate file-based storage systems. But one of the new features I’m really excited about is Fault Domains.
In FVP 2.0 you group hosts to reflect your datacenter fault domain topology within FVP, ensuring redundant write data is stored in external fault domains. Let’s take a closer look at this technology, but first review fault tolerant write acceleration in the current version.
FVP 1.5 Fault Tolerant Write Acceleration
When accelerating a datastore or virtual machine in FVP 1.x, you can select 0, 1 or 2 replicas of redundant write data. When the option “local plus two network flash devices” is selected, FVP automatically selects two hosts in the cluster that have access to the datastore and a network connection with the source host. If a failure occurs, such as the source host disconnecting, crashing or its flash device failing, one of the hosts containing the replica write data takes over and sends all uncommitted data from the acceleration layer to the storage system. For more detailed information, please read Fault Tolerant Write Acceleration.
Let’s use the example of a four-host vSphere cluster. All four hosts have FVP installed and participate in an FVP cluster. VM A runs on ESXi Host 2 and is configured with a write policy of a local copy plus 1 network flash device. In this scenario FVP selected Host 1 as the peer host, so Host 2 sends the redundant write data (RWD) to Host 1.
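To make the write path in this example concrete, here is a minimal sketch of the Write Back idea (purely illustrative; the class and function names are my own, not PernixData’s implementation): a write is acknowledged to the VM once the local flash device and every peer’s flash device hold a copy, and the storage array is updated asynchronously afterwards.

```python
# Illustrative sketch of Write Back with redundant write data (RWD).
# Not FVP's actual code: names and structure are assumptions.

class Host:
    def __init__(self, name):
        self.name = name
        self.flash = []          # uncommitted write data held on this host's flash

def write_back(source, peers, block):
    """Acknowledge once local + peer copies exist; destaging to the array happens later."""
    source.flash.append(block)   # local copy on the source host
    for peer in peers:           # redundant write data on each peer host
        peer.flash.append(block)
    return "ack"                 # the VM sees the write as complete at this point

host1, host2 = Host("ESXi-1"), Host("ESXi-2")
write_back(host2, [host1], "blk-42")   # VM A on Host 2, peer is Host 1
assert "blk-42" in host1.flash and "blk-42" in host2.flash
```

If the source host is lost before destaging, the peer still holds every acknowledged block, which is exactly why one of the peer hosts can flush the uncommitted data to the storage system.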
But what if Host 1 and Host 2 are part of the same fault domain? A fault domain is a set of components that share a single point of failure, such as a blade system or a server rack with a single power source. Many organizations treat a blade system and its enclosure as a fault domain. If the backplane of the blade system fails, all servers in it can be disconnected from the network, unable to send data to the storage array or to other connected systems.
In this scenario, neither copy of the uncommitted data can be written to the storage array if the network connection goes down. Even worse, what if the whole blade system goes down while RAM is used as the acceleration resource?
FVP 2.0 User-Defined Fault Domains
Fault Domains allow you to reflect your datacenter topology within FVP. This topology can be used to control where data gets replicated to when running in Write Back mode.
All hosts in the vSphere cluster are initially placed in the default fault domain. The default fault domain cannot be renamed, removed or given explicit associations. Newly added hosts are automatically placed into this default fault domain. A host can be a member of only one fault domain, which means all FVP clusters in the vSphere cluster share the same fault domains.
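The membership rules above can be captured in a small model (a sketch under my own naming assumptions, not FVP’s data model): every host starts in the default fault domain, the default domain takes no explicit associations, and a host belongs to exactly one domain at a time.

```python
# Hypothetical model of the fault domain membership rules described above.

DEFAULT = "default"

class FaultDomains:
    def __init__(self):
        self.domain_of = {}              # host -> fault domain (exactly one per host)

    def add_host(self, host):
        self.domain_of[host] = DEFAULT   # newly added hosts land in the default domain

    def assign(self, host, domain):
        if domain == DEFAULT:
            raise ValueError("default domain takes no explicit associations")
        self.domain_of[host] = domain    # moving a host replaces its previous domain

fd = FaultDomains()
fd.add_host("esx01")
fd.assign("esx01", "Blade Center 1")
assert fd.domain_of["esx01"] == "Blade Center 1"
```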
After following the steps described in the article Configuring PernixData FVP 2.0 Fault Domains, two additional fault domains exist: Blade Center 1 and Blade Center 2.
When configuring acceleration for a datastore or virtual machine, you can now control where the data is replicated when using Write Back acceleration. You do not have to select a specific host or a specific fault domain; just provide the number of replicas and whether they should be placed in the same fault domain or in an external fault domain. FVP load balances the workload across the vSphere hosts in the cluster, ensuring distribution of network traffic and acceleration resource consumption while safeguarding compliance with the fault domain policies.
Be aware that FVP only selects fault domains that belong to the same FVP and vSphere cluster; it will not select any fault domains that belong to a different vSphere cluster. By default the FVP Write Back write policy selects 1 peer host in the same fault domain, but this can easily be adjusted to any other configuration. Just select the required number of replica copies in the appropriate fault domain. Please note that the total number of peer hosts can never exceed two. For example, if two peer hosts in different fault domains are selected, no peer host can be selected in the same fault domain.
For extremely risk-averse designs: if more than two fault domains are configured, FVP distributes the replicas across two fault domains, placing the data in three different fault domains (local + fault domain 1 + fault domain 2).
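The placement policy just described can be sketched as a simple selection function (again an illustration under assumed names, not FVP’s actual algorithm): pick at most two peer hosts, honor the “same” versus “external” fault domain choice, and spread external replicas across distinct domains when more than two domains exist.

```python
# Illustrative peer selection under the fault domain policy described above.
# Names and structure are assumptions, not FVP's implementation.

def select_peers(source, domain_of, replicas, placement):
    """Return peer hosts for `source`; placement is 'same' or 'external'."""
    if replicas > 2:
        raise ValueError("the total number of peer hosts can never exceed two")
    src_domain = domain_of[source]
    if placement == "same":
        pool = [h for h in domain_of
                if h != source and domain_of[h] == src_domain]
        return pool[:replicas]
    # external: place each replica in a distinct fault domain when possible
    peers, used_domains = [], {src_domain}
    for host, domain in domain_of.items():
        if domain not in used_domains:
            peers.append(host)
            used_domains.add(domain)
        if len(peers) == replicas:
            break
    return peers

domain_of = {"esx1": "BC1", "esx2": "BC1", "esx3": "BC2", "esx4": "BC3"}
peers = select_peers("esx1", domain_of, 2, "external")
assert len({domain_of[p] for p in peers}) == 2   # replicas land in two external domains
```

With three configured domains, two external replicas end up in two different fault domains, so the data lives in three domains in total, matching the local + fault domain 1 + fault domain 2 layout above.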
If the source host fails, the peer host in the designated fault domain writes the uncommitted data to the storage system. In case of a network connection failure or a peer host failure of any kind, the PernixData Management Server selects a new peer host within the fault domain. This is all done transparently; no user interaction is required.
Topology alignment with fault domains
Fault domains build upon the strong fault tolerance features present in FVP 1.x and are an excellent way to make your environment more resilient against component, network or host failures. By aligning FVP fault domains with your datacenter topology, you can leverage the deterministic placement of redundant write data to improve resiliency or to take advantage of the internal network bandwidth available within blade systems.