VCDX- You cannot abstract your way out of things indefinitely

The amount of abstraction in IT is amazing. Every level in the software and hardware stack attempts to abstract operations and details, and the industry is craving more. Look at the impact “All Things Software Defined” has on today’s datacenter. It touches almost every aspect, from design to operations. The user provides the bare minimum of inputs and the underlying structure automagically tunes itself to a working solution. Brilliant! However, sometimes I get the feeling that this level of abstraction becomes an excuse to not understand the underlying technology. As an architect you need to do your due diligence. You need to understand the wheels and cogs that are turning when dialing a specific knob at the abstracted layer.

But sometimes it seems that the abstraction level becomes the right to refuse to answer questions. This was always an interesting discussion during a VCDX defense session, when candidates argued that they weren’t aware of the details because other groups were responsible for that design. I tend to disagree.

What level of abstraction is sufficient?
I am in the lucky position to work with PernixData R&D engineers and before that VMware R&D engineers. They tend to go deep, right down to the core of things, discussing every little step of a process. Is this the level of understanding of the applied technology and solutions that an architect needs? I don’t think so. It’s interesting to know, but on a day-to-day basis you don’t have to understand the role of the ceiling function when DRS calculates priority levels of recommendations. What is interesting is to understand what happens if you place a virtual machine at the same hierarchical level as a resource pool filled with virtual machines. What is the impact on the service levels of these various entities?

Something in the middle might be the NFS series by Josh Odgers. Josh goes in-depth about the technology involved in using NFS datastores. Virtual SCSI hard drives are presented to virtual machines, even when ESXi is connected to an NFS datastore. How does this impact the integrity of I/Os? How does the SCSI protocol emulation process affect the write ordering of the I/Os of business-critical applications? You as the virtual datacenter architect should be able to discuss the impact of using this technology with application owners. You should understand the potential impact a selected technology has on the various levels throughout the stack and what impact it has on the service it provides.

Recently I published a series on databases and what impact their workload characteristics have on storage architecture design. Understanding the position of a solution in the business process allows an architect to design a suitable solution. Let’s use the OLTP example. Typically OLTP databases are at the front of the process, the customer-facing part. Dramatically put, they are in the line of fire. When the OLTP database is performing slowly or is unavailable, it will typically impact revenue-generating processes. This means that latency is a priority, but so are concurrency and availability. You can then tailor your design to provide the best services to this application. This is just a simplified example, but it shows that you have to understand multiple aspects of the technology, not just the behavior of a single component. The idea is to get a holistic view and then design your environment to cater to the needs of the business, because that’s why we get hired.

Circling back to the abstraction and the power of software defined, I thought the post from Bart Heungens was interesting. Bart argues that Software Defined Storage is not the panacea for all storage-related challenges, which is true. Bart illustrates an architecture that is comprised of heterogeneous components. In his example, he illustrates what happens when you combine two HP DL380 servers from different generations. The difference between the generations is primarily noticeable from a storage controller perspective and especially in the way the software behaves. This is interesting on so many levels, and it would be a very interesting discussion if this were a VCDX defense session.

SDS abstracts many things, but it still relies on the underlying structure to provide the services. From a VCDX defense perspective, Bart has a constraint: the already available hardware and the requirement to use this hardware of different generations in his design. VCDX is not about providing the ideal design, but about showing how you deal with constraints and requirements, and demonstrating your expertise in the technology and how it impacts the requested solution. He didn’t solve the problem entirely, but by digging in deeper he managed to squeeze out performance to provide a better architecture to service the customer’s applications. He states the following:

Conclusion: the design and the components of the solution is very important to make this SDS based SAN a success. I hear often companies and people telling that hardware is more and more commodity and so not important in the Software Defined Datacenter, well I am not convinced at all.
I like the idea of VMware that states that, to enable VSAN, you need and SAS and SSD storage (HCL is quite restricted), just to be sure that they can guarantee performance. The HP VSA however is much more open and has lower requirements, however do not start complaining that your SAN is slow. Because you should understand this is not the fault of the VSA but from your hardware.

So be cognizant of the fact that while you are not responsible for every decision being made when creating an architecture for a virtual datacenter, you should be able to understand the impact various components, software settings and business requirements have on your part of the design. We are moving faster and faster towards abstracting everything. However, this abstraction process does not exonerate you from understanding the potential impact it has on your area of responsibility.

Database workload characteristics and their impact on storage architecture design – part 4 – NoSQL platforms

Welcome to part 4 of the Database workload characteristics series. Databases are considered to be one of the biggest I/O consumers in the virtual infrastructure. Database operations and database design are a study unto themselves, but I thought it might be interesting to take a small peek underneath the surface of database design land. I turned to our resident database expert Bala Narasimhan, PernixData’s director of products, to provide some insights about database designs and their I/O preferences.

Previous instalments of the series:

  1. Part 1 – Database Structures
  2. Part 2 – Data pipelines
  3. Part 3 – Ancillary structures for tuning databases

Question 4: What are some recent trends in the data management industry that we need to be aware of?

A lot of innovations have occurred in the data management industry over the last few years. In previous discussions we have touched on the relational model, schema design and ACID compliance. These are the foundations on which relational databases have been built. In today’s discussion we’ll focus on the recent innovations related to NoSQL platforms.

Over the years users have become aware that there is a class of applications for which the above requirements for relational databases [ACID compliance, relational model, schema design] can be too onerous. I won’t go into the details about these applications or why they find these requirements onerous in this discussion. [I am happy to do it if there is demand for it. ☺]

In response to these findings the industry began to develop new data management platforms collectively called NoSQL platforms. Some examples of NoSQL platforms include Hadoop and MongoDB. Among other things, NoSQL platforms are characterized by horizontal scaling, scale out architectures and support for programming paradigms other than SQL. Many NoSQL platforms are also designed to be eventually consistent thereby compromising consistency in favor of availability and partition tolerance. [See CAP Theorem for a detailed discussion on this topic.] I will now discuss two NoSQL platforms in some detail. These are Hadoop and MongoDB.

Hadoop can be viewed as a software framework that comprises two things:

  • MapReduce: A programming paradigm for large-scale data processing
  • HDFS: A distributed file system for storing data across a cluster of servers

Hadoop runs MapReduce jobs on a cluster of machines and is designed for batch processing as opposed to interactive operations. A MapReduce job, in its simplest form, comprises a Map phase followed by a Reduce phase. Think of the Map phase as a program that maps together all data that has the same key. Similarly, think of the Reduce phase as a program that takes the output of the Map phase and reduces it to a single value. A prototypical example of a MapReduce job is the ‘Word Count’ program. [Learn more about how MapReduce does WordCount here.]
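
To make the two phases a bit more tangible, here is a minimal single-process sketch of Word Count in Python. It is not Hadoop code and does not use any Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this illustration, and the grouping step that a real framework performs across the cluster is simulated with an in-memory dictionary.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group all emitted values by key (done by the framework between phases)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse the list of values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run the three steps back to back on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog"]))
```

In a real cluster the map and reduce functions run in parallel on many machines and the data lives in HDFS. The sketch only shows the shape of the computation.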

Because Hadoop does large sequential reads and writes, it is best suited for batch processing. Hadoop is therefore a throughput-bound system. Examples of batch processing workloads that Hadoop is suited for include data preparation jobs and long-running reports.

MongoDB is a popular NoSQL database built for modern-day application development. MongoDB stores data as documents in a binary representation called BSON (Binary JSON). Documents that share a similar structure are organized as collections. Each document has one or more fields in it. Collections in MongoDB are analogous to tables in a relational database, while documents are analogous to rows. The fields within a document are analogous to columns within a relational database. One key differentiator of MongoDB, in contrast to relational databases, is its robust support for flexible schemas.
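
The collection, document and field analogy can be sketched with plain Python data structures. This is only an illustration of the data model. It does not use a MongoDB driver, and the `customers` collection, its fields and the `find` helper are invented for the example.

```python
# A "collection" modeled as a list of "documents" (dictionaries).
# Note the flexible schema: the two documents do not share the same fields.
customers = [
    {"_id": 1, "name": "Alice", "email": "alice@example.com"},
    {"_id": 2, "name": "Bob", "phones": ["555-0100", "555-0101"]},
]


def find(collection, **criteria):
    """Return every document whose fields match all of the given criteria."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


if __name__ == "__main__":
    print(find(customers, name="Bob"))
```

In a relational table both rows would need the same columns. Here the second document simply carries a `phones` field that the first one lacks.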

MongoDB is very good for key-value queries, range queries and aggregations. In turn this means that most of the I/O generated by MongoDB is random in nature and is a good mix of reads and writes. MongoDB is therefore latency sensitive, and low latencies can deliver a huge benefit from a performance perspective.

How can FVP help?
NoSQL platforms were developed because a class of applications was either very onerous to run on a traditional database or performed very poorly when run against one. As discussed above, some NoSQL platforms are throughput bound while others are latency sensitive. FVP can play a very beneficial role in accelerating NoSQL platforms because:

  1. FVP accelerates both reads and writes. This means that irrespective of the workload being run on the platform, it can leverage server side resources for speedup.
  2. Because of DFTM these platforms can now leverage server side RAM as an acceleration medium thereby providing a huge jump in performance.
  3. Large intermediate results or resource intensive operations such as sorts can now benefit from read and write acceleration via FVP.

You can now finally virtualize these platforms without worrying about performance. Instead of introducing rigid and siloed architectures, it makes sense to use PernixData FVP. FVP, supported by VMware, allows for clustered and fault-tolerant I/O acceleration by using flash or memory. By virtualizing these acceleration resources into one seamless pool, VMs can seamlessly migrate to any host in the cluster while being able to retrieve data throughout the cluster.

Using VM Storage Policies with PernixData FVP Datastore Write Policies

Profile-Driven Storage, introduced in vSphere 5.0, provides rapid and intelligent placement of virtual machines based on pre-defined storage profiles. vSphere 5.5 enhanced the previous version of Storage Profiles and along the way renamed it to VM Storage Policy.

The new architecture looks slightly different from the old one. The kingpin of a VM Storage Policy is the rule-set. The rule-set is a group of rules that describe the storage characteristics of a datastore. These characteristics are provided by either vendor-specific capabilities (through a VASA provider) or by a user-defined tag. A rule-set can contain multiple rules and a combination of vendor-specific capability rules and tag-based rules. This article focuses on using VM Storage Policies with rules based on tags. The schematic overview provides insight on the relationship between objects:

01-vSphere VM Storage Policy

Configuring VM Storage Policy
The vSphere UI does not provide a single point of configuration for a VM Storage Policy. The VM Storage Policy UI allows you to define a rule-set and add tags to the rule-set. However, tags should be defined and associated with vSphere objects before creating the rule-set. Creating and assigning tags cannot be done from the VM Storage Policy UI.

Creating Tags
Tag creation can be done in different ways. Storage-related tags can best be created either in the tag menu option in the home menu, or at the tag menu option of the manage tab of the datastore cluster or datastore itself. A tag must be assigned to a category. Categories define the cardinality of the tag and which vSphere objects the tag is associated with.

Please note that you can edit the category and add associable objects at any time, however once set you cannot remove an object. If required, the category needs to be deleted and then created with the correct required objects.

Cardinality allows you to define whether the object accepts one tag or multiple tags from that category. In this scenario I will use the tag to define which FVP storage profile is assigned to the datastore. When accelerating a datastore in FVP, you can assign a default storage policy. All virtual machines, in the vSphere cluster, configured to use that particular datastore will be accelerated accordingly. As datastore write policies are mutually exclusive, they cannot exist on the same datastore at the same time.

02-Overview of Tags

With that in mind, the cardinality of “One tag per object” aligns perfectly with the exclusivity of the FVP write policy. Once an administrator assigns the FVP Write-Back mode tag to a datastore, vSphere will not allow the administrator to also assign the FVP Write-Through mode tag to the datastore.
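
The effect of the “One tag per object” cardinality can be modeled in a few lines of Python. This is a conceptual sketch only and not the vSphere tagging API. The `TagCategory` class, the category name and the tag names are hypothetical.

```python
class TagCategory:
    """Hypothetical model of a tag category with a configurable cardinality."""

    def __init__(self, name, one_tag_per_object=True):
        self.name = name
        self.one_tag_per_object = one_tag_per_object
        self.assignments = {}  # object name -> set of tags from this category

    def assign(self, obj, tag):
        """Assign a tag, refusing a second, different tag when cardinality is one."""
        tags = self.assignments.setdefault(obj, set())
        if self.one_tag_per_object and tags and tag not in tags:
            raise ValueError(
                f"{obj} already has a tag from category {self.name!r}"
            )
        tags.add(tag)


if __name__ == "__main__":
    write_policy = TagCategory("FVP Write Policy")
    write_policy.assign("Cryo-SYN1", "FVP-Write-Back")
    try:
        write_policy.assign("Cryo-SYN1", "FVP-Write-Through")
    except ValueError as error:
        print(error)
```

Because the write policies are mutually exclusive on a datastore, letting the category reject the second tag keeps the tagging in line with what FVP can actually do.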

Assigning a tag is a bit tedious. None of the workflows provide the option to assign the tag to multiple datastores.

Please note that when assigning a tag to a datastore cluster, the tag does not cascade down to the member datastores of the cluster. Assigning the tag to the members has to be done manually.

I prefer to create the tags and go to the datastore option in the vCenter view. Multi-select (Shift-select) the appropriate datastores, right-click the selection to open the submenu, and choose Assign Tag…

03-Multi-select datastores

Once the tags have been created and assigned to the vSphere storage objects (Datastore Cluster and Datastore), a VM Storage Policy can be created. Go to the Home view and select the VM Storage Policy. After providing a name and description, the rule-set is created. A rule-set can contain multiple rules, however you have to add them by category. After selecting the tags, the workflow will show a list of compatible datastores.

In this scenario I have four datastores, Cryo-SYN1,2,3,4, all four are replicated. Depending on the service level agreement I accelerated two datastores with Write-Through mode and two datastores with Write-Back mode. I created two VM Storage policies and assigned the following tags to their rule-set.

04-Overview VM storage Policies

Once the VM Storage Policies were created, they were enabled on the compute cluster. The schematic overview provides insight on the relationship between all objects:


If the customer wants to deploy a virtual machine that has an RPO of 15 minutes defined in its service level agreement, the administrator selects the VM Storage Policy RPO-15-R-DS-FVP-WB. The vSphere provisioning process displays the compatible datastores that are safe to provision the virtual machine with an RPO requirement of 15 minutes.
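
Conceptually, the compatibility check comes down to comparing the tags required by the rule-set with the tags assigned to each datastore. The sketch below models that in Python and is not vSphere code. The tag names and the split of which two Cryo-SYN datastores run Write-Back are assumptions made for the example, and the matching is simplified to "every required tag must be present".

```python
# Hypothetical tag assignments for the four replicated datastores.
datastore_tags = {
    "Cryo-SYN1": {"Replicated-RPO-15", "FVP-Write-Back"},
    "Cryo-SYN2": {"Replicated-RPO-15", "FVP-Write-Back"},
    "Cryo-SYN3": {"Replicated-RPO-15", "FVP-Write-Through"},
    "Cryo-SYN4": {"Replicated-RPO-15", "FVP-Write-Through"},
}

# Rule-set of each policy, expressed as the set of tags it requires.
# The name of the second policy is invented for this example.
policies = {
    "RPO-15-R-DS-FVP-WB": {"Replicated-RPO-15", "FVP-Write-Back"},
    "RPO-15-R-DS-FVP-WT": {"Replicated-RPO-15", "FVP-Write-Through"},
}


def compatible_datastores(policy_name):
    """Return the datastores that carry every tag the policy requires."""
    required = policies[policy_name]
    return sorted(
        name for name, tags in datastore_tags.items() if required <= tags
    )


if __name__ == "__main__":
    print(compatible_datastores("RPO-15-R-DS-FVP-WB"))
```

With these assumed assignments, only the two Write-Back datastores are offered for the RPO-15-R-DS-FVP-WB policy.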


Before configuring the VM Storage Policies, I accelerated the datastores within FVP.

With the help of VM Storage Policies and FVP write policies at datastore level, the virtual machine is placed on a replicated datastore with Write-Back enabled. The VM summary page confirms:
08-VM provisioned

MS Word style formatting shortcut keys for Mac

Recently I started to spend a lot of time in MS Word again, and as a stickler for details I dislike a mishmash of font types throughout my document. I spend a lot of time configuring the styles of the document, yet when I paste something from other documents, MS Word tends to ignore these. Correcting the format burns a lot of time and it simply annoys the crap out of me.

To avoid this further, I started to dig around to find some font and style related shortcut keys. Yesterday I tweeted the shortcut key to apply the normal style and by the looks of retweets many of you are facing the same challenge.

Below is a short list of shortcut keys that I use. There are many more, so share the common ones you like to use. As I use a Mac, I listed the Mac shortcut combinations. Replace CMD with CTRL if you are using MS Word on a Windows machine.

Select text:
Select all: CMD+A
Select sentence: CMD + click
Select word: Double click
Select paragraph: Triple click

Clear formatting: CTRL+spacebar
Apply Normal Style: Shift+CMD+N
Header 1: CMD+ALT+1
Header 2: CMD+ALT+2
Header 3: CMD+ALT+3
Change Case: CMD+Option+C (repeat combination to cycle through options)
Indent paragraph: CTRL+Shift+M
Remove indent: CMD+Shift+M

Find and replace: F5

Future direction of disabling TPS by default and its impact on capacity planning

Eric Sloof’s tweet alerted me to the following announcement of TPS being disabled by default in the upcoming vSphere release.

In short TPS will no longer be enabled by default due to security concerns starting with the following releases:

ESXi 5.5 Update release – Q1 2015
ESXi 5.1 Update release – Q4 2014
ESXi 5.0 Update release – Q1 2015
The next major version of ESXi

More information here: Security considerations and disallowing inter-Virtual Machine Transparent Page Sharing (2080735)

After reading this announcement I hope architects review the commonly (mis)used over-commitment ratios during capacity planning exercises. It was always one of my favorite topics to discuss at VCDX defense sessions.

It’s common to see a 20 to 30% over-commitment ratio in a vSphere design attributed to TPS. But in reality these ratios never show up in the monitoring processes of IT organizations. Why? Because TPS is no longer used with the same frequency as in the older pre-vSphere infrastructures (ESX 2.x and 3.x). In reality vSphere has done away with the out-of-the-box over-commitment ratios: it only leverages TPS when certain memory usage thresholds are exceeded. Typically architects do not design their environments to reach the memory usage threshold of 96%.
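
A quick calculation shows what such a ratio means for a capacity plan. The numbers below (a 256 GB host and a 25% ratio) are hypothetical and only serve to illustrate the arithmetic. The function names are made up for this sketch.

```python
def planned_vm_memory_gb(host_memory_gb, overcommit_ratio):
    """Memory that a design promises to VMs when it counts on TPS savings."""
    return host_memory_gb * (1 + overcommit_ratio)


def shortfall_without_sharing_gb(host_memory_gb, overcommit_ratio):
    """Memory that is missing if the assumed TPS savings never materialize."""
    return planned_vm_memory_gb(host_memory_gb, overcommit_ratio) - host_memory_gb


if __name__ == "__main__":
    host_gb = 256   # hypothetical host size
    ratio = 0.25    # hypothetical over-commitment ratio attributed to TPS
    print(planned_vm_memory_gb(host_gb, ratio))
    print(shortfall_without_sharing_gb(host_gb, ratio))
```

In this example the design hands out 320 GB of virtual machine memory on a 256 GB host, so 64 GB depends entirely on page sharing that may never take place.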

Large pages and processor architectures
When AMD and Intel introduced hardware-assisted memory virtualization features (RVI and EPT), VMware engineers quickly discovered that these features led to increased virtual machine performance while reducing the memory consumption of the kernel. However, there was some overhead involved, but this could be solved by using large pages. A normal memory page is 4KB in size, while a large page is 2MB.
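
The difference in scale between the two page sizes is easy to work out. The sketch below is plain arithmetic based on the 4KB and 2MB sizes mentioned above. The 1 GB virtual machine size is just an example value.

```python
SMALL_PAGE_BYTES = 4 * 1024          # 4KB normal page
LARGE_PAGE_BYTES = 2 * 1024 * 1024   # 2MB large page


def pages_needed(memory_bytes, page_size_bytes):
    """Number of pages required to back the given amount of memory (rounded up)."""
    return -(-memory_bytes // page_size_bytes)


if __name__ == "__main__":
    one_gb = 1024 ** 3
    print(LARGE_PAGE_BYTES // SMALL_PAGE_BYTES)    # small pages per large page
    print(pages_needed(one_gb, SMALL_PAGE_BYTES))  # pages backing 1 GB with 4KB pages
    print(pages_needed(one_gb, LARGE_PAGE_BYTES))  # pages backing 1 GB with 2MB pages
```

One large page covers the same memory as 512 small pages, so far fewer address translations are needed for the same amount of memory.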

However, large pages could not be combined with TPS because of the overhead introduced by scanning these 2MB regions. The low probability of finding identical large pages made them realize that the overhead was not worth the low potential of memory savings. The performance increase was calculated at around 30%, while the impact of the sharing loss was perceived as minimal, as memory footprints in physical machines tend to increase every year. Therefore virtual machines provisioned on vSphere use a hardware MMU, leveraging the CPU hardware-assisted memory virtualization features.

Although vSphere uses large pages, TPS is still active. It scans and hashes all pages inside a large page to decrease memory usage pressure when a memory threshold is reached. During my time at VMware I wrote an article on the VMkernel memory thresholds in vSphere 5.x. Another interesting thing about large pages is the tendency to provide the best performance. The kernel will split up large pages and share pages during memory pressure, but when no memory pressure is present, new incoming pages will be stored in large pages, potentially creating a cyclical process of constructing and deconstructing large pages.
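
The general idea of hashing pages to find sharing candidates can be sketched as follows. This is a simplified illustration and not how the VMkernel implements TPS. The choice of SHA-1 and the function name are assumptions for the example, and a real implementation would compare pages bit by bit after a hash match before sharing them.

```python
import hashlib

PAGE_BYTES = 4 * 1024  # 4KB pages


def reclaimable_pages(pages):
    """Count pages that could be reclaimed if identical pages were stored once."""
    unique_hashes = {hashlib.sha1(page).digest() for page in pages}
    return len(pages) - len(unique_hashes)


if __name__ == "__main__":
    zero_page = bytes(PAGE_BYTES)
    other_page = b"\x01" * PAGE_BYTES
    print(reclaimable_pages([zero_page, zero_page, zero_page, other_page]))
```

With three identical pages and one different page, two of the four pages are candidates for reclamation.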

Another impact on the memory sharing potential is the NUMA processor architecture. NUMA allows the best memory performance by storing memory pages as close to a CPU as possible. TPS memory sharing could reduce performance when pages are shared between two separate CPU systems. For more info about NUMA and TPS please read the article “Sizing VMs and NUMA nodes”.

Capacity planning impact
Therefore the impact of disabling TPS by default will not be as big as some might expect. What I do find interesting is the attention to security. I absolutely agree that security out of the box is crucial, but when considering probability I would rather do a man-in-the-middle attack on the vMotion network, reading clear-text memory across the network, than wait for TPS to collapse memory. Which leads me to wonder when to expect encryption for vMotion traffic.