Playing tonight: DRS and the IO controllers

Ever wondered why the band is always mentioned second? Is the band replaceable? Is the sound of the instruments so ambiguous that you can swap out any musician with another? Apparently the front man is the headliner of the show, and if he does his job well he will never be forgotten. The people who truly recognize talent are the ones that care about the musicians. They understand that the artists backing the singer create the true sound of the song. And I think this is also the case when it comes to DRS and his supporting act, the I/O controllers, namely SIOC and NETIOC. If you do it right, the combination creates the music in your virtual datacenter, well at least from a resource management perspective. ;)

Last week Chris Wahl started a discussion about DRS and its inability to load-balance the VMs perfectly amongst hosts. Chris knows that DRS is not a VM distribution mechanism; his argument is more focused on the distribution of load on the backend: the north-south and east-west uplinks. And for this I would recommend SIOC and NETIOC. Let’s do a 10,000-foot flyby over the different mechanisms.

Distributed Resource Scheduler (DRS)
DRS distributes the virtual machines – the consumers – across the ESXi hosts, the producers. Whenever a virtual machine wants to consume more resources, DRS attempts to provide these resources to that virtual machine. It can do this by moving other virtual machines to different hosts, or by moving the virtual machine itself to another host, trying to create an environment where the consumers can consume as much as possible. As workload patterns differ from time to time and from day to day, an equal number of VMs per host does not provide a balanced resource offering. It’s best to create a combination of idle and active virtual machines per host.

Now think about the size of virtual machines: most environments do not have a virtual machine landscape that utilizes an identical hardware configuration. And even if that were the case, think about the applications. Some are memory bound, some are CPU bound. To make it worse, think about load correlation and load synchronicity. Load correlation defines the relationship between loads running in different machines: a single event initiates multiple loads, for example a search query on a front-end webserver that results in commands in the supporting stack and backend. Load synchronicity is often caused by load correlation but can also exist due to user activity. It’s very common to see spikes in workload at specific hours; think about log-on activity in the morning. And for every action there is an equal and opposite reaction: quite often load correlation and load synchronicity will introduce periods of collective non- or low utilization, which reduce the displayed resource utilization.

All this coordination is done by DRS. Fixating on an identical number of VMs per host is, in my opinion, lobotomizing DRS.
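To make the point about VM counts concrete, here is a minimal Python sketch. The VM names, host names and CPU demand figures are invented for illustration, and the standard deviation of host load is only a stand-in for an imbalance metric; it is not the algorithm DRS actually uses.

```python
from statistics import pstdev

# Hypothetical CPU demand in MHz: four active VMs and four mostly idle ones.
vm_demand_mhz = {
    "vm1": 4000, "vm2": 3800, "vm3": 3500, "vm4": 3200,
    "vm5": 300, "vm6": 250, "vm7": 200, "vm8": 150,
}


def host_loads(placement):
    """Sum the demand of the VMs placed on each host."""
    return {
        host: sum(vm_demand_mhz[vm] for vm in vms)
        for host, vms in placement.items()
    }


def imbalance(placement):
    """Standard deviation of host load, used here as a simple imbalance score."""
    return pstdev(host_loads(placement).values())


# Four VMs per host, but all active VMs happen to share one host.
equal_count = {
    "esx01": ["vm1", "vm2", "vm3", "vm4"],
    "esx02": ["vm5", "vm6", "vm7", "vm8"],
}

# Two VMs on one host and six on the other, mixed by demand.
load_aware = {
    "esx01": ["vm1", "vm2"],
    "esx02": ["vm3", "vm4", "vm5", "vm6", "vm7", "vm8"],
}

print(host_loads(equal_count), imbalance(equal_count))
print(host_loads(load_aware), imbalance(load_aware))
```

With these made-up numbers the equal-count placement leaves one host with 14,500 MHz of demand and the other with 900 MHz, while the lopsided 2-versus-6 placement ends up at 7,800 and 7,600 MHz.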

But DRS is only focused on CPU and memory. Arguably you can treat network and storage somewhat as CPU consumption as well, but let’s not go that deep. Some applications are storage bound, some applications are network bound. For this, other components are available in your vSphere infrastructure: the forgotten heroes, SIOC and NETIOC.

Storage IO Control (SIOC)
Storage I/O Control (SIOC) provides a method to fairly distribute storage I/O resources during times of contention. SIOC provides datastore-wide scheduling, using virtual disk shares to calculate priority. In a healthy and properly designed environment, every host that is part of the cluster should have a connection to the datastore and all hosts should have an equal number of paths to the datastore. SIOC monitors the consumption and, if the latency experienced by the virtual machines exceeds the user-defined threshold, SIOC distributes priority amongst the virtual machines hitting that datastore. By default every virtual machine receives the same priority per VMDK per datastore, but this can be modified if the application requires this from a service level perspective.
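The share mechanism can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the function name `distribute_slots`, the 64-slot queue, the disk names and the share values are all hypothetical, and 30 ms is used only as an example threshold. The real implementation throttles device queues on each host, which this sketch does not attempt to reproduce.

```python
def distribute_slots(disk_shares, total_slots, observed_latency_ms, threshold_ms=30):
    """Divide queue slots by shares, but only when latency signals contention.

    Returns None when latency is at or below the threshold, because in that
    case no prioritization is needed.
    """
    if observed_latency_ms <= threshold_ms:
        return None
    total_shares = sum(disk_shares.values())
    return {
        disk: total_slots * shares // total_shares
        for disk, shares in disk_shares.items()
    }


# Hypothetical virtual disks on one datastore; the database disk gets double shares.
disk_shares = {"db.vmdk": 2000, "web.vmdk": 1000, "test.vmdk": 1000}

no_contention = distribute_slots(disk_shares, total_slots=64, observed_latency_ms=12)
contention = distribute_slots(disk_shares, total_slots=64, observed_latency_ms=35)

print(no_contention)
print(contention)
```

Below the threshold nothing is prioritized; above it, the disk with twice the shares is entitled to twice the slots of each of the others.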

Network I/O Control (NETIOC)
The east-west equivalent of its north-south brother SIOC. NETIOC provides control for predictable networking performance while different network traffic streams are contending for the same bandwidth. Similar controls are offered, but they are now applied to traffic patterns instead of on a per virtual machine basis. Similar architecture design hygiene applies here as well. All hosts across the cluster should have the same connection configuration and amount of bandwidth available to them. The article “A primer on Network I/O Control” provides more info on how NETIOC works. VMware published a NETIOC Best Practice white paper a while ago, and most of it is still accurate.
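As a rough illustration of shares applied per traffic type, the Python sketch below splits an uplink among the traffic types that are sending at that moment. The share values, the 10,000 Mbit/s uplink and the function name are assumptions made for this example, and the model assumes that idle traffic types do not hold bandwidth back from active ones.

```python
def bandwidth_per_traffic_type(shares, active_types, uplink_mbps):
    """Split uplink bandwidth among active traffic types in proportion to shares."""
    total_shares = sum(shares[t] for t in active_types)
    return {t: uplink_mbps * shares[t] / total_shares for t in active_types}


# Hypothetical share values per traffic type.
shares = {"virtual machine": 60, "vMotion": 20, "NFS": 20}

all_active = bandwidth_per_traffic_type(
    shares, ["virtual machine", "vMotion", "NFS"], uplink_mbps=10_000
)
two_active = bandwidth_per_traffic_type(
    shares, ["virtual machine", "vMotion"], uplink_mbps=10_000
)

print(all_active)
print(two_active)
```

When NFS traffic goes quiet in this model, its portion is redistributed over the remaining two types in the same 60:20 ratio rather than left unused.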

And the bass guitar player of the virtual datacenter, Storage DRS.
Storage DRS provides virtual machine disk placement and load balancing mechanisms based on both space and I/O capacity. Where SIOC reactively throttles hosts and virtual machines to ensure fairness, SDRS proactively generates recommendations to prevent imbalances from both space utilization and latency perspectives. More simply, Storage DRS does for storage what DRS does for compute resources.

These mechanisms, combined with a healthy – well architected – environment, will help you distribute the consumers across the producers with the proper context in mind. Which virtual machines are hot and which are not? Much better than playing the numbers game! Now, one might argue: but what about failure scenarios? If I have an equal number of VMs running on my hosts, my failover time decreases as well. Well, it depends. HA distributes virtual machines across the cluster and, if DRS is up and running, it moves virtual machines around if it cannot satisfy the resource entitlement of the virtual machines (VM level reservations). Duncan wrote about DRS and HA behavior a while ago, and of course we touched upon this in our book, the 5.1 clustering deepdive (still fully applicable for 5.5 environments).

In my opinion, trying to outsmart advanced and adaptive computer algorithms with basic math reasoning is really weird, especially when most people are talking about software-defined datacenters and whether you are managing pets or cattle. When your environment is healthy and laid out in a homogeneous way, you cannot beat computer algorithms. The thing you should focus on is the alignment of resource priority to business service levels. And that’s what you achieve by applying the correct share levels at the DRS, SIOC and NETIOC levels. Maybe you can devops your way into leveraging various scripting languages. ;)

November speaking events

The upcoming two weeks I will be presenting at different locations. Today I’m flying out to New York and will be traveling the East Coast the upcoming days. Next week I will be traveling to the UK and Denmark to present and attend the big VMUG User conference events. If you want to catch me, here are the dates and location:

New York, USA
On 13 November I’m presenting “Building an Application Centric Storage Platform” at the New York City VMUG held at 250 West 41 Street in New York. The event starts at 12:00 EST. To register follow this link.

Boston, USA
On 14 November I’m presenting “Building an Application Centric Storage Platform” at the Axis event held at 45 School Street in Boston. The event starts at 11:30 EST. To register follow this link.

Birmingham, United Kingdom
On Tuesday 18 November I will present “Re-Thinking Storage by Virtualizing Flash and RAM” at the UK National VMUG in the National Motorcycle Museum near Birmingham. The session starts at 15:20 in the Bracebridge Suite.

The line-up is overwhelming again and I haven’t decided which sessions to attend. VMware Horizon Architecture and Design by Barry Coombs and Peter von Oven looks interesting.

Copenhagen, Denmark
I’m presenting “Re-Thinking Storage by Virtualizing Flash and RAM” on the 20th of November at the Nordic VMUG at the Bella Center in Copenhagen. My session starts at 10:15 and is in Room 6.

The Nordic VMUG agenda is quite brilliant again this year. I’m looking forward to attending Duncan’s session “What’s Coming for vSphere in Future Releases” at 11:30, “VSAN the first 6 Months” by Cormac at 13:00 and “Everything Virtual Volumes – VVOLS” by Paudie O’Riordan at 15:00.

I will be present at the European VMUGs all day. If you would like to have a conversation about PernixData FVP, please reach out to me on Twitter (@frankdenneman) and we’ll find a timeslot to meet.

VCDX – You cannot abstract your way out of things indefinitely

The amount of abstraction in IT is amazing. Every level in the software and hardware stack attempts to abstract operations and details. And the industry is craving more. Look at the impact “All Things Software Defined” has on today’s datacenter. It touches almost every aspect, from design to operations. The user provides the bare minimum of inputs and the underlying structure automagically tunes itself to a working solution. Brilliant! However, sometimes I get the feeling that this level of abstraction becomes an excuse to not understand the underlying technology. As an architect you need to do your due diligence. You need to understand the wheels and cogs that are turning when dialing a specific knob at the abstracted layer.

But sometimes it seems that the abstraction level becomes the right to refuse to answer questions. This was always an interesting discussion during a VCDX defense session, when candidates argued that they weren’t aware of the details because other groups were responsible for that design. I tend to disagree.

What level of abstraction is sufficient?
I am in the lucky position to work with PernixData R&D engineers and, before that, VMware R&D engineers. They tend to go deep, right down to the core of things, discussing every little step of a process. Is this the level of understanding of the applied technology and solutions that an architect needs? I don’t think so. It’s interesting to know, but on a day-to-day basis you don’t have to understand the function of ceiling when DRS calculates priority levels of recommendations. What is interesting is to understand what happens if you place a virtual machine at the same hierarchical level as a resource pool filled with virtual machines. What is the impact on the service levels of these various entities?

Something in the middle might be the NFS series of Josh Odgers. Josh goes in-depth about the technology involved in using NFS datastores. Virtual SCSI hard drives are presented to virtual machines, even when ESXi is connected to an NFS datastore. How does this impact the integrity of I/Os? How does the SCSI protocol emulation process affect the write ordering of I/Os of business critical applications? You as the virtual datacenter architect should be able to discuss the impact of using this technology with application owners. You should understand the potential impact a selected technology has on the various levels throughout the stack and what impact it has on the service it provides.

Recently I published a series on databases and what impact their workload characteristics have on storage architecture design. Understanding the position of a solution in the business process allows an architect to design a suitable solution. Let’s use the OLTP example. Typically OLTP databases are at the front of the process, the customer-facing process; dramatically put, they are in the line of fire. When the OLTP database is performing slowly or is unavailable, it will typically impact revenue-generating processes. This means that latency is a priority, but so are concurrency and availability. You can then tailor your design to provide the best services to this application. This is just a simplified example, but it shows that you have to understand multiple aspects of the technology, not just the behavior of a single component. The idea is to get a holistic view and then design your environment to cater to the needs of the business, because that’s why we get hired.

Circling back to the abstraction and the power of software defined, I thought the post from Bart Heungens was interesting. Bart argues that Software Defined Storage is not the panacea for all storage related challenges. Which is true. Bart illustrates an architecture that is comprised of heterogeneous components. In his example, he illustrates what happens when you combine two HP DL380 servers from different generations. The difference between generations is primarily noticeable from a storage controller perspective, and especially in the way the software behaves. This is interesting on so many levels, and it would be a very interesting discussion if this were a VCDX defense session.

SDS abstracts many things, but it still relies on the underlying structure to provide the services. From a VCDX defense perspective, Bart has a constraint: the already available hardware and the requirement to use this different-generation hardware in his design. VCDX is not about providing the ideal design, but about showing how you deal with constraints and requirements, and demonstrating your expertise on the technology and how it impacts the requested solution. He didn’t solve the problem entirely, but by digging in deeper he managed to squeeze out performance to provide a better architecture to service the customer’s applications. He states the following:

Conclusion: the design and the components of the solution is very important to make this SDS based SAN a success. I hear often companies and people telling that hardware is more and more commodity and so not important in the Software Defined Datacenter, well I am not convinced at all.
I like the idea of VMware that states that, to enable VSAN, you need and SAS and SSD storage (HCL is quite restricted), just to be sure that they can guarantee performance. The HP VSA however is much more open and has lower requirements, however do not start complaining that your SAN is slow. Because you should understand this is not the fault of the VSA but from your hardware.

So be cognizant of the fact that while you are not responsible for every decision being made when creating an architecture for a virtual datacenter, you should be able to understand the impact various components, software settings and business requirements have on your part of the design. We are moving faster and faster towards abstracting everything. However, this abstraction process does not exonerate you from understanding the potential impact it has on your area of responsibility.

Database workload characteristics and their impact on storage architecture design – part 4 – NoSQL platforms

Welcome to part 4 of the Database workload characteristics series. Databases are considered to be one of the biggest I/O consumers in the virtual infrastructure. Database operations and database design are a study unto themselves, but I thought it might be interesting to take a small peek underneath the surface of database design land. I turned to our resident database expert Bala Narasimhan, PernixData’s director of products, to provide some insights about the database designs and their I/O preferences.

Previous instalments of the series:

  1. Part 1 – Database Structures
  2. Part 2 – Data pipelines
  3. Part 3 – Ancillary structures for tuning databases

Question 4: What are some recent trends in the data management industry that we need to be aware of?

A lot of innovations have occurred in the data management industry over the last few years. In previous discussions we have touched on the relational model, schema design and ACID compliance. These are the foundations on which relational databases have been built. In today’s discussion we’ll focus on the recent innovations related to NoSQL platforms.

Over the years users have become aware that there is a class of applications for which the above requirements for relational databases [ACID compliance, relational model, schema design] can be too onerous. I won’t go into the details about these applications or why they find these requirements onerous in this discussion. [I am happy to do it if there is demand for it. ☺]

In response to these findings the industry began to develop new data management platforms collectively called NoSQL platforms. Some examples of NoSQL platforms include Hadoop and MongoDB. Among other things, NoSQL platforms are characterized by horizontal scaling, scale out architectures and support for programming paradigms other than SQL. Many NoSQL platforms are also designed to be eventually consistent thereby compromising consistency in favor of availability and partition tolerance. [See CAP Theorem for a detailed discussion on this topic.] I will now discuss two NoSQL platforms in some detail. These are Hadoop and MongoDB.

Hadoop can be viewed as a software framework that comprises two things:

  • MapReduce: A programming paradigm for large-scale data processing
  • HDFS: A distributed file system for storing data across a cluster of servers

Hadoop runs MapReduce jobs on a cluster of machines and is designed for batch processing as opposed to interactive operations. A MapReduce job, in its simplest form, comprises a Map phase followed by a Reduce phase. Think of the Map phase as a program that maps together all data that has the same key. Similarly, think of the Reduce phase as a program that takes the output of the Map phase and reduces it to a single value. A prototypical example of a MapReduce job is the ‘Word Count’ program. [Learn more about how MapReduce does WordCount here.]
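The Word Count idea can be sketched in plain Python to show the shape of the two phases. This runs in a single process and does not use Hadoop or HDFS; the function names and the sample lines are made up for the illustration, and the grouping step in the middle is written out explicitly so the flow from Map to Reduce is visible.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def group_by_key(pairs):
    """Bring together all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce the list of values for each key to a single value."""
    return {key: sum(values) for key, values in grouped.items()}


lines = ["the band plays", "the singer sings", "the band"]
word_counts = reduce_phase(group_by_key(map_phase(lines)))
print(word_counts)
```

In a real cluster the map and reduce work is spread over many machines and the input is read from the distributed file system, which is where the large sequential reads and writes come from.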

Because Hadoop does large sequential reads and writes, it is best suited for batch processing. Hadoop is therefore a throughput bound system. Examples of batch processing workloads that Hadoop is suited for include data preparation jobs and long running reports.

MongoDB is a popular NoSQL database built for modern day application development. MongoDB stores data as documents in a binary representation called BSON (Binary JSON). Documents that share a similar structure are organized as collections. Each document has one or more fields in it. Collections in MongoDB are analogous to tables in a relational database, while documents are analogous to rows. The fields within a document are analogous to columns within a relational database. One key differentiator of MongoDB, in contrast to relational databases, is its robust support for flexible schemas.

MongoDB is very good for key-value queries, range queries and aggregations. In turn this means that most of the I/O generated by MongoDB is random in nature and is a good mix of reads and writes. MongoDB is therefore latency sensitive, and low latencies can provide a huge benefit from a performance perspective.
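To picture the document model and the three query types, here is a small sketch that uses plain Python dictionaries as stand-ins for documents and a list as a stand-in for a collection. It does not use MongoDB or its query API; the collection, field names and helper functions are hypothetical. Note how two of the documents carry extra fields that the others lack, which is what a flexible schema allows.

```python
from collections import defaultdict

# A "collection" of "documents"; fields may differ from document to document.
orders = [
    {"_id": 1, "customer": "alice", "total": 120},
    {"_id": 2, "customer": "bob", "total": 45, "coupon": "VMUG"},
    {"_id": 3, "customer": "alice", "total": 80, "items": ["ssd", "ram"]},
]


def find_equal(collection, field, value):
    """Key-value style query: documents whose field equals the value."""
    return [doc for doc in collection if doc.get(field) == value]


def find_range(collection, field, low, high):
    """Range query: documents whose field lies between low and high (inclusive)."""
    return [doc for doc in collection if field in doc and low <= doc[field] <= high]


def sum_by(collection, key_field, value_field):
    """Aggregation: total of value_field per distinct key_field."""
    totals = defaultdict(int)
    for doc in collection:
        totals[doc[key_field]] += doc[value_field]
    return dict(totals)


alice_orders = find_equal(orders, "customer", "alice")
mid_sized = find_range(orders, "total", 50, 150)
totals_per_customer = sum_by(orders, "customer", "total")
print(alice_orders, mid_sized, totals_per_customer)
```

Each of these lookups touches a handful of documents scattered across the collection rather than scanning it sequentially, which hints at why the resulting I/O pattern tends to be random.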

How can FVP help?
NoSQL platforms were developed because a class of applications was either very onerous to run on a traditional database or got very poor performance when run against one. As discussed above, some NoSQL platforms are throughput bound while others are latency sensitive. FVP can play a very beneficial role in accelerating NoSQL platforms because:

  1. FVP accelerates both reads and writes. This means that irrespective of the workload being run on the platform, it can leverage server side resources for speedup.
  2. Because of DFTM these platforms can now leverage server side RAM as an acceleration medium thereby providing a huge jump in performance.
  3. Large intermediate results or resource intensive operations such as sorts can now benefit from read and write acceleration via FVP.

You can now finally virtualize these platforms without worrying about performance. Instead of introducing rigid and siloed architectures, it makes sense to use PernixData FVP. FVP, supported by VMware, allows for clustered and fault tolerant I/O acceleration by using flash or memory. By virtualizing these acceleration resources into one seamless pool, VMs can seamlessly migrate to any host in the cluster while being able to retrieve data throughout the cluster.

Using VM Storage Policies with PernixData FVP Datastore Write Policies

Profile-Driven Storage, introduced in vSphere 5.0, provides rapid and intelligent placement of virtual machines based on pre-defined storage profiles. vSphere 5.5 enhanced the previous version of Storage Profiles and along the way renamed it to VM Storage Policy.

The new architecture looks slightly different than the old one. The kingpin of a VM Storage Policy is the rule-set. The rule-set is a group of rules that describe the storage characteristics of a datastore. These characteristics are provided either by vendor-specific capabilities (through a VASA provider) or by a user-defined tag. A rule-set can contain multiple rules and a combination of vendor-specific capability rules and tag-based rules. This article focuses on using VM Storage Policies with rules based on tags. The schematic overview provides insight on the relationship between objects:

01-vSphere VM Storage Policy

Configuring VM Storage Policy
The vSphere UI does not provide a single point of configuration for a VM Storage Policy. The VM Storage Policy UI allows you to define a rule-set and add tags to the rule-set. However, tags should be defined and associated with vSphere objects before creating the rule-set. Creating and assigning tags cannot be done from the VM Storage Policy UI.

Creating Tags
Tag creation can be done in different ways. Storage related tags can best be created either in the tag menu option in the home menu, or at the tag menu option of the manage tab of the datastore cluster or datastore itself. A tag must be assigned to a category. Categories define the cardinality of the tag and which vSphere objects the tag can be associated with.

Please note that you can edit the category and add associable objects at any time; however, once set you cannot remove an object. If that is required, the category needs to be deleted and then recreated with the correct objects.

Cardinality allows you to define whether the object accepts one tag or multiple tags from that category. In this scenario I will use the tag to define which FVP storage profile is assigned to the datastore. When accelerating a datastore in FVP, you can assign a default storage policy. All virtual machines, in the vSphere cluster, configured to use that particular datastore will be accelerated accordingly. As datastore write policies are mutually exclusive, they cannot exist on the same datastore at the same time.

02-Overview of Tags

With that in mind, the cardinality of “One tag per object” aligns perfectly with the exclusivity of the FVP write policy. Once an administrator assigns the FVP Write-Back mode tag to a datastore, vSphere will not allow the administrator to also assign the FVP Write-Through mode tag to the datastore.
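The effect of the “One tag per object” cardinality can be modeled with a short Python sketch. The `TagCategory` class, the category name and the tag names are hypothetical and exist only for this illustration; this is not the vSphere tagging API, just a picture of the rule it enforces.

```python
class TagCategory:
    """A tag category with a cardinality rule (illustrative model only)."""

    def __init__(self, name, tags, one_tag_per_object=True):
        self.name = name
        self.tags = set(tags)
        self.one_tag_per_object = one_tag_per_object
        self.assignments = {}  # object name -> set of assigned tags

    def assign(self, obj, tag):
        """Assign a tag to an object, rejecting a second tag when cardinality is one."""
        if tag not in self.tags:
            raise ValueError(f"{tag!r} is not part of category {self.name!r}")
        assigned = self.assignments.setdefault(obj, set())
        if self.one_tag_per_object and assigned and tag not in assigned:
            raise ValueError(f"{obj!r} already carries a tag from {self.name!r}")
        assigned.add(tag)


write_policy = TagCategory("FVP Write Policy", ["FVP-Write-Back", "FVP-Write-Through"])
write_policy.assign("Cryo-SYN3", "FVP-Write-Back")

try:
    # A second tag from the same category on the same datastore is refused.
    write_policy.assign("Cryo-SYN3", "FVP-Write-Through")
    second_tag_rejected = False
except ValueError:
    second_tag_rejected = True

print(second_tag_rejected, write_policy.assignments)
```

Because the two write policies live in the same category, the cardinality rule alone is enough to keep a datastore from being labeled with both.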

Assigning a tag is a bit tedious. None of the workflows provide the option to assign the tag to multiple datastores.

Please note that when assigning a tag to a datastore cluster, the tag does not cascade down to the datastores that are members of that datastore cluster. Assigning it to the member datastores has to be done manually.

I prefer to create the tags and go to the datastore option in the vCenter view. Multi-select (Shift-select) the appropriate datastores, right-click the selection to open the submenu, and choose Assign Tag…

03-Multi-select datastores

Once the tags have been created and assigned to the vSphere storage objects (Datastore Cluster and Datastore), a VM Storage Policy can be created. Go to the Home view and select VM Storage Policies. After providing a name and description, the rule-set is created. A rule-set can contain multiple rules; however, you have to add them by category. After selecting the tags, the workflow will show a list of compatible datastores.

In this scenario I have four datastores, Cryo-SYN1, 2, 3 and 4, and all four are replicated. Depending on the service level agreement, I accelerated two datastores with Write-Through mode and two datastores with Write-Back mode. I created two VM Storage Policies and assigned the following tags to their rule-sets.

04-Overview VM storage Policies

Once the VM Storage Policies were created, VM Storage Policies were enabled on the compute cluster. The schematic overview provides insight on the relationship between all objects:


If the customer wants to deploy a virtual machine that has an RPO of 15 minutes defined in its service level agreement, the administrator selects the VM Storage Policy RPO-15-R-DS-FVP-WB. The vSphere provisioning process displays the compatible datastores that are safe to provision the virtual machine on with an RPO requirement of 15 minutes.
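The compatibility check behind that list can be pictured as simple set matching: a datastore is compatible when it carries every tag in the policy’s rule-set. The Python sketch below is an illustration only. The tag names are invented, and which two of the four datastores run in Write-Back mode is an assumption, since that detail is only visible in the screenshots.

```python
# Tags per datastore (tag names and the WT/WB split are assumptions).
datastore_tags = {
    "Cryo-SYN1": {"Replicated-RPO-15", "FVP-Write-Through"},
    "Cryo-SYN2": {"Replicated-RPO-15", "FVP-Write-Through"},
    "Cryo-SYN3": {"Replicated-RPO-15", "FVP-Write-Back"},
    "Cryo-SYN4": {"Replicated-RPO-15", "FVP-Write-Back"},
}

# Rule-set of the policy: every listed tag must be present on the datastore.
policy_rule_sets = {
    "RPO-15-R-DS-FVP-WB": {"Replicated-RPO-15", "FVP-Write-Back"},
}


def compatible_datastores(policy_name):
    """Return the datastores whose tags satisfy all rules of the policy."""
    required = policy_rule_sets[policy_name]
    return sorted(
        name for name, tags in datastore_tags.items() if required <= tags
    )


compatible = compatible_datastores("RPO-15-R-DS-FVP-WB")
print(compatible)
```

Under these assumptions only the two Write-Back datastores are offered for the RPO-15-R-DS-FVP-WB policy, even though all four are replicated.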


Before configuring the VM Storage Policies, I accelerated the datastores within FVP.

With the help of VM Storage Policies and FVP write policies at datastore level, the virtual machine is placed on a replicated datastore with Write-Back enabled. The VM summary page confirms:
08-VM provisioned