Basic elements of the flash virtualization platform – Part 1

During my recent trip to the headquarters of PernixData I had the opportunity to pick the brains of Satyam and some of the key engineers. Throughout the discussions and lectures I learned a lot about the design choices engineering faced when creating the Flash Virtualization Platform. When dealing with super low (microsecond) latency devices such as flash, the last thing you want to do is add latency with your own software. Making the right choices is crucial in creating a platform that is both efficient and effective in providing performance to your applications. The following series of articles expands on the elements that make up the Flash Virtualization Platform and shares the motivation for choosing certain elements over others. Let’s start off the series by zooming in on why a kernel module was chosen over a virtual appliance.

Kernel module versus virtual appliance
One of the first problems to solve is to keep the length of the I/O path as short as possible in order to leverage the performance characteristics of flash. A common solution is to use a virtual appliance that consumes the local flash resource; however, when analyzing the I/O path of the virtual machine it becomes clear that this is at odds with the goal of the shortest possible I/O path. As the diagram in the next paragraph illustrates, there are multiple distinct steps in the I/O path; each step lengthens the data path and introduces a potential delay on various levels.

Resource management challenges of a virtual appliance
Besides extending the I/O path, resource management of a virtual appliance is a challenge. Virtual appliances are bound by the ESX scheduler. From a kernel perspective the virtual appliance is just another virtual machine, meaning that the virtual appliance has to wait its turn in the CPU queue alongside all the other virtual machines it is serving.

[Diagram: I/O path of a virtual appliance]

The virtual appliance might not get scheduled often enough, leaving the I/O to sit and wait inside the virtual appliance (#3) or in the vmkernel (#2) before finding its way to the kernel storage stack, where it is affected again by queues. Similar issues arise on the I/O completion path. From flash, the I/O will complete (#5) to the virtual appliance, the appliance has to be scheduled to process the completion, then the appliance needs to issue a completion (#6) to the VM (#7) via the vmkernel, and the VM has to be scheduled to receive the completion (#8). On a lightly utilized system this problem will not become apparent soon, but as the environment grows and more and more virtual machines are added, the I/O acceleration will rapidly diminish.
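The effect of those scheduling waits can be sketched with a toy model. The figures used here (a 100 microsecond flash service time and the per-hop wait values) are purely illustrative assumptions, not measurements, and the function names are hypothetical. The model only counts the two extra points at which the appliance itself must be scheduled: once on the issue path and once on the completion path.

```python
# Toy latency model. All numbers are illustrative assumptions, not measurements.
FLASH_SERVICE_US = 100.0      # assumed flash device service time, in microseconds
APPLIANCE_SCHEDULED_HOPS = 2  # appliance is scheduled on issue and on completion


def kernel_path_latency_us() -> float:
    """I/O handled inside the hypervisor: no extra VM needs to be scheduled."""
    return FLASH_SERVICE_US


def appliance_path_latency_us(sched_wait_us: float) -> float:
    """I/O routed through an appliance that waits sched_wait_us per scheduled hop."""
    return FLASH_SERVICE_US + APPLIANCE_SCHEDULED_HOPS * sched_wait_us


if __name__ == "__main__":
    # A growing wait per hop stands in for a host with more competing VMs.
    for wait_us in (0.0, 50.0, 500.0):
        total = appliance_path_latency_us(wait_us)
        print(f"wait per hop {wait_us:6.1f} us -> total {total:7.1f} us")
```

In this sketch, once the assumed wait per hop reaches the same order as the device service time, the scheduling delay dominates the total latency rather than the flash device itself.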

One might say, apply CPU reservations! But per-VM reservations introduce a whole new set of challenges, all the way up to impacting a cluster-wide HA failover policy configuration. Furthermore, CPU reservations do not guarantee that the virtual appliance has instant access to the CPU itself; if another vCPU is scheduled, it is allowed to finish its operation, causing delay in I/O operations within the virtual appliance.

Another problem is that the virtual appliance is not scheduled proportionally to the amount of incoming traffic generated by the virtual cores it is serving, making it a challenge to size the virtual appliance appropriately. You have to revisit the sizing of the virtual appliance whenever the virtual infrastructure grows or shrinks. As we know, oversized virtual machines don’t behave very well in a virtual infrastructure, while an undersized virtual machine performs badly.

It isn’t only a CPU problem! The number of memory copies involved in the virtual appliance I/O path typically incurs additional overhead, especially when doing I/O to a high-throughput device like flash, because of the number of commands flying around. VMware has provided ways to establish lower-overhead communication channels among VMs (see VMCI: http://www.vmware.com/support/developer/vmci-sdk/), but this technique is not available between the core ESXi hypervisor kernel and VMs.

Taking all these factors into account, it became clear that a virtual appliance would not provide the most efficient and effective way of leveraging the low latency performance a flash device provides.

FVP uses a kernel module
Instead the engineers created a kernel module. The flash virtualization platform (FVP) extends the hypervisor by installing a kernel module into the VMkernel. The way it works is that the kernel module “hijacks” the virtual machine I/O path as it comes out of the virtual machine into the hypervisor. Before it goes out to its storage system FVP determines whether that I/O should be served from the flash device or from the external storage system. As it is a part of the hypervisor it is as fast as it gets in terms of processing I/O on behalf of the virtual machine. And as it’s a kernel module there is no need for tweaking memory and CPU configurations, and on top of that there is no danger of mistakenly powering down the virtual appliance.
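The routing decision described above can be illustrated with a minimal read-path sketch. This is not FVP's implementation: the class name, the dictionary standing in for the flash device, and the choice to populate flash on a miss are all assumptions made for illustration.

```python
class ReadPathSketch:
    """Hypothetical read path: serve from flash if present, else from storage."""

    def __init__(self, backend: dict):
        self.backend = backend  # stands in for the external storage system
        self.flash = {}         # stands in for the local flash device
        self.hits = 0
        self.misses = 0

    def read(self, block: int) -> bytes:
        if block in self.flash:
            # Block is already on flash: serve it locally.
            self.hits += 1
            return self.flash[block]
        # Block is not on flash: fetch it from the external storage system.
        self.misses += 1
        data = self.backend[block]
        # Assumption: keep a copy on flash so the next read is served locally.
        self.flash[block] = data
        return data


if __name__ == "__main__":
    path = ReadPathSketch(backend={0: b"alpha", 1: b"beta"})
    path.read(0)  # first read goes to the backend
    path.read(0)  # second read is served from flash
    print(path.hits, path.misses)
```

The point of the sketch is only that the decision happens inline, at the place where the I/O leaves the virtual machine, without handing the I/O to another virtual machine first.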

Virtual appliance = bad?
Not per se. But when you deal with server side flash in the data path and access flash in memory, the problem is that every overhead you introduce in the data path will show up very quickly. When dealing with memory you are accessing nanosecond material, flash latency is measured in microseconds, while spindles and virtual appliances deal in milliseconds. Therefore, if your goal is to separate the performance tier from your capacity tier, it makes sense to use an architecture that is suitable for each tier. A great architecture is to marry both solutions: use FVP for your performance tier, while using a virtual appliance for the capacity tier.
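To illustrate why the same overhead matters more on faster media, the following sketch expresses a fixed software overhead as a percentage of the media latency. The latency values and the 50 microsecond overhead are assumed order-of-magnitude figures chosen for illustration, not vendor numbers or measurements.

```python
# Assumed order-of-magnitude latencies in microseconds; not vendor figures.
BASE_LATENCY_US = {
    "memory": 0.1,      # roughly 100 nanoseconds
    "flash": 100.0,     # microsecond range
    "spindle": 5000.0,  # millisecond range
}

ADDED_OVERHEAD_US = 50.0  # hypothetical fixed overhead added by the data path


def overhead_percent(base_us: float, added_us: float) -> float:
    """Return the added latency as a percentage of the media latency."""
    return added_us * 100.0 / base_us


if __name__ == "__main__":
    for media, base_us in BASE_LATENCY_US.items():
        pct = overhead_percent(base_us, ADDED_OVERHEAD_US)
        print(f"{media:8s} base {base_us:8.1f} us -> overhead {pct:10.1f} %")
```

Under these assumptions the same overhead is barely noticeable against a spindle but adds half again to the latency of flash, which is why the tiers benefit from different architectures.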

New Book Project: Tweet sized vSphere Design Considerations – Call for Entries

Today I am excited to announce the newest project from Duncan and me. Over the next 7 days we will be gathering the best vSphere design considerations around and compiling them into a pocket-sized book. The current working title is “Tweet-sized vSphere design considerations”. As this book is created by people from the virtualization community for the virtualization community, it will be available free of cost.

vSphere Design Considerations
In the vSphere clustering deepdive series we emphasized certain design considerations by calling them out in “Basic design principles” textboxes. Basic design principles provide quick and simple as well as deep and quintessential information to make architectural design decisions.

The technical deepdive books and their basic design principles focus on HA and distributed resource management features but there are a lot of basic design principles for the other elements in a virtual infrastructure.

We approached some of the industry leading minds to contribute their design considerations for various elements of the virtual infrastructure. However we believe that a lot of practitioners in the virtualization community can contribute to make the book a real success.

Time to gather and aggregate them into one single book that can become a pocketbook of inspiration for all virtual infrastructure architects, admins and consultants.

Call for entries
Do you have a design consideration that consistently applies in your customer environments? Here is your chance to share it with the rest of the virtualization community. If your design consideration is selected, it will be featured in the book. Your name, title (vExpert, VCDX number) and Twitter handle will be listed alongside your design consideration.

The rules
Each design consideration should be tweet-sized. 140 characters might be a challenge, therefore we slightly adjusted the limitation and allow a maximum of 200 characters (excluding spaces).

We are looking for design considerations in the following categories:

  • Host design
  • Cluster design
  • vCenter design
  • Networking and Security design
  • Storage design

To prevent oversaturation we do not allow more than a total of three design considerations per category. For example, you can provide us with three design considerations for the Host category but you could also choose to provide a single design consideration for each category. It’s up to you to decide which category and how many you want to provide. Be aware that we would rather see one excellent design consideration than three mediocre ones.

Level of Quality
There are no requirements for submitting your design consideration. You do not have to be a vExpert or VCDX to participate. However, we strive for a consistent level of quality in the design considerations featured in the book. Please check out the “basic design principles” in one of the vSphere clustering Deepdive books; this is the level of quality we are looking for.

The selection process
Both Duncan and I encourage simplicity. Simplicity in design usually removes as much overhead as possible, which in turn increases flexibility. To quote Martin Fowler “The cost of flexibility is complexity” and in today’s cloud focused market flexibility is key. Be mindful of that.

Therefore try to aim for simplicity in both design and messaging. Similar to the architectural design try to simplify your message as much as possible. This does not mean that you are requested to dumb down your message. Use clear, clean communication, so that it could not be misunderstood.

The following five bloggers will judge the submitted entries:

  • Frank Denneman
  • Duncan Epping
  • Cormac Hogan
  • Jason Nash
  • Vaughn Stewart

Project schedule

  1. Announcement and Call for Entries (Today)
  2. Deadline for Call for Entries (June 18th)
  3. Deadline for selection of design considerations by judges (June 30th)
  4. Book design and print process
  5. Book Availability (VMworld 2013)

Once the book is complete we shall publicize the list of people mentioned in the book. We will not share information during the production process of the book.

This book is free!
PernixData generously offered to print the book. If your design consideration is included in the book, you will receive a copy of the book. At their booth at VMworld PernixData will have a copy available for people who submitted a winning design consideration. A limited number of books will be available for the community. More details will follow. After VMworld an E-book version of the book will be made publicly available.

How to enter?
Want to be a part of something cool and unique? Follow this link and share your design consideration with the virtualization community by filling out the form.

Please note, the call for entries closes on Tuesday, June 18th.

Want to have my former job?

My old technical marketing team at VMware is looking for someone to cover resource management. If you have a passion for resource management of virtual infrastructures and like to help VMware’s field personnel, partners and customers understand the technology, then this job might be something for you.

A large part of my role was bridging between engineering, product management, product marketing and the field / customers. You provide information to the R&D side of VMware about how the products are used and what features customers are requesting. You create collateral in every shape or form to help customers and field personnel understand and adopt the features.

I always enjoyed working with the different teams at VMware. The cloud resource management and vMotion teams are an awesome group to work with. Be prepared to deep dive with these guys Mariana Trench style. Having a customer-facing background helps you provide the team with valuable information to align the features with customer wishes.

In this role you assist product marketing and product management in achieving their tactical and strategic plans.

Besides working with the responsible engineering and product marketing teams you collaborate with your technical marketing colleagues. You have the ability to interact with guys such as Ken Werneburg, Cormac Hogan, Mike Foley, William Lam, Alan Renouf or Rawlinson Rivera on a daily basis.

If you have a thorough understanding of the vMotion features, DRS, Storage DRS, SIOC and DPM and love to help customers adopt these features, then apply now!

http://jobs.vmware.com/job/Palo-Alto-Sr_-Technical-Marketing-Manager-Resource-Management-Job-CA-94301/2593496/

Please note
Be aware that this is my former role and that I no longer work for VMware. Therefore I cannot answer any further inquiries. Please contact the VMware career team.

PernixData about to release the new Beta

This week PernixData is all over the news with the outcome of its series B funding, but besides financials there is more exciting news. PernixData is about to release Beta II and I believe it might even be today or tomorrow. The new beta contains some great stuff; due to NDA I’m not allowed to disclose the goodness included in Beta II (I’m not an employee yet). All I can tell you is that Satyam was demoing most of the new features during the Storage Field Day. Go check out the videos if you haven’t seen his presentation.

If you don’t have the time to view the complete presentation I would suggest you read Chethan’s recent post – Get Pernix’d. It gives you great insights into the value add of FVP in a virtual infrastructure.

After seeing the video and reading the articles, the question “do you want to get Pernix’d?” might be a superfluous one. Join the beta program today.