TWO VERY INTERESTING VIDEOS ABOUT HOW COMPUTER ALGORITHMS SHAPE TODAY'S WORLD
Resource management within virtual infrastructures relies on distributed algorithms, which sparked my interest in the application of computer algorithms in other areas. Today I found an English version of the multiple-award-winning Dutch documentary, which I can finally share with my non-Dutch-speaking friends. The documentary reviews the flash crash on the U.S. stock market on May 6th, 2010. In particular, it explores the application of black-box trading (algo-trading) and how algorithms shaped and formed the architecture not only of trading institutions, but also of city architectures and human terraforming. Money & Speed: Inside the Black Box (Marije Meerman, VPRO). Sit back and be amazed. Once viewed, watch the TED presentation by Kevin Slavin: How algorithms shape our world. Kevin Slavin zooms in on some of the algorithms applied in our lives and how they affect us and our surroundings.
ADJUSTING THE COST OF VMOTION – A WORD OF CAUTION
Yesterday I posted an article on how to change the cost of vMotion in order to change the default number of concurrent vMotions. As I mentioned in that article, I'm not a proponent of changing advanced settings. Today Kris posted a very interesting question: how about the scenario where one uses multi-NIC vMotion with two 5Gbps virtual adapters? I know a cost of 4 will be set for the network by the VMkernel; however, as the aggregate bandwidth becomes 10Gbps, is it safe enough to raise the limit? Perhaps not to the full 8 used for 10Gbps, but to 6?
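To make the arithmetic behind Kris's question concrete, here is a tiny sketch of the cost accounting: the number of concurrent vMotions a resource admits is its max cost divided by the cost a single vMotion charges against it. This is a model for illustration only, not the VMkernel implementation, and the per-vMotion network cost of 1 is an assumed default.

```python
# Illustrative cost model: a resource's max cost divided by the cost of
# a single operation yields the number of concurrent operations it
# admits. The per-vMotion network cost of 1 is an assumption.
VMOTION_NETWORK_COST = 1

def concurrent_vmotions(network_max_cost, cost_per_vmotion=VMOTION_NETWORK_COST):
    """Concurrent vMotions admitted by the network resource."""
    return network_max_cost // cost_per_vmotion

# Under this model, the default max cost of 4 (sub-10Gbps line speed)
# admits 4 concurrent vMotions, raising it to 6 as Kris suggests would
# admit 6, and the full 10Gbps max cost of 8 admits 8.
```

In this model, raising the max cost from 4 to 6 buys two extra concurrent vMotions while still staying below the ceiling reserved for a true 10Gbps link.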
LIMITING THE NUMBER OF CONCURRENT VMOTIONS
After explaining how to limit the number of concurrent Storage vMotion operations, I received multiple questions on how to limit the number of concurrent vMotion operations. This article covers the cost and max cost constructs and shows you how to calculate the correct config key values to limit the number of concurrent vMotion operations. Please note that I usually do not post about configuration keys that change default behavior, simply because I feel that most defaults are sufficient and should only be changed as a last resort, when all other avenues are exhausted. I would also like to mention that this is an unsupported configuration: support will request that you remove these settings before troubleshooting your environment!
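As a sketch of the cost and max cost constructs, the snippet below models the admission check: an operation is admitted only while every involved resource (host, network, datastore) stays within its max cost, so the tightest resource sets the concurrency limit. The structure mirrors the article's description; the resource values in the example are placeholders, not actual config key defaults.

```python
def headroom(max_cost, in_use, op_cost):
    """How many more operations fit under one resource's max cost."""
    return max(0, (max_cost - in_use) // op_cost)

def concurrency_limit(resources):
    """The tightest resource determines the number of concurrent
    operations; each resource is a dict with max/in_use/cost keys."""
    return min(headroom(r["max"], r["in_use"], r["cost"]) for r in resources)

# Placeholder example: the network resource (max cost 4, cost 1 per
# vMotion) is tighter than the host resource, so at most 4 vMotions
# run concurrently under this model.
limit = concurrency_limit([
    {"max": 8, "in_use": 0, "cost": 1},   # host resource
    {"max": 4, "in_use": 0, "cost": 1},   # network resource
])
```

This also shows why changing a single config key can have no visible effect: if another resource is the bottleneck, raising one max cost does not change the minimum.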
ADDING NEW DISK TO AN EXISTING VIRTUAL MACHINE IN A STORAGE DRS DATASTORE CLUSTER
Recently I had some discussions in which I needed to clarify the behavior of Storage DRS when the user adds a new disk to a virtual machine that is already running in the datastore cluster. In particular, what will happen if the datastore cluster is near its capacity? When adding a new disk, Storage DRS reviews the configured affinity rule of the virtual machine. By default, Storage DRS applies an affinity rule (Keep VMDKs together) to all new virtual machines. vSphere 5.1 allows you to change this default behavior of the cluster; you can change the default affinity rule in the datastore cluster settings.

Adding a new disk to an existing virtual machine

The first thing to realize is that Storage DRS can never violate the affinity or anti-affinity rule of the virtual machine. For example, if the datastore cluster default affinity rule is set to "keep VMDKs together", then all the files are placed on the same datastore. Ergo, if a new disk is added to the virtual machine, that disk must be stored on the same datastore in order not to violate the affinity rule. Let's use an example: VM1 is placed in the datastore cluster and is configured with the Intra-VM affinity rule (keep the files inside the VM together). The virtual machine is configured with two hard disks, and both reside on the datastore [nfs-f-07] of the datastore cluster. When adding another disk, Storage DRS provides me with a recommendation. Although all the other datastores inside the datastore cluster are excellent candidates as well, Storage DRS is forced to place the VMDK on the same datastore, together with the rest of the virtual machine files.

Datastore cluster defragmentation

At some point you may find yourself in a situation where the datastores inside the datastore cluster are quite full: not enough free space per datastore, but enough free space in the datastore cluster to host another disk of an existing virtual machine.
In that situation, Storage DRS does not break the affinity rule but starts to "defragment" the datastore cluster. It moves virtual machines around to provide enough free space for the new VMDK. The article "Storage DRS initial placement and datastore cluster defragmentation" provides more information about this mechanism.

Key takeaway

In a datastore cluster you will never see Storage DRS splitting up a virtual machine if the VM is configured with an affinity rule, but you will see prerequisite moves, migrating virtual machines out of the datastore to make room for the new VMDK.
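The defragmentation behavior can be sketched in a few lines. This is a simplified, hypothetical model (greedy and single-pass), not Storage DRS code: the new disk must land on the VM's own datastore because of the intra-VM affinity rule, and if that datastore lacks space, other virtual machines are moved out first as prerequisite moves.

```python
def place_new_disk(target, datastores, disk_gb):
    """Place a new disk on `target`, honoring the intra-VM affinity
    rule by moving other VMs off `target` when space is tight.
    datastores: name -> {"free": GB, "vms": [(vm_name, size_gb), ...]}
    Returns the list of prerequisite moves performed."""
    ds = datastores[target]
    moves = []
    while ds["free"] < disk_gb and ds["vms"]:
        vm, size = ds["vms"].pop(0)
        # Greedily pick any other datastore with room for the displaced VM.
        dest = next((name for name, d in datastores.items()
                     if name != target and d["free"] >= size), None)
        if dest is None:
            raise RuntimeError("cluster cannot be defragmented")
        datastores[dest]["free"] -= size
        datastores[dest]["vms"].append((vm, size))
        ds["free"] += size
        moves.append((vm, dest))
    if ds["free"] < disk_gb:
        raise RuntimeError("not enough aggregate free space")
    ds["free"] -= disk_gb
    return moves

# Hypothetical example: [nfs-f-07] has only 10GB free, so VM2 is moved
# out to make room for a 25GB disk that must stay with VM1.
cluster = {
    "nfs-f-07": {"free": 10, "vms": [("VM2", 30)]},
    "nfs-f-08": {"free": 50, "vms": []},
}
prereq_moves = place_new_disk("nfs-f-07", cluster, 25)
```

Note how the affinity rule is never relaxed: the new disk stays with its VM, and it is the neighboring virtual machines that move.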
WHY DO YOU MANUALLY SELECT A DATASTORE WHILE USING STORAGE DRS?
On the community forums, a couple of threads are active about Storage DRS automation level behavior when manually migrating a virtual machine between datastores in the same datastore cluster. When migrating within the datastore cluster, Storage DRS is disabled for that virtual machine. Some community members asked me why Storage DRS disables automation for a virtual machine when migrating between datastores inside a datastore cluster, or when selecting a specific datastore during placement of a new virtual machine.

Intent

It is all about intent. When migrating a virtual machine into a datastore cluster, you are migrating the virtual machine into a load-balancing domain (the datastore cluster). You allow and trust Storage DRS to provide an optimally load-balanced environment in which the virtual machines receive the overall best I/O performance and optimal placement regarding space utilization. If the user wants to migrate the virtual machine to a specific datastore inside the datastore cluster, Storage DRS captures this intent as "user knows best". The design assumes that if a datastore is selected, the user is telling Storage DRS that the selected datastore is the best, i.e. the user knows something Storage DRS doesn't. To prevent any future migration recommendations to other datastores, Storage DRS is disabled to ensure permanent placement. This behavior also applies when migrating a virtual machine into a datastore cluster. During initial placement it is expected that the user selects the datastore cluster; if users want to select a specific datastore, they have to select "Disable Storage DRS for this virtual machine" in order to be able to select a member datastore. But this brings me to my question: what is the reason for not trusting Storage DRS? Why do you manually select a datastore while using Storage DRS?
Apparently old habits die hard, and most people tell me that their administrators feel they can beat Storage DRS at placement. I've written a couple of articles (plus two 100+ page chapters featured in two books) about the workings of Storage DRS, and trust me, it's very difficult to beat Storage DRS placement and migration recommendations. During development the engineers ran an experiment reminiscent of IBM's Deep Blue versus Garry Kasparov and lined up two world-class storage experts against Storage DRS. Although the experts received answers to all their questions, they could not match the overall performance improvement Storage DRS could provide.

Correlation of metrics

Storage DRS has a lot of visibility into the environment. It measures space growth rates of existing virtual machines with thin disks, snapshots, etc., and selects destination datastores based on current utilization and growth rates. It builds device models to understand the performance of the devices backing the datastores, as well as measuring the overall load on the datastores. It creates workload models of the existing virtual machines and measures multiple metrics. Due to these insights, Storage DRS can decide to migrate virtual machines to other datastores in order to make room or to avoid crossing the I/O threshold. It analyzes the environment and prefers moving virtual machines with low Storage vMotion overhead. For more information, please read the following articles:

Storage DRS automation level and initial placement behavior
Storage DRS initial placement and datastore cluster defragmentation
Avoiding VMDK level over-commitment while using thin disks and Storage DRS

If you have a specific use case, for example running a benchmark test on a particular datastore, the option "Disable Storage DRS for this virtual machine" helps you prevent Storage DRS from interrupting your test.
However, I would recommend selecting the datastore cluster as the destination instead of a specific datastore when migrating a virtual machine into a datastore cluster. Read the article "Storage vMotion migration into a datastore cluster" for more information. Remember that Storage DRS always generates a recommendation that you can review during provisioning. After selecting the destination (datastore cluster), the user interface provides an overview of the current selections; on the right side of the screen, a "more recommendations" link is provided.

More recommendations

After you click on the more recommendations link, the user interface presents a list of alternative recommendations. The list is ordered so that the top recommendation provides the best placement; this is the same recommendation listed in the previous review selection screen. The list provides an overview of the space utilization before placement (2nd column), space utilization after placement (3rd column) and I/O latency before placement on the destination datastore (4th column). As the screenshot shows, the 2nd recommendation indicates that placing the disk on the EMC-003 datastore provides the best placement. This datastore has the lowest utilization before and after placement, and the lowest I/O latency of all the datastores inside the datastore cluster. Use this screen to educate the team responsible for provisioning and placement: show them that Storage DRS takes multiple metrics into account, and review the impact if they had picked the datastore of their choice. For my own education, please share your thoughts on why you would manually select a datastore that is part of a datastore cluster.
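To illustrate how the columns in the recommendations list relate, here is a toy ranking along the same two metrics the screen shows: space utilization after placement and I/O latency on the destination datastore. The weighting (utilization first, latency as tie-breaker) and the sample numbers are assumptions for illustration; Storage DRS uses far richer device and workload models.

```python
def utilization_after(ds, disk_gb):
    """Space utilization of a datastore after placing the new disk."""
    return (ds["used_gb"] + disk_gb) / ds["capacity_gb"]

def rank_placements(datastores, disk_gb):
    """Best candidate first: lowest utilization after placement,
    ties broken by lowest observed I/O latency."""
    return sorted(datastores,
                  key=lambda ds: (utilization_after(ds, disk_gb),
                                  ds["latency_ms"]))

# Hypothetical datastore cluster; EMC-003 ends up on top because it
# has the lowest utilization after placement and the lowest latency.
candidates = [
    {"name": "EMC-001", "capacity_gb": 1000, "used_gb": 700, "latency_ms": 9.2},
    {"name": "EMC-002", "capacity_gb": 1000, "used_gb": 550, "latency_ms": 7.4},
    {"name": "EMC-003", "capacity_gb": 1000, "used_gb": 400, "latency_ms": 4.1},
]
best = rank_placements(candidates, 50)[0]["name"]
```

Even this crude two-metric ranking already beats a gut-feeling pick; the real engine correlates many more signals.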
VSPHERE 5.1 WEB CLIENT: VM OVERRIDES - STORAGE DRS AUTOMATION LEVEL OVERVIEW
Overall, the vSphere 5.1 web client attempts to mimic the behavior of the menus and settings workflows of the (old) vSphere client. When editing the settings of a datastore cluster, the web client provides the same set of editable options as the vSphere client. However, certain overviews and menus have changed in the vSphere 5.1 web client, for example the VM overrides screen. The primary purpose of the VM overrides screen is to display the virtual machines inside the datastore cluster with a deviating Storage DRS automation level.

VM overrides and Virtual Machine Settings screens

The VM overrides screen is located in the storage view: select the datastore cluster, select the Manage tab and click on the Settings button. The VM overrides screen replaces the virtual machine settings screen of the datastore cluster settings in the vSphere client.

Difference in default behavior

As you might have noticed, the web client does not list any virtual machines, while the Virtual Machine Settings overview in the vSphere client shows 5 virtual machines and a VM template. As mentioned in the introduction, the primary purpose of the VM overrides screen has changed from that of the Virtual Machine Settings overview in the vSphere client. The VM overrides screen only displays virtual machines that are set to a non-default automation level. To demonstrate the different behavior, I changed the automation level of VM3, VM4 and VM5. The datastore cluster is configured with manual automation mode; therefore the default automation level is Default (Manual). The previous screenshot shows that all virtual machines were configured with the Default (Manual) automation level; VM3 was changed to Fully Automated, VM4 to Manual and VM5 to Disabled. If you want to reproduce this behavior in your own environment, change the automation level in the vSphere client and then go to the VM overrides screen in the web client to see the modified virtual machines listed.
The VM overrides screen displays the following: even though VM4's automation level (Manual) is effectively the same as the datastore cluster default, the VM overrides screen displays VM4 because the level is explicitly set rather than inherited from the default. By changing the automation mode back to Default (Manual) via the Edit screen, VM4 is removed from the VM overrides list. To be honest, it took me a while to get used to the new functionality of this screen. I would like to know whether you like this new behavior, or whether you prefer the way the Virtual Machine Settings view in the old vSphere client works.
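The filtering rule behind the VM overrides screen can be summarized in a few lines. This is a conceptual sketch with made-up values, not web client code: a VM appears on the screen when its automation level is explicitly set, even if the explicit value (VM4's Manual) happens to equal the cluster-wide default.

```python
# None models "no override": the VM inherits the cluster-wide level.
automation = {
    "VM1": None,
    "VM2": None,
    "VM3": "fullyAutomated",
    "VM4": "manual",    # explicitly set; listed even though the
                        # cluster default is also Manual
    "VM5": "disabled",
}

def vm_overrides(vms):
    """VMs that would appear on the VM overrides screen."""
    return [name for name, level in vms.items() if level is not None]
```

The key design point is that the screen filters on "is an override set", not on "does the effective level differ", which is exactly why VM4 shows up.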
MANUAL STORAGE VMOTION MIGRATIONS INTO A DATASTORE CLUSTER
I frequently receive questions about the impact of a manual migration into a datastore cluster, especially the impact on the VM disk file layout. Will Storage DRS take the initial disk layout into account, or will it be changed? The short answer is that the virtual machine disk layout will be changed by the default affinity rule configured on the datastore cluster. This article describes several scenarios of migrating "distributed" and "centralized" disk layout configurations into datastore clusters configured with different affinity rules.

Test scenario architecture

For the test scenarios I built two virtual machines, VM1 and VM2. Both virtual machines have an identical configuration; only the datastore location differs. VM1-centralized has a "centralized" configuration, storing all VMDKs on a single datastore, while VM2-distributed has a "distributed" configuration, storing all VMDKs on separate datastores.
STORAGE DRS AND STORAGE VMOTION BUGS SOLVED IN VSPHERE 5.0 UPDATE 2
Today, Update 2 for vSphere ESXi 5.0 and vCenter Server 5.0 was released. I would like to highlight two bugs that have been fixed in this update: one for Storage DRS and one for Storage vMotion.

Storage DRS

vSphere ESXi 5.0 Update 2 contains a fix that should be interesting to customers running Storage DRS on vSphere 5.0. The release notes state the following bug:
MULTI-NIC VMOTION – FAILOVER ORDER CONFIGURATION
After posting the article "Designing your vMotion network", I quickly received the question of which failover order configuration is better: is it better to configure the redundant NIC(s) as standby or as unused? The short answer: always use standby, never unused! Tomas Fojta posted a comment arguing that it does not make sense to place the NICs into standby mode: "In the scenario as depicted in the diagram I prefer to use active/unused. If you think about it the standby option does not give you anything as when one of the NICs fails both vmknics will be on the same NIC which does not give you anything."
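A toy model makes the failure scenario easy to reason about. This is a simplified sketch of NIC teaming semantics (assumed behavior for illustration, not vSphere code): with active/standby, a vmknic whose active NIC fails moves to the surviving NIC and keeps carrying vMotion traffic; with active/unused, that vmknic simply goes down.

```python
def surviving_vmknics(vmknics, failed_nic):
    """Which vmknics still carry traffic after `failed_nic` dies.
    Each vmknic maps to {"active": nic, "standby": nic or None};
    a standby of None models the 'unused' configuration."""
    alive = []
    for name, order in vmknics.items():
        nic = order["active"]
        if nic == failed_nic:
            nic = order["standby"]   # fail over, or go down if unused
        if nic is not None and nic != failed_nic:
            alive.append(name)
    return alive

# Active/standby: both vmknics survive the loss of nic1 (both end up
# on nic2). Active/unused: vmk1 is lost along with nic1.
standby_cfg = {"vmk1": {"active": "nic1", "standby": "nic2"},
               "vmk2": {"active": "nic2", "standby": "nic1"}}
unused_cfg  = {"vmk1": {"active": "nic1", "standby": None},
               "vmk2": {"active": "nic2", "standby": None}}
```

In this model, standby keeps both vMotion streams alive on the surviving NIC (sharing its bandwidth), whereas unused halves the number of usable vmknics; that difference is the core of the standby-versus-unused debate.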
THIN OR THICK DISKS? – IT’S ABOUT MANAGEMENT NOT PERFORMANCE
This is my contribution to the debate "Zero or Thick disks – debunking the performance myth". Over the last couple of years, all sorts of VMware engineers have worked very hard to reduce the performance difference between thin disks and thick disks, and many white papers have been written by performance engineers to explain the improvements made to thin disks. Therefore, today the question of whether to use thin-provisioned disks or eager-zeroed thick disks is not about the difference in performance but the difference in management.

When using thin-provisioned VMDKs, you need a very clearly defined process. What do you do when the datastore that stores the thin-provisioned disks is getting full? You need to define a consolidation ratio, you need to understand which operational processes might be dangerous to your environment (think Patch Tuesday), and you need to define a space utilization threshold before migrating thin-provisioned disks to other datastores. Today Storage DRS can help you with many of the aforementioned challenges; for more information, please read the article "Avoiding VMDK level over-commitment while using thin-provisioned disks and Storage DRS". If Storage DRS is not used, thin-provisioned disks require seamless collaboration between the virtualization teams (provisioning and architecture) and the storage administrators. When this is not possible due to organizational or cultural differences, thin provisioning is a risk rather than a blessing.

Zero-out process: Eager-zeroed thick disks, on the other hand, might provide a marginal performance increase in some (corner) cases, but the costs involved could outweigh the perceived benefits. First of all, eager-zeroed thick disks need to be zeroed out during creation; if your array doesn't support the VAAI primitives, this hurts performance and extends provisioning time. With terabyte-sized disks becoming more common, this impacts provisioning time immensely.
Waste of space: Most virtualized environments contain virtual machines configured with oversized OS disks and over-specced data disks, resulting in wasted space full of zeros. Thin-provisioned disks only occupy the space used for storing data, not zeros.

Migration: Storage vMotion goes out of its way to migrate every little bit of a virtual disk, which means it needs to copy over every zeroed-out block. Combined with oversized disks, you create unnecessary overhead on your hosts and storage subsystem by copying and verifying the integrity of zeroed-out blocks. Migrating thin disks only requires migrating the user data, resulting in faster migration times and less overhead on hosts and the storage subsystem.

In essence, thin-provisioned disks versus eager-zeroed thick disks is all about resource/time savings versus risk avoidance. Choose wisely.
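The space and migration argument boils down to simple arithmetic. The disk sizes below are made up for illustration: an eager-zeroed thick disk consumes (and Storage vMotion copies) its full provisioned size, while a thin disk consumes only the data actually written.

```python
def thin_vs_thick(disks):
    """disks: list of (provisioned_gb, written_gb) pairs.
    Returns (thick_gb, thin_gb, saved_gb): space consumed when
    eager-zeroed thick, space consumed when thin, and the savings."""
    thick = sum(provisioned for provisioned, _ in disks)
    thin = sum(written for _, written in disks)
    return thick, thin, thick - thin

# Two hypothetical oversized disks: 600GB provisioned, only 150GB
# actually written. Thin provisioning saves 450GB of storage and of
# Storage vMotion copy traffic.
thick_gb, thin_gb, saved_gb = thin_vs_thick([(100, 30), (500, 120)])
```

The same 450GB gap shows up twice: once as datastore capacity and once as blocks Storage vMotion would otherwise have to copy and verify.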