WHERE IS MY NEW VMOTION FUNCTIONALITY?

Just a reminder, as I received a lot of questions and comments about this: the new vMotion functionality - migrating virtual machines between hosts without shared storage - is only available via the web client. Please note that in the vSphere 5.1 release all new features are only visible via the web client and not in the old vSphere client. For more information about the vMotion functionality, read: vSphere 5.1 vMotion deep dive. Get notified of these blog postings and more DRS and Storage DRS information by following me on Twitter: @frankdenneman

VSPHERE 5.1 VMOTION DEEP DIVE

vSphere 5.1 vMotion enables a virtual machine to change its datastore and host simultaneously, even if the two hosts don’t have any shared storage in common. For me this is by far the coolest feature in the vSphere 5.1 release, as this technology opens up new possibilities and lays the foundation of true portability of virtual machines. As long as two hosts have an (L2) network connection, we can live migrate virtual machines. Think about the possibilities we have with this feature once some of the current limitations are eventually solved: think inter-cloud migration, think follow-the-moon computing, think big! The new vMotion provides a new level of ease and flexibility for virtual machine migrations, and the beauty of this is that it spans the complete range of customers. It lowers the barrier for vMotion use in small SMB shops, allowing them to leverage local disk and simpler setups, while big datacenter customers can now migrate virtual machines between clusters that may not have a common set of datastores between them.

Let’s have a look at what the feature actually does. In essence, this technology combines vMotion and Storage vMotion. But instead of either copying the compute state to another host or the disks to another datastore, it is a unified migration where both the compute state and the disks are transferred to a different host and datastore. All is done via the vMotion network (usually). The moment the new vMotion was announced at VMworld, I started to receive questions. Here are the most interesting ones, as they allow me to give you a little more insight into this new enhancement.

Migration type
One of the questions I have received is: will the new vMotion always move the disk over the network? This depends on the migration type you have selected. When selecting the migration type, three options are available: change host, change datastore, or change both host and datastore. This may be obvious to most, but I just want to highlight it again: a Storage vMotion will never move the compute state of a VM to another host while migrating the data to another datastore. Therefore, when you just want to move a VM to another host, select vMotion; when you only want to change datastores, select Storage vMotion.

Which network will it use?
vMotion will use the designated vMotion network to copy the compute state and the disks to the destination host when copying disk data between non-shared disks. This means that you need to take the extra load into account when the disk data is being transferred. Luckily the vMotion team improved the vMotion stack to reduce the overhead as much as possible.

Does the new vMotion support multi-NIC for disk migration?
The disk data is picked up by the vMotion code, which means vMotion transparently load balances the disk data traffic over all available vMotion vmknics. vSphere 5.1 vMotion leverages all the enhancements introduced in vSphere 5.0, such as multi-NIC support and SDPS. Duncan wrote a nice article on these two features.

Is there any limitation to the new vMotion when the virtual machine is using shared vs. unshared swap?
No, either will work, just as with the traditional vMotion.

Will the new vMotion features be leveraged by DRS/DPM/Storage DRS?
In vSphere 5.1, DRS, DPM and Storage DRS will not issue a vMotion that copies data between datastores. DRS and DPM continue to leverage traditional vMotion, while Storage DRS issues Storage vMotions to move data between datastores in the datastore cluster. Maintenance mode, a part of the DRS stack, will not issue a data-moving vMotion operation.
Data-moving vMotion operations are more expensive than traditional vMotion, and the cost/risk/benefit trade-off must be taken into account when making migration decisions. A major overhaul of the DRS algorithm code is necessary to include this in the framework, and this was not feasible during this release.

How many concurrent vMotion operations that copy data between datastores can I run simultaneously?
A vMotion that copies data between datastores counts against the limits on concurrent vMotion and Storage vMotion operations of a host. In vSphere 5.1 one cannot perform more than 2 concurrent Storage vMotions per host. As a result, no more than 2 concurrent vMotions that copy data will be allowed. For more information about the costs of the vMotion process, I recommend reading the article: “Limiting the number of Storage vMotions”.

How is disk data migration via vMotion different from a Storage vMotion?
The main difference between vMotion and Storage vMotion is that vMotion does not “touch” the storage subsystem for copy operations of non-shared datastores, but transfers the disk data via an Ethernet network. Due to the possibility of longer distances and higher latency, disk data is transferred asynchronously. To cope with higher latencies, a lot of changes were made to the buffer structure of the vMotion process. However, if vMotion detects that the guest OS issues I/O faster than the network transfer rate, or that the destination datastore is not keeping up with the incoming changes, vMotion can switch to a synchronous mirror mode to ensure the correctness of data.

I understand that the vMotion module transmits the disk data to the destination, but how are changed blocks during migration time handled?
For disk data migration vMotion uses the same architecture as Storage vMotion to handle disk content. There are two major components in play: bulk copy and the mirror mode driver. vMotion kicks off a bulk copy and copies as much content as possible to the destination datastore via the vMotion network. During this bulk copy, blocks can be changed; some blocks are not yet copied, but some of them may already reside on the destination datastore. If the guest OS changes blocks that are already copied by the bulk copy process, the mirror mode driver will write them to both the source and destination datastore, keeping them in lock-step. The mirror mode driver ignores all blocks that are changed but not yet copied, as the ongoing bulk copy will pick them up. To keep the I/O performance as high as possible, a buffer is available to the mirror mode driver. If high latencies are detected on the vMotion network, the mirror mode driver can write the changes to the buffer instead of delaying the I/O writes to both source and destination disk. If you want to know more about the mirror mode driver, Yellow Bricks contains an out-take of our book about the mirror mode driver.
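To make the interplay between the bulk copy and the mirror mode driver a bit more tangible, here is a minimal, purely illustrative Python sketch of the write-handling logic described above. The class, method names and the simplified buffer handling are my own assumptions for the sake of illustration; this is not VMware code.

```python
# Illustrative sketch of how a mirror-mode-style driver could handle guest writes
# during a bulk copy. Names and structure are assumptions made to visualize the
# behavior described above; this is not the actual vMotion implementation.

class MirrorModeSketch:
    def __init__(self, buffer_limit=64):
        self.copied = set()              # blocks already transferred by the bulk copy
        self.buffer = []                 # deferred destination writes (high-latency path)
        self.buffer_limit = buffer_limit

    def bulk_copy_block(self, block, write_destination):
        """Bulk copy: stream a block to the destination and mark it as copied."""
        write_destination(block)
        self.copied.add(block)

    def guest_write(self, block, write_source, write_destination, high_latency=False):
        """Handle a guest OS write that occurs while the bulk copy is running."""
        write_source(block)              # the source disk is always kept up to date
        if block not in self.copied:
            # Block has not been bulk-copied yet: the ongoing bulk copy will pick
            # up the new content later, so no mirrored write is needed.
            return
        if high_latency and len(self.buffer) < self.buffer_limit:
            # High latency on the vMotion network: stage the destination write in
            # a buffer instead of delaying the guest I/O.
            self.buffer.append((block, write_destination))
        else:
            # Normal case: write to source and destination in lock-step.
            write_destination(block)

    def drain_buffer(self):
        """Flush buffered destination writes once latency allows."""
        for block, write_destination in self.buffer:
            write_destination(block)
        self.buffer.clear()


if __name__ == "__main__":
    dest = []
    mm = MirrorModeSketch()
    for b in range(4):                               # bulk copy the first blocks
        mm.bulk_copy_block(b, dest.append)
    mm.guest_write(2, lambda b: None, dest.append)   # already copied: mirrored to destination
    mm.guest_write(6, lambda b: None, dest.append)   # not copied yet: left for the bulk copy
    print(dest)                                      # [0, 1, 2, 3, 2]
```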
What is copied first, disk data or the memory state?
If data is copied from non-shared datastores, vMotion must migrate the disk data and the memory across the vMotion network. It must also process additional changes that occur during the copy process. The challenge is to get to a point where the number of changed blocks and memory pages is so small that they can be copied over and the virtual machine can be switched over between the hosts before any new changes are made to either disk or memory. Usually the change rate of memory is much higher than the change rate of disk, and therefore the vMotion process starts off with the bulk copy of the disk data. After the bulk data copy is completed and the mirror mode driver processes all ongoing changes, vMotion starts copying the memory state of the virtual machine.

But what if I share datastores between hosts; can I still use this feature and leverage the storage network?
Yes, and this is a very cool piece of code. To avoid overhead as much as possible, the storage network will be leveraged if both the source and destination host have access to the destination datastore. For instance, if a virtual machine resides on a local datastore and needs to be copied to a datastore located on a SAN, vMotion will use the storage network to which the source host is connected. In essence a Storage vMotion is used to avoid vMotion network utilization and additional host CPU cycles.

Because you use Storage vMotion, will vMotion leverage VAAI hardware offloading?
If both the source and destination host are connected to the destination datastore and the datastore is located on an array that has VAAI enabled, Storage vMotion will offload the copy process to the array.

Hold on, you are mentioning Storage vMotion, but I have an Essentials Plus license, do I need to upgrade to Standard?
To be honest, I try to stay away from the licensing debate as much as I can, but this seems to be the most popular question. If you have an Essentials Plus license you can leverage all these enhancements of vMotion in vSphere 5.1. You are not required to buy a Standard license if you are going to migrate to a shared storage destination datastore. For any other licensing question or remark, please contact your local VMware SE / account manager. Update: Essentials Plus customers, please update to vCenter Server 5.1.0a. For more details read the following article: “vMotion bug fixed in vCenter Server 5.1.0a”.

Get notified of these blog postings and more DRS and Storage DRS information by following me on Twitter: @frankdenneman

STORAGE DRS DATASTORE CLUSTER DEFAULT AFFINITY RULE

In vSphere 5.1 you can configure the default (anti-)affinity rule of the datastore cluster via the user interface. Please note that this feature is only available via the web client; the vSphere client does not contain this option. By default Storage DRS applies an intra-VM VMDK affinity rule, forcing Storage DRS to place all the files and VMDK files of a virtual machine on a single datastore. By deselecting the option “Keep VMDKs together by default” the opposite becomes true and an intra-VM anti-affinity rule is applied. This forces Storage DRS to place the VM files and each VMDK file on a separate datastore. Please read the article “Impact of intra-vm affinity rules on Storage DRS” to understand the impact of both types of rules on load balancing.
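For those who prefer to script this setting rather than click through the web client, the vSphere API exposes it as a defaultIntraVmAffinity flag on the pod (datastore cluster) configuration. Below is a rough pyVmomi sketch of how that could look; the connection details, the datastore cluster name and the exact object lookups are placeholders and assumptions on my part, so verify them against your own environment before using anything like this.

```python
# Rough pyVmomi sketch: disable "Keep VMDKs together by default" on a datastore
# cluster. Connection parameters and the datastore cluster name are placeholders;
# verify the API names against your vSphere environment before use.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.local", user="administrator",
                  pwd="password")  # placeholder credentials
try:
    content = si.RetrieveContent()
    # Find the datastore cluster (StoragePod) by name; "DatastoreCluster01" is
    # a hypothetical name used for illustration only.
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.StoragePod], True)
    pod = next(p for p in view.view if p.name == "DatastoreCluster01")
    view.Destroy()

    # Build a Storage DRS config spec that flips the default intra-VM affinity rule.
    pod_spec = vim.storageDrs.PodConfigSpec(defaultIntraVmAffinity=False)
    spec = vim.storageDrs.ConfigSpec(podConfigSpec=pod_spec)

    task = content.storageResourceManager.ConfigureStorageDrsForPod_Task(
        pod=pod, spec=spec, modify=True)
    # In a real script you would wait for the task to complete here.
finally:
    Disconnect(si)
```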

VSPHERE 5.1 STORAGE VMOTION PARALLEL DISK MIGRATIONS

Where previous versions of vSphere copied disks serially, vSphere 5.1 allows up to 4 parallel disk copies per Storage vMotion operation. When you migrate a virtual machine with five VMDK files, Storage vMotion copies the first four disks in parallel, then starts the next disk copy as soon as one of the first four finishes. To reduce the performance impact on other virtual machines sharing the datastores, parallel disk copies only apply to disk copies between distinct datastores. This means that if a virtual machine has multiple VMDK files on Datastore1 and Datastore2, parallel disk copies will only happen if the destination datastores are Datastore3 and Datastore4. Let’s use an example to clarify the process. Virtual machine VM1 has four VMDK files. VMDK1 and VMDK2 are on Datastore1, VMDK3 and VMDK4 are on Datastore2. The VMDK files are moved from Datastore1 to Datastore4 and from Datastore2 to Datastore3. VMDK1 and VMDK3 are migrated in parallel, while VMDK2 and VMDK4 are queued. The migration of VMDK2 starts the moment the migration of VMDK1 is complete, and similarly the migration of VMDK4 starts when the migration of VMDK3 is complete. A fan-out disk copy, in other words copying two VMDK files on Datastore A to Datastores B and C, will not use parallel disk copies. The common use case for parallel disk copies is the migration of a virtual machine configured with an anti-affinity rule inside a datastore cluster.
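A small, purely illustrative Python sketch of this scheduling behavior follows. It is an assumption-based model that simply reproduces the example above (up to 4 copies at once, one copy at a time per source/destination datastore pair); it is not how the actual Storage vMotion scheduler is implemented, and in reality the next queued copy starts as soon as a slot frees up rather than in fixed waves.

```python
# Purely illustrative sketch of the parallel disk copy scheduling described above:
# up to 4 copies run concurrently, but only one copy at a time per
# (source datastore, destination datastore) pair. Simplified into "waves".

from collections import deque

def schedule_disk_copies(moves, max_parallel=4):
    """moves: list of (vmdk, source_datastore, destination_datastore) tuples.
    Returns a list of copy 'waves'; copies in the same wave run in parallel."""
    pending = deque(moves)
    waves = []
    while pending:
        wave, busy_pairs, deferred = [], set(), deque()
        while pending and len(wave) < max_parallel:
            vmdk, src, dst = pending.popleft()
            if (src, dst) in busy_pairs:
                deferred.append((vmdk, src, dst))   # same pair already copying: queue it
            else:
                busy_pairs.add((src, dst))
                wave.append(vmdk)
        pending = deque(list(deferred) + list(pending))  # re-queue deferred copies first
        waves.append(wave)
    return waves

# The example from the article: VMDK1/VMDK2 on Datastore1, VMDK3/VMDK4 on Datastore2,
# moving Datastore1 -> Datastore4 and Datastore2 -> Datastore3.
print(schedule_disk_copies([
    ("VMDK1", "Datastore1", "Datastore4"),
    ("VMDK2", "Datastore1", "Datastore4"),
    ("VMDK3", "Datastore2", "Datastore3"),
    ("VMDK4", "Datastore2", "Datastore3"),
]))
# -> [['VMDK1', 'VMDK3'], ['VMDK2', 'VMDK4']]
```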

STORAGE DRS DATASTORE CORRELATION DETECTOR

One of the cool new features of Storage DRS in vSphere 5.1 is the datastore correlation detector used by the SIOC injector. Storage arrays have many ways to configure datastores from the available physical disk and controller resources in the array. Some arrays allow sharing of back-end disks and RAID groups across multiple datastores. When two datastores share back-end resources, their performance characteristics are tied together: when one datastore experiences high latency, the other datastore will experience similar high latency, since I/Os from both datastores are being serviced by the same disks. These datastores are considered “performance-related”. I/O load balancing operations in vSphere 5.1 avoid recommending migration of virtual machines between two performance-correlated datastores.

I/O load balancing algorithm
Storage DRS collects several virtual machine metrics to analyze the workload generated by the virtual machines within the datastore cluster. These metrics are aggregated into a workload model. To effectively distribute the different loads of the virtual machines across the datastores, Storage DRS needs to understand the performance (latency) of each datastore. When a datastore violates its I/O load threshold, Storage DRS moves virtual machines out of the datastore. By linking workload models to device models, Storage DRS is able to select a datastore with a low I/O load when placing a virtual machine with a high I/O load during load balancing operations.

Performance-related datastores
However, if data is moved between datastores that are backed by the same disks, the move may not decrease the latency experienced on the source datastore, as the same set of disks, spindles or RAID groups services the destination datastore as well. I/O load balancing recommendations should avoid using two performance-correlated datastores, since moving a virtual machine from the source datastore to the destination datastore has no effect on the datastore latency. So how does Storage DRS discover performance-related datastores?

How does it work?
The datastore correlation detector measures performance during isolation and when concurrent I/Os are pushed to multiple datastores. The basic mechanism of the correlation detector is rather straightforward: compare the overall latency when two datastores are being used alone in isolation and when there are concurrent I/O streams on both of the datastores. If there is no performance correlation, the concurrent I/O to the other datastore should have no effect. Conversely, if two datastores are performance-correlated, then concurrent I/O streams should amplify the average I/O latency on both datastores. Please note that datastores are checked for correlation on a regular basis. This allows Storage DRS to detect changes to the underlying storage configuration.

Example scenario
In this scenario Datastore1 and Datastore2 are backed by disk devices grouped in Diskgroup1, while Datastore3 and Datastore4 are backed by disk devices grouped in Diskgroup2. All four datastores belong to a single datastore cluster. After SIOC has run the workload and device models on a datastore, SIOC picks a random datastore in the datastore cluster to check for correlations. If both datastores are idle, the datastore correlation detector uses the same workload to measure the average I/O latency in isolation and in concurrent I/O mode.

Isolation
The SIOC injector measures the average I/O latency of Datastore1 in isolation.
This means it measures the latency of the outstanding I/O of Datastore1 alone. Next, it measures the average I/O latency of Datastore2 in isolation.

Concurrent I/Os
The first two steps are used to establish the baseline for each datastore. In the third step the SIOC injector sends concurrent I/O to both datastores simultaneously. If the average latency on both datastores increases significantly compared to their isolation baselines, the datastores are marked as performance-correlated. In the example scenario this results in the behavior that Storage DRS does not recommend any I/O load balancing operations between Datastore1 and Datastore2, or between Datastore3 and Datastore4, but it can recommend, for example, moving virtual machines from Datastore1 to Datastore3 or from Datastore2 to Datastore4. All moves are possible as long as the datastores are not correlated.

Enable Storage DRS on performance-correlated datastores?
When two datastores are marked as performance-correlated, Storage DRS does not generate I/O load balancing recommendations between those two datastores. However, Storage DRS can still be used for initial placement and can still generate recommendations to move virtual machines between two correlated datastores to address out-of-space situations or to correct rule violations. Please keep in mind that some arrays use a subset of disks out of a larger disk pool to back a single datastore. With these configurations, it may appear that all disks in a disk pool back all the datastores while in reality they don't. Therefore I recommend setting the Storage DRS automation mode to manual and reviewing the migration recommendations to understand whether all datastores within the disk pool are performance-correlated.
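To illustrate the idea behind the correlation check, here is a minimal Python sketch of the comparison described above. The threshold value and the numbers in the example calls are made up for illustration; the real detector's heuristics are not public.

```python
# Illustrative sketch of the correlation check: compare isolated latency against
# latency under concurrent injected I/O. The amplification threshold and the
# sample numbers are assumptions made purely to show the idea.

def are_performance_correlated(lat_a_isolated, lat_b_isolated,
                               lat_a_concurrent, lat_b_concurrent,
                               amplification_threshold=1.5):
    """Return True if both datastores show a significant latency increase when
    concurrent I/O streams are injected, compared to their isolated baselines."""
    a_amplified = lat_a_concurrent >= lat_a_isolated * amplification_threshold
    b_amplified = lat_b_concurrent >= lat_b_isolated * amplification_threshold
    return a_amplified and b_amplified

# Two datastores on the same disk group: latency roughly doubles under concurrency.
print(are_performance_correlated(5.0, 6.0, 11.0, 12.5))   # True
# Two datastores on separate disk groups: concurrent I/O barely changes latency.
print(are_performance_correlated(5.0, 6.0, 5.4, 6.2))     # False
```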

VSPHERE 5.1 CLUSTERING DEEPDIVE AVAILABLE

Duncan and I released the vSphere 5.1 Clustering Deepdive book this week. The book covers the new features of the vSphere 5.1 suite. We rewrote the Storage DRS chapter and added a completely new chapter focusing on stretched clusters.

Font changes
The challenge for us was to include all the new content without letting the book grow beyond its trademark dimensions. To achieve this we used a different font and decreased the font size. This still resulted in a growth of 80 pages, making the book 415 pages instead of the 505 pages it would have been with the previous font. Please note that although we decreased the font size, this did not decrease the legibility of the book.

Special cover
The cover is designed in such a way that you can actually have multiple copies with all different shades of orange, dare I say 50 shades of orange. ;) We hope you enjoy the new version of the vSphere Clustering Deepdive series. It's available in paperback and Kindle format. Paper copy – $24.95, Kindle version – $7.49.

CLOUDPHYSICS IN A NUTSHELL

Disclaimer: I'm a technical advisor for CloudPhysics. I'm very happy to see CloudPhysics coming out of stealth mode this week and making their beta product available to the public. In a nutshell, CloudPhysics brings Big Data analytics to the IT environment and provides you with tools to analyze your datacenter. How does it acquire this dataset and what benefit do you get from it?

The Observer Appliance
To gather all that data, an Observer appliance needs to run in the virtual infrastructure. And in order to get a valuable dataset that is used for analytics and simulations, the Observer needs to be active in as many virtual infrastructures as possible. Running an appliance that sends operational data to a third party like CloudPhysics can be a security concern. Going into detail about how CloudPhysics designed the system to handle privacy, security and data-sharing issues is outside the scope of this article. In short, the data extracted from the virtual infrastructure consists of performance statistics and inventory and configuration settings. All environmental details are scrubbed and no log files or contents of disk and memory are gathered.

The User Interface
The data acquired by the Observer appliance is accessible at https://app.cloudphysics.com. Logging in will give you access to your own data. The beta product provides a user interface that allows you to dive into specific focus areas. The UI provides so-called cards that display key data points and act as a launch point to a more detailed view. This view can contain information about the relationship with other features of the vSphere stack. An example of such a card would be virtual machine level reservations. Not only does this card provide information about the present virtual machine level reservations in your environment in a clear and concise manner, it also displays the impact the reservation has on the High Availability slot size and therefore the consolidation ratio of your cluster. All this information is combined in a single screen; there is no need to navigate through multiple screens and correlate particular metrics.

Correlation of metrics
Correlating particular settings and understanding the impact each setting has on a complex environment, such as a virtual infrastructure, is time consuming and above all very difficult. This correlation of metrics saves you time, but it also helps you understand the behavior of your environment. Now you might ask how you can trust that these correlations are correct, and this is one of the most interesting things about this product. It's a combination of product expertise and community-driven input.

The two pillars of knowledge
The CloudPhysics team comprises industry heavy hitters. Some of these people invented core features of the vSphere stack while working for VMware, while others made their mark at other industry-leading companies. The second pillar is community involvement. In this beta program, registered users can suggest ideas for utility cards. Domain experts will verify the community-provided cards for technical accuracy.

Near-future developments
One thing I'm very excited about is the upcoming High Availability and DRS simulation tools. Both HA and DRS can be a challenge to configure, as some settings impact the virtual infrastructure on multiple levels. The HA and DRS simulation analyzes the current settings and provides you a platform where you can predict the effects of a change on your environment.

VMworld Challenge 2012
Now back to the current status.
CloudPhysics is running a VMworld Challenge 2012. The contest allows you to describe the problems you are facing, such as “I'm applying different disk shares in my environment but I cannot see the worst-case scenario allocation”. The more cards you produce, the more points you score. To increase your score, download the Observer appliance and take your environment for a test drive. The more activity you generate, the more points you accumulate. How will you benefit from this contest? First of all, if you are located in the U.S. you can win some great prizes (due to U.S. legislation, non-U.S. residents are excluded from winning prizes), but by submitting cards you also improve the system and the quality of the reporting and simulation tools.

Resource Management as a Service
I started off with a disclaimer: I am a technical advisor to CloudPhysics and you can expect to see more articles about the development of CloudPhysics. As I'm able to work with the inventors of DRS and Storage DRS, a lot of my focus is on resource management. Together with the input of the community and the continuous analysis by domain experts, this might well turn out to be Resource Management as a Service.

VM TEMPLATES AND STORAGE DRS

Please note that Storage DRS cannot move VM templates via Storage vMotion. This can impact load balancing operations or datastore maintenance mode operations. When initiating datastore maintenance mode, a message is displayed to point this out. As maintenance mode is commonly used for array migrations or datastore upgrade operations (VMFS-3 to VMFS-5), remember to convert VM templates to virtual machines before initiating maintenance mode.
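If you have many templates to convert, scripting the conversion step may be convenient. Below is a rough pyVmomi sketch of converting a template back to a virtual machine; the connection details, the template name and the resource pool name are placeholders and assumptions on my part, so verify them against your own environment.

```python
# Rough pyVmomi sketch: convert a VM template back to a virtual machine before
# putting its datastore into maintenance mode. Names and connection details are
# placeholders for illustration only.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.local", user="administrator",
                  pwd="password")  # placeholder credentials
try:
    content = si.RetrieveContent()
    vms = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    pools = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ResourcePool], True)
    # "Win2008-Template" and "Cluster01-Pool" are hypothetical names.
    template = next(v for v in vms.view if v.name == "Win2008-Template")
    pool = next(p for p in pools.view if p.name == "Cluster01-Pool")
    # Convert the template to a regular virtual machine so datastore maintenance
    # mode / Storage vMotion can migrate it.
    template.MarkAsVirtualMachine(pool=pool)
    vms.Destroy()
    pools.Destroy()
finally:
    Disconnect(si)
```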

MY PUBLIC VMWORLD SCHEDULE

This year will be an action-packed VMworld for me: presenting sessions, participating in two panel sessions, hosting a group discussion and being available in two “Meet the Expert” sessions.

Presenting the following sessions:
INF-STO1545 - Architecting Storage DRS Datastore Clusters
INF-VSP1683 - VMware vSphere Cluster Resource Pools Best Practices

Panel sessions:
(TAM Day) - ASK THE EXPERTS
INF-VSP1504 - Ask the Expert vBloggers

Hosting the GD22 - Resource Management (DRS/SDRS) group discussion. I invited Anne Holler (lead engineer DRS) to host this session together with me. During Meet the Experts sessions 13 and 17 I'm available for short meetings to answer your resource management (DRS/SDRS) questions.

Here is the week schedule of the sessions/events/activities that I will be taking part in; be sure to sign up if you have not already:

Sunday (TAM Day):
14:35 – 15:35 : ASK THE EXPERTS

Monday:
14:30 – 15:30 : INF-VSP1504 - Ask the Expert vBloggers
16:00 – 17:00 : GD22 – Resource Management

Tuesday:
12:30 – 13:30 : INF-STO1545 - Architecting Storage DRS Datastore Clusters (repeat session)
15:00 – 16:00 : INF-VSP1683 - vSphere Cluster Resource Pools Best Practices

Wednesday:
08:30 – 09:30 : INF-STO1545 - Architecting Storage DRS Datastore Clusters
12:30 – 13:30 : Meet the Experts 13

Thursday:
12:00 – 12:00 : Meet the Experts 17

DRS AND MEMORY BALANCING IN NON-OVERCOMMITTED CLUSTERS

First things first: I normally do not recommend changing advanced settings. Always try to tune system behavior by changing the settings provided by the user interface, or try to understand system behavior and how it aligns with your design.

The “problem”
DRS load balancing recommendations can be sub-optimal when no memory overcommitment is preferred. Some customers prefer not to use memory overcommitment: the clusters contain (just) enough memory capacity to ensure all running virtual machines have their memory backed by physical memory. Nowadays it is not uncommon to see virtual machines with fairly high allocated (consumed) memory, and due to the use of large pages on hosts with recent CPU architectures, little to no memory is shared. A common scenario with this design is a host memory load of 80-85% consumed. In this situation, DRS recommendations may have a detrimental effect on performance, as DRS does not consider consumed memory but active memory.

DRS behavior
When analyzing the requirements of a virtual machine during load balancing operations, DRS calculates the memory demand of the virtual machine. The main memory metric used by DRS to determine the memory demand is active memory. The active memory represents the working set of the virtual machine, which signifies the number of actively used pages in RAM. By using the working-set estimation, the memory scheduler determines which of the allocated memory pages are actively used by the virtual machine and which allocated pages are idle. To accommodate a sudden rapid increase of the working set, 25% of the idle consumed memory is included. Memory demand also includes the virtual machine's memory overhead.

Let's use an 8 GB virtual machine as an example of how DRS calculates the memory demand. The guest OS running in this virtual machine has touched 50% of its memory size since it was booted, but only 20% of its memory size is active. This means that the virtual machine has consumed 4096 MB and 1638.4 MB is active. As mentioned, DRS accommodates a percentage of the idle consumed memory to allow for a sudden increase of memory use. To calculate the idle consumed memory, the active memory (1638.4 MB) is subtracted from the consumed memory (4096 MB), resulting in a total of 2457.6 MB. By default DRS includes 25% of the idle consumed memory, i.e. 614.4 MB. The virtual machine has a memory overhead of 90 MB. The memory demand DRS uses in its load balancing calculation is as follows: 1638.4 MB + 614.4 MB + 90 MB = 2342.8 MB. This means that DRS will select a host that has 2342.8 MB available for this machine and where the move to this host improves the load balance of the cluster.

DRS and the cornerstone of virtualization: resource overcommitment
Resource sharing and overcommitment of resources are primary elements of virtualization. When designing a virtual infrastructure, it is a challenge to build the environment in such a way that it can handle virtual machine workloads while improving server utilization. Because not every workload is equal, applying resource allocation settings such as shares, reservations and limits can create a distinction in priority. DRS is designed with this cornerstone in mind, and that makes DRS sometimes a hard act to follow. DRS is all about solving imbalance and providing enough resources to the virtual machines aligned with their demand. This means that DRS balances workload on demand and trusts in its core value that overcommitment is allowed.
It then relies on the host-local scheduler to figure out the priority of the virtual machines. And this behavior is sometimes not in line with the perception of DRS. A common perception is that DRS is about optimizing performance. This is only partially true. As mentioned before, DRS looks at the demand of the VM and will try to match the activity of the virtual machines with the available resources in the cluster. As it relies on resource allocation settings, it assumes that priority is defined for each virtual machine and that the host-local schedulers can reclaim memory safely. For this reason the DRS memory imbalance metric is tuned to focus on VM active memory, to allow efficient sharing of host memory resources: running with less cluster memory than the sum of all running virtual machine memory sizes and reclaiming idle consumed memory from lower-priority virtual machines for other virtual machines' active workloads.

Unfortunately DRS does not know when the environment is designed in such a way as to avoid overcommitment. Based on its input it can place a virtual machine on a host with virtual machines that have lots of idle consumed memory lying around, instigating memory reclamation. In most cases this reclamation is hardly noticeable due to the use of the balloon driver. However, in the case where all hosts are highly utilized, ballooning might not be as responsive as required, forcing the kernel to compress memory and swap. This means that migrations for the sole purpose of balancing active memory are not useful in environments like these and, if the target host memory is highly consumed, can cause a performance impact on the migrating virtual machine as it waits to obtain memory, and on the other virtual machines on the target host as they do processing to allow reclamation of their idle memory.

The solution? You might want to change the 25% idle consumed memory setting
The solution I recommend starting with is to lower the migration threshold by moving the slider to the left. This allows the DRS cluster to have a higher imbalance and makes DRS more conservative when recommending migrations. If this is not satisfactory, then I would suggest changing the DRS advanced option called IdleTax. Please note that this DRS advanced option is not the same setting as the VMkernel memory setting Mem.IdleTax. The DRS IdleTax advanced option (default 75) controls how much consumed idle memory should be added to active memory when estimating memory demand. The calculation is as follows: 100 - IdleTax. Default calculation: 100 - 75 = 25. This means that the smaller the value of IdleTax, the more consumed idle memory is added to the active memory by DRS for load balancing. Be aware that the value of IdleTax is a heuristic, tuned to facilitate memory overcommitment; tuning it to a lower value is appropriate for environments not using overcommitment. Note that the option is set per cluster and would need to be changed for all DRS clusters as appropriate. Again, try a lower migration threshold setting and monitor whether it provides satisfying results before setting this advanced option.
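To tie the worked example and the IdleTax option together, here is a minimal Python sketch of the memory demand estimate as described in this article. It mirrors the 8 GB example above; it is an illustration of the described calculation, not the actual DRS code.

```python
# Minimal sketch of the memory demand estimate described above, including the
# IdleTax-derived percentage of idle consumed memory. An illustration of the
# calculation from the article, not the actual DRS implementation.

def drs_memory_demand_mb(active_mb, consumed_mb, overhead_mb, idle_tax=75):
    """Estimate DRS memory demand: active + (100 - IdleTax)% of idle consumed
    memory + virtual machine memory overhead."""
    idle_consumed_mb = max(consumed_mb - active_mb, 0)
    idle_share = (100 - idle_tax) / 100.0          # default: 25% of idle consumed
    return active_mb + idle_consumed_mb * idle_share + overhead_mb

# The 8 GB example from the article: 4096 MB consumed, 1638.4 MB active, 90 MB overhead.
print(drs_memory_demand_mb(1638.4, 4096, 90))              # 2342.8 MB with default IdleTax
# Lowering IdleTax makes DRS weigh idle consumed memory more heavily:
print(drs_memory_demand_mb(1638.4, 4096, 90, idle_tax=0))  # 4186.0 MB (all idle memory counted)
```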