frankdenneman.nl

Thin or thick disks? – it’s about management not performance

December 19, 2012 by frankdenneman

This is my contribution to the debate Zero or Thick disks – debunking the performance myth.
Over the last couple of years, VMware engineers have worked very hard to reduce the performance difference between thin disks and thick disks. Performance engineers have written many white papers explaining the improvements made to thin disks. Today, therefore, the question of whether to use Thin-provisioned disks or Eager zero thick is not about the difference in performance but about the difference in management.
When using Thin-provisioned VMDKs you need a clearly defined process. What do you do when the datastore that stores the thin-provisioned disks is getting full? You need to define a consolidation ratio, understand which operational processes might be dangerous to your environment (think Patch Tuesday), and decide which space utilization threshold triggers the migration of thin-provisioned disks to other datastores.
Today Storage DRS can help you with many of the aforementioned challenges. For more information please read the article: Avoiding VMDK level over-commitment while using Thin-provisioned disks and Storage DRS.
If Storage DRS is not used, Thin-provisioned disks can require seamless collaboration between virtualization teams (provisioning and architecture) and storage administrators. When this is not possible due to differences in organizational culture, thin provisioning is more of a risk than bliss.
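The over-commitment ratio and the utilization threshold mentioned above can be tracked with very little arithmetic. The following is a minimal sketch, not a VMware API: the `ThinDisk` class, the `datastore_report` function, the 80% warning level and all disk sizes are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ThinDisk:
    name: str
    provisioned_gb: float  # size presented to the guest OS
    used_gb: float         # blocks actually written to the datastore


def datastore_report(capacity_gb, disks, warn_at=0.80):
    """Summarize over-commitment and fill level for one datastore.

    warn_at is a hypothetical utilization threshold; pick the value
    that matches your own operational process.
    """
    provisioned = sum(d.provisioned_gb for d in disks)
    used = sum(d.used_gb for d in disks)
    utilization = used / capacity_gb
    return {
        "overcommit_ratio": provisioned / capacity_gb,
        "utilization": utilization,
        "needs_action": utilization >= warn_at,
    }


# Hypothetical datastore of 1000 GB holding three thin disks.
disks = [
    ThinDisk("vm01.vmdk", provisioned_gb=500, used_gb=120),
    ThinDisk("vm02.vmdk", provisioned_gb=800, used_gb=650),
    ThinDisk("vm03.vmdk", provisioned_gb=700, used_gb=90),
]
report = datastore_report(capacity_gb=1000, disks=disks)
print(report)
```

In this example the datastore is over-committed by a factor of two and has crossed the warning level, which is the moment the agreed process (migrate, extend, or clean up) should kick in.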
Zero out process: Eager zero thick, on the other hand, might provide a marginal performance increase in some (corner) cases, but the costs involved could outweigh the perceived benefits. First of all, Eager zero thick disks need to be zeroed out during creation. When your array doesn’t support the VAAI initiatives, this can hurt performance and extends the time to provision. With terabyte-sized disks becoming more common, this will impact provisioning time immensely.
Waste of space: Most virtualized environments run virtual machines that are typically configured with oversized OS disks and over-specced data disks, resulting in wasted space full of zeros. Thin-provisioned disks only occupy the space used for storing data, not zeros.
Migration: Storage vMotion goes out of its way to migrate every little bit of a virtual disk, which means it needs to copy every zeroed-out block. Combined with oversized disks, this creates unnecessary overhead on your hosts and storage subsystem, which spend time copying and verifying the integrity of zeroed-out blocks. Migrating thin disks only requires migrating the “user data”, resulting in faster migration times and less overhead on hosts and storage subsystem.
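To get a feel for the difference, here is a rough back-of-the-envelope estimate. It assumes a constant copy throughput and ignores array offloads and any other optimization; the disk size, the amount of user data and the 200 MB/s throughput are hypothetical numbers, and `copy_seconds` is an illustrative helper only.

```python
def copy_seconds(gb_to_copy, throughput_mb_s):
    """Time to copy gb_to_copy at a constant throughput (1 GB = 1024 MB)."""
    return gb_to_copy * 1024 / throughput_mb_s


provisioned_gb = 1024  # hypothetical 1 TB virtual disk
used_gb = 200          # hypothetical amount of user data on that disk
throughput = 200       # hypothetical sustained copy rate in MB/s

# Eager zero thick: every block is copied, zeroed out or not.
thick_seconds = copy_seconds(provisioned_gb, throughput)
# Thin: only the blocks holding user data are copied.
thin_seconds = copy_seconds(used_gb, throughput)

print(f"thick: {thick_seconds / 60:.0f} min, thin: {thin_seconds / 60:.0f} min")
```

Under these assumptions the thick disk takes roughly five times as long to migrate, purely because of the zeroed-out blocks.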
In essence, Thin-provisioned disks versus Eager zero thick is all about resource and time savings versus risk avoidance. Choose wisely.

Filed Under: Storage DRS, VMware

vMotion Futures Survey

December 19, 2012 by frankdenneman

If you have a few spare minutes, please fill out the survey about vMotion futures. We are very interested in how you use vMotion and especially your opinion about use cases for Long-distance vMotion operations. It takes about 10 minutes to get through.
http://tinyurl.com/VMwareVMotion2012

Filed Under: vMotion

Storage vMotion and the vSphere web-client

December 18, 2012 by frankdenneman

The new web client of vSphere 5.1 is my weapon of choice when working in my lab. It contains a lot of “hidden” gems; the UI team spends a lot of time crafting and aligning the user interface to the administrator's needs. One thing that drove me nuts was the lack of information when running a Storage vMotion operation. The Recent Tasks view doesn’t show anything other than the Storage vMotion operation itself and the target. When using Storage DRS, it only shows the name of the target datastore cluster. Sometimes you just want to know which datastore the virtual machine migrated to.
The new recent task window
For example, take a look at the Recent Tasks window on the right side of the web client. When running a Storage vMotion operation, it displays the Storage vMotion task much like the vSphere client does. However, clicking on the task itself shows the event info and the task in one view, helping you identify the source and destination datastore of the Storage vMotion process.

Compare that to the workflow of the old vSphere client
Select the task in the Recent Tasks bar and double-click it.

This brings you to the Tasks and Events view of the target datastore cluster. By default the view displays events. Therefore you need to select Tasks, then select the task itself, and then click on the related events in the bottom view.

It may look trivial, but these things do speed up your work. Eradicating unnecessary clicks and wait time before a screen refreshes sure makes your job a little easier.

Filed Under: Storage DRS, vMotion

Designing your vMotion network

December 18, 2012 by frankdenneman

A well-designed vMotion network will benefit the environment in many ways. Before vSphere 5, designing a vMotion network was relatively easy: select the fastest NIC and assign it to a vMotion vmknic. vSphere 4.x supports both 1GbE and 10GbE networks, and since vSphere 5.0 vMotion is able to leverage multiple NICs. Multi-NIC vMotion in vSphere 5.x makes the design a bit more challenging. The combination of NICs, the failover mode and the load balancing policy all need to be taken into consideration when configuring your vMotion network.
The benefit of multi-NIC vMotion network
In vSphere 5.x, vMotion balances vMotion operations across all available NICs, both for a single vMotion operation and for multiple concurrent vMotion operations. Using multiple NICs reduces the duration of a vMotion operation. This benefits:

  • Manual vMotion processes: Allocating more bandwidth to the vMotion process results in faster migration times. The less time you spend monitoring a manual process, the more time you can spend on other, more important operations.
  • DRS load balancing: Based on the average time per migration and the number of concurrent vMotion processes, DRS restricts the number of load balancing operations it can process between two load balancing runs. By providing more bandwidth to the vMotion network, DRS is able to run more load balancing operations in between load balancing runs, which in turn benefits the load balance of the cluster. Better load balance means better resource availability for the virtual machines, which in turn means better performance for your applications.
  • Maintenance mode: Faster migration times reduce the time it takes a host to enter maintenance mode. As consolidation ratios increase, the time it takes to migrate all the virtual machines off the host increases as well. This can impact your SLA and the ability to service the host within the allowed time.

Multi-NIC setup
The vMotion process leverages the vmknic instead of sending packets directly to a physical NIC. In order to use multiple NICs for vMotion, multiple vmknics are required. Duncan wrote an excellent article on how to set up a multi-NIC vMotion network.
Failover order and Load balancing policy
Each vmknic should use only one active NIC. This results in only one valid load balancing policy: “Route based on originating virtual port”. This setup rules out physical NIC load balancing such as IP-hash or the load balancing policy “based on physical load (LBT)”. The active/standby failover order is the only valid and supported configuration because of the way vMotion load balancing works. The vMotion process handles load balancing itself: based on its own algorithm, vMotion picks a vmknic for a specific network packet. As vMotion expects the vmknic to be backed by a single physical NIC, sending data through a given vmknic assures vMotion that the data traverses that dedicated physical NIC. If the physical NICs were configured in a load-balancing mode, this could interfere with the vMotion-level load balancing logic. vMotion would not be able to predict which physical NIC the vmknic uses, possibly sending all vMotion traffic over the same NIC even if vMotion sends the data to different vmknics.
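The rule above is easy to check once the teaming configuration has been exported in some form. The sketch below is a plain Python check, not a vSphere API call: the dictionary layout and the `teaming_problems` function are hypothetical, and the requirement that every vmknic has its own distinct active uplink is my reading of "dedicated physical NIC".

```python
def teaming_problems(vmknics):
    """Return a list of violations of the one-active-uplink rule.

    vmknics maps a vmknic name to {"active": [...], "standby": [...]},
    a hypothetical layout used only for this illustration.
    """
    problems = []
    active_owner = {}  # physical NIC -> vmknic that uses it as active uplink
    for name, team in vmknics.items():
        active = team.get("active", [])
        if len(active) != 1:
            problems.append(
                f"{name}: expected exactly 1 active uplink, found {len(active)}"
            )
            continue
        nic = active[0]
        if nic in active_owner:
            problems.append(
                f"{name}: shares active uplink {nic} with {active_owner[nic]}"
            )
        else:
            active_owner[nic] = name
    return problems


# Two vmknics, each with its own active NIC and the other NIC as standby.
good = {
    "vmk1": {"active": ["vmnic0"], "standby": ["vmnic1"]},
    "vmk2": {"active": ["vmnic1"], "standby": ["vmnic0"]},
}
# vmk1 uses both NICs as active, which breaks the vMotion assumption.
bad = {
    "vmk1": {"active": ["vmnic0", "vmnic1"], "standby": []},
    "vmk2": {"active": ["vmnic1"], "standby": ["vmnic0"]},
}
print(teaming_problems(good))
print(teaming_problems(bad))
```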

Consistent configuration
Ensure that each physical NIC is configured in a consistent and correct way across the vMotion network on the host and across the hosts inside the cluster. vCenter is very conservative: if it detects a mismatch, it drops the number of concurrent vMotion operations on that host back to 2, regardless of the available bandwidth. Therefore always check at each level that the link speed, MTU and duplex settings are identical, both on the NIC side and on the switch side. Please keep in mind that all the vMotion vmknics should exist in the same subnet. Although it’s possible to set static routes, “routable vMotion” configurations are not supported by VMware.
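A consistency check like this can be scripted against whatever inventory export you have. The following is a minimal sketch using only the Python standard library; the field names (`speed_mbps`, `mtu`, `duplex`), the `consistency_problems` function and the sample addresses are hypothetical and do not come from any VMware tool.

```python
import ipaddress


def consistency_problems(nics, vmknic_cidrs):
    """Flag mismatched NIC settings and vmknics in different subnets.

    nics: list of dicts with speed_mbps, mtu and duplex keys.
    vmknic_cidrs: vmknic addresses in CIDR notation, e.g. "10.0.0.11/24".
    """
    problems = []
    for field in ("speed_mbps", "mtu", "duplex"):
        values = {nic[field] for nic in nics}
        if len(values) > 1:
            problems.append(f"{field} mismatch: {sorted(values, key=str)}")
    # ip_interface() accepts host addresses with a prefix length and
    # exposes the network they belong to.
    subnets = {ipaddress.ip_interface(cidr).network for cidr in vmknic_cidrs}
    if len(subnets) > 1:
        names = ", ".join(sorted(str(s) for s in subnets))
        problems.append(f"vmknics span multiple subnets: {names}")
    return problems


nics = [
    {"name": "vmnic0", "speed_mbps": 10000, "mtu": 9000, "duplex": "full"},
    {"name": "vmnic1", "speed_mbps": 10000, "mtu": 1500, "duplex": "full"},
]
found = consistency_problems(nics, ["10.0.0.11/24", "10.0.1.12/24"])
for problem in found:
    print(problem)
```

The sample data contains an MTU mismatch and two vmknics in different subnets, so both checks fire.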
Link speed
By default vCenter allows 4 concurrent vMotion operations on a host with a 1GbE vMotion network and 8 concurrent vMotion operations on a host with a 10GbE vMotion network. Be aware that this is based on the detected link speed. In other words, if the physical NIC reports a link speed of at least 10GbE, vCenter allows 8 vMotions, but if the physical NIC reports less than 10GbE, vCenter allows a maximum of 4 concurrent vMotions on that host. To stress it again, the number of concurrent vMotions is based on the detected link speed. For example, take the HP Flex technology. It sets a hard limit on the FlexNICs, so the reported link speed is equal to or less than the bandwidth configured at the Flex Virtual Connect level. I’ve come across many Flex environments configured with more than 1Gb of bandwidth, ranging from 2Gb to 8Gb. Although these offer more bandwidth per vMotion process, they do not increase the number of concurrent vMotions; the limit remains 4 concurrent vMotion operations.
As mentioned before, vCenter is very conservative; multi-NIC vMotion limits are currently determined by the slowest available vMotion NIC. This means that if you include a 1GbE NIC in your 10GbE vMotion network configuration, that host is restricted to a maximum of 4 concurrent vMotion operations. I’ve seen a few vMotion network designs where a 1GbE link was added to the 10GbE vMotion network configuration, primarily as a safety net in case the 10GbE network drops. Well, that safety net just restricted that host to 4 concurrent vMotion operations.
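The rules described in this section can be summarized in a few lines. This is a sketch of the behavior as described here, not vCenter's actual code: `max_concurrent_vmotions` is a hypothetical helper, link speeds are expressed in Mbit/s, and the drop to 2 concurrent operations on a configuration mismatch is not modeled.

```python
def max_concurrent_vmotions(nic_speeds_mbps):
    """Concurrent vMotion limit for a host, based on detected link speeds.

    The slowest vMotion NIC decides: 8 operations when every NIC reports
    at least 10GbE, otherwise 4.
    """
    if not nic_speeds_mbps:
        raise ValueError("at least one vMotion NIC is required")
    slowest = min(nic_speeds_mbps)
    return 8 if slowest >= 10_000 else 4


print(max_concurrent_vmotions([10_000, 10_000]))  # two 10GbE NICs
print(max_concurrent_vmotions([10_000, 1_000]))   # 1GbE "safety net" added
print(max_concurrent_vmotions([4_000, 4_000]))    # FlexNICs capped at 4Gb
```

Only the first configuration is allowed 8 concurrent vMotions; the other two fall back to 4 because the slowest reported link speed is below 10GbE.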
Increase in NICS does not impact number of concurrent vMotions
Using multiple NICs increases the bandwidth available for the vMotion process; it does not increase the number of concurrent vMotions. When 10Gb uplinks are used for vMotion, the maximum number of concurrent vMotions allowed is eight. Even when multiple NICs are assigned, the limit remains at eight. However, the increase in bandwidth will decrease the duration of each vMotion process.
A Multi-NIC vMotion configuration is slightly more complex than a single-NIC vMotion network, but this setup will benefit you in many ways. Reducing vMotion operation times allows DRS to schedule more load balancing operations per invocation, and the increased bandwidth allows the host to complete the transition to maintenance mode faster. Multi-NIC vMotion is also very helpful when leveraging the new vMotion capability of migrating a virtual machine between hosts and datastores simultaneously.
Part 2 – Multi-NIC vMotion failover order configuration
Part 3 – Multi-NIC vMotion and NetIOC
Part 4 – Choose link aggregation over Multi-NIC vMotion?
Part 5 – 3 reasons why I use a distributed switch for vMotion networks

Filed Under: vMotion

SIOC on datastores backed by a single datapool

December 6, 2012 by frankdenneman

Duncan posted an article today in which he brings up the question: Should I use many small LUNs or a couple large LUNs for Storage DRS? In this article he explains the differences between Storage I/O Control (SIOC) and Storage DRS and why they work well together. To re-emphasize, the goal of Storage DRS load balancing is to fix long-term I/O imbalances, while SIOC addresses short-term bursts and loads. SIOC is all about managing the queues, while Storage DRS is all about intelligent placement and avoiding bottlenecks.
Julian Wood makes an interesting remark, one that both Duncan and I hear often when discussing SIOC. Don’t get me wrong, I’m not picking on Julian; I’m merely stating the fact that he made a frequently used argument.

“There is far less benefit in using Storage IO Control to load balance IO across LUNs ultimately backed by the same physical disks than load balancing across separate physical storage pools.”

Well, when you look at the way SIOC works, I tend to disagree with this statement. As stated before, SIOC manages queues: the queues to the datastores used by the virtual machines in the virtual datacenter. Typically these virtual machines differ in workload type, in peak moments and in importance to the organization. With the use of disk shares, important virtual machines can be assigned a higher priority within the disk queue. When contention occurs, and this is important to realize, when contention occurs, these business-critical virtual machines get prioritized over other virtual machines. Not all important virtual machines generate a constant stream of I/O, while other virtual machines, maybe with a lower priority, do generate a constant stream of I/O. Disk shares allow the high-priority, low-I/O virtual machines to get a foot in the door and get those I/Os to the datastore and back. Without SIOC and disk shares you need to start thinking about increasing the queue depth of each host and about smart placement of these virtual machines (both high and low I/O load) to avoid the high-I/O virtual machines ending up on the same host. These placement adjustments might impact DRS load balancing operations, possibly affecting other virtual machines along the way. Investing time in creating and managing a matrix of possible VM-to-datastore placements is not the way to go in this time of rapidly expanding datacenters.
Because SIOC is a datastore-wide scheduler, it determines the queue depth of the ESX hosts that are connected to a datastore and run virtual machines on it. Hosts with higher-priority virtual machines get “deeper” queue depths to the datastore, and hosts with lower-priority virtual machines running on the datastore receive shallower queue depths. To be more precise, SIOC calculates the datastore-wide latency and each local host scheduler determines the queue depth for the queues of the datastore.
But remember, queue depth changes only occur when there is contention, that is, when the datastore exceeds the SIOC latency threshold. For more info about SIOC latency, read “To which Host level latency statistic is the SIOC threshold related”.
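To make the idea of share-based queue depths concrete, here is an illustrative proportional-share model. It is not the actual SIOC algorithm: the `host_queue_depths` function, the slot budget, the default depth of 32, the minimum depth of 4 and the share values are all assumptions made for this sketch. It only shows the two properties discussed above: nothing changes below the latency threshold, and under contention hosts running higher-share virtual machines keep deeper queues.

```python
def host_queue_depths(shares_per_host, observed_latency_ms, threshold_ms,
                      total_slots=48, default_depth=32, min_depth=4):
    """Divide a hypothetical datastore-wide slot budget across hosts.

    shares_per_host: host name -> sum of disk shares of its VMs on
    this datastore.
    """
    # No contention: every host keeps its default queue depth.
    if observed_latency_ms <= threshold_ms:
        return {host: default_depth for host in shares_per_host}

    total_shares = sum(shares_per_host.values())
    depths = {}
    for host, shares in shares_per_host.items():
        proportional = round(total_slots * shares / total_shares)
        # Never throttle below min_depth or grow beyond the default.
        depths[host] = max(min_depth, min(default_depth, proportional))
    return depths


shares = {"esx01": 2000, "esx02": 1000, "esx03": 1000}
calm = host_queue_depths(shares, observed_latency_ms=12, threshold_ms=30)
busy = host_queue_depths(shares, observed_latency_ms=40, threshold_ms=30)
print(calm)
print(busy)
```

With these sample values, all hosts keep a depth of 32 while the datastore is below the threshold. Once the threshold is exceeded, esx01 ends up with twice the queue depth of the other two hosts because its virtual machines hold twice the shares.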
Coming back to the argument, I firmly believe that SIOC has benefits in a shared disk pool structure: between the VMM and the datastore a lot of queues exist.
vSphere 5.1 VMObservedLatency
Because SIOC takes the average device latency of all hosts connected to the datastore into account, it understands the overall picture when determining the correct queue depth for the virtual machines. Keep in mind, queue depth changes occur only during contention. Now the best part of SIOC in 5.1 is that it has Automatic Latency Threshold Computation. By leveraging the SIOC injector, it understands the peak value of a datastore and adjusts the SIOC threshold. The SIOC threshold will be set to 90% of its peak value, therefore reflecting an excellent understanding of the performance capability of the datastore. This is done on a regular basis, so it keeps the actual workload in mind. This dynamic system will give you far more performance benefit than statically setting the queue depth and DSNRO for each host.
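One way to picture the automatic threshold computation is sketched below. It rests on an assumption of mine: that the "peak value" is the peak throughput measured by the injector, and that the threshold is the latency observed at the point where throughput first reaches 90% of that peak. The `latency_threshold` function and the measurement samples are hypothetical and only illustrate the idea.

```python
def latency_threshold(samples, percent_of_peak=90):
    """Pick the latency at which throughput first reaches a share of peak.

    samples: (throughput_iops, latency_ms) pairs ordered by increasing
    load, as a load injector might produce them.
    """
    peak = max(throughput for throughput, _ in samples)
    target = peak * percent_of_peak / 100
    for throughput, latency_ms in samples:
        if throughput >= target:
            return latency_ms
    # Not reachable for percent_of_peak <= 100: the peak sample matches.
    raise ValueError("no sample reaches the requested share of peak")


# Hypothetical measurements: throughput flattens while latency keeps rising.
samples = [
    (1000, 4),
    (2000, 6),
    (3000, 9),
    (3600, 14),
    (3900, 22),
    (4000, 35),
]
threshold_ms = latency_threshold(samples)
print(threshold_ms)
```

In this sample the datastore peaks at 4000 IOPS, and 90% of that is reached at a latency of 14 ms. Beyond that point extra load mostly adds latency, which is why a threshold derived this way tracks the real capability of the datastore better than a static value.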
One of the main reasons for creating multiple datastores backed by a single datapool is to create a multi-path environment. Together with advanced multi-pathing policies and LUN-to-controller port mappings, you can get the most out of your storage subsystem. With SIOC, you can manage your queue depths dynamically and automatically, based on actual performance levels, while having the ability to prioritize at the virtual machine level.

Filed Under: SIOC Tagged With: DRS, SIOC, Storage DRS
