Category: SIOC

How to enable SIOC stats only mode?

Today on Twitter, David Chadwick, Cormac Hogan and I were discussing SIOC stats only mode. SIOC stats only mode gathers statistics to provide insight into the I/O utilization of the datastore. Please note that stats only mode does not enable the datastore-wide scheduler and will not enforce throttling. Stats only mode is disabled by default because of the (significant) increase of statistics data logged into the vCenter database.

SIOC stats only mode is available as of vSphere 5.1 and can be enabled via the web client. To enable SIOC stats only mode, go to:

  1. Storage view
  2. Select the datastore
  3. Select Manage
  4. Select Settings

[Image: 01-SIOC-disabled]

By default, both SIOC and SIOC stats only mode are disabled. Click the Edit button on the right side of the screen, un-tick the checkbox “Disable Storage I/O statistics collection (applicable only if Storage I/O Control is disabled)” and click OK.

[Image: 02-enable-SIOC-stats-only-mode]
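If you prefer to script this, the same setting can be toggled through the vSphere API. Below is a minimal pyVmomi sketch; the statsCollectionEnabled property follows my reading of the vSphere 5.1 API, and the connection details and datastore name are hypothetical, so verify everything against your API reference before relying on it.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Hypothetical connection details; replace with your own.
si = SmartConnect(host='vcenter.local', user='administrator@vsphere.local',
                  pwd='password', sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()

    # Find the datastore by name (hypothetical name).
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.Datastore], True)
    ds = next(d for d in view.view if d.name == 'datastore1')
    view.Destroy()

    # Keep SIOC itself disabled, but enable statistics collection:
    # the equivalent of un-ticking the checkbox in the web client.
    spec = vim.StorageResourceManager.IORMConfigSpec()
    spec.enabled = False
    spec.statsCollectionEnabled = True  # assumption: 5.1 property name

    content.storageResourceManager.ConfigureDatastoreIORM_Task(ds, spec)
finally:
    Disconnect(si)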

To test whether there is any difference, I used a datastore that had SIOC enabled. I disabled SIOC and un-ticked the “Disable Storage I/O statistics collection (applicable only if Storage I/O Control is disabled)” option. I opened the performance view and selected the “Realtime” time range:

  1. Storage view
  2. Select the datastore
  3. Select Monitor
  4. Select Performance
  5. Select “Realtime” Time range

[Image: 03-SIOC-Time-Range]

At 15:35 I disabled SIOC, which explains the dip. At 15:36 SIOC stats only mode was enabled, and it took vCenter roughly a minute to start displaying the stats again.

[Image: 04-running stats only mode]
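For completeness: the same real-time statistics can be pulled through the PerformanceManager API. A minimal pyVmomi sketch follows; the counter name datastore.sizeNormalizedDatastoreLatency.latest and the choice of entity are assumptions, so list perfManager.perfCounter in your environment to find the counters your vCenter actually exposes.

from pyVmomi import vim

def realtime_samples(si, entity, counter_name):
    """Query 20-second real-time samples of one counter for an entity.

    Note: counters in the 'datastore' group are typically queried per
    host, with the datastore UUID as the metric instance; adjust the
    entity and instance to what your vCenter exposes.
    """
    pm = si.RetrieveContent().perfManager
    # Map full counter names (group.name.rollup) to counter IDs.
    ids = {'%s.%s.%s' % (c.groupInfo.key, c.nameInfo.key, c.rollupType): c.key
           for c in pm.perfCounter}
    spec = vim.PerformanceManager.QuerySpec(
        entity=entity,
        metricId=[vim.PerformanceManager.MetricId(
            counterId=ids[counter_name], instance='*')],
        intervalId=20,   # the real-time ("Realtime") sampling interval
        maxSample=15)    # roughly the last five minutes
    return pm.QueryPerf(querySpec=[spec])

# Example (hypothetical):
# realtime_samples(si, host, 'datastore.sizeNormalizedDatastoreLatency.latest')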

Like all new vSphere 5.1 features, SIOC stats only mode can only be enabled via the vSphere web client.

Error -1 in opening & reading the slot file error in storageRM.log (SIOC)

The problem
Recently I noticed that my datastore cluster was not providing latency statistics during initial placement. The datastore recommendation during initial placement displayed space utilization statistics, but displayed 0 in the I/O Latency Before column.

[Image: 01-Datastore-recommendations]

The performance statistics of my datastores showed that there was I/O activity on the datastores.

[Image: 02-Datastore-latency]

However, the SIOC statistics all showed no I/O activity on the datastore:

[Image: 03-SIOC-activity]

The SIOC log file (storageRM.log) showed the following error:

Open /vmfs/volumes/ /.iorm.sf/slotsfile (0x10000042, 0x0) failed: permission denied
Giving UP Permission denied Error -1 opening SLOT file /vmfs/volumes/datastore/.iorm.sf/slotsfile
Error -1 in opening & reading the slot file
Couldn’t get a slot
Successfully closed file 6
Error in opening stat file for device: datastore. Ignoring this device

The following permissions were applied on the slotsfile:

[Image: 04-slotsfile-before]

The Solution
Engineering explained to me that these permissions were not the defaults; the default permissions are read and execute access for everyone and write access for the owner of the file (755). The following command sets the correct permissions on the slotsfile:

chmod 755 /vmfs/volumes/datastore/.iorm.sf/slotsfile

Checking the permissions shows that they are applied:

[Image: 05-slotsfile-after]

The SIOC statistics started to show the I/O activity on the datastore:

[Image: 06-SIOC-activity-after]

Before changing the permissions on the slotsfile, I stopped the SIOC service on the host with the command: /etc/init.d/storageRM stop
However, I believe this isn’t necessary; changing the permissions without stopping SIOC on the host should work.
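To check every datastore on a host in one pass, a small script can do the sweep. This is a minimal sketch using only the Python standard library; it assumes the default /vmfs/volumes layout and simply applies the 755 permissions engineering described.

import glob
import os
import stat

# Find slotsfiles whose permissions deviate from the expected 755
# (rwxr-xr-x) and correct them.
for path in glob.glob('/vmfs/volumes/*/.iorm.sf/slotsfile'):
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode != 0o755:
        print('%s: %s -> 0755' % (path, oct(mode)))
        os.chmod(path, 0o755)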

Cause
We are not sure what causes this problem; support and engineering are troubleshooting this error. In my case I believe it has to do with the frequent restructuring of my lab: vCenter and ESXi servers are reinstalled regularly, but I have never reformatted my datastores. I do not expect to see this error in stable production environments. Please check the current permissions on the slotsfile if Storage DRS does not show I/O utilization on the datastore. (VMs must be running and the I/O metric on the datastore cluster must be enabled, of course.)

I expect the knowledge base article to be available soon.

SIOC on datastores backed by a single datapool

Duncan posted an article today in which he brings up the question: should you use many small LUNs or a couple of large LUNs for Storage DRS? In this article he explains the differences between Storage I/O Control (SIOC) and Storage DRS and why they work well together. To re-emphasize: the goal of Storage DRS load balancing is to fix long-term I/O imbalances, while SIOC addresses short-term bursts and loads. SIOC is all about managing the queues, while Storage DRS is all about intelligent placement and avoiding bottlenecks.

Julian Wood makes an interesting remark, one both Duncan and I often hear when discussing SIOC. Don’t get me wrong, I’m not picking on Julian; I’m merely noting that he raises a frequently used argument.

“There is far less benefit in using Storage IO Control to load balance IO across LUNs ultimately backed by the same physical disks than load balancing across separate physical storage pools. “

Looking at the way SIOC works, I tend to disagree with this statement. As stated before, SIOC manages queues: the queues to the datastores used by the virtual machines in the virtual datacenter. Typically these virtual machines differ in workload type, in peak moments, and in importance to the organization. With the use of disk shares, important virtual machines can be assigned a higher priority within the disk queue. When contention occurs, and this is important to realize, these business-critical virtual machines get prioritized over other virtual machines. Not all important virtual machines generate a constant stream of I/O, while other virtual machines, maybe with a lower priority, do generate a constant stream of I/O. Disk shares allow the high-priority, low-I/O virtual machines to get a foot in the door and get those I/Os to the datastore and back. Without SIOC and disk shares you need to start thinking about increasing the queue depth of each host and about smart placement of these virtual machines (both high and low I/O load) to avoid the high I/O loads ending up on the same host. These placement adjustments might impact DRS load balancing operations, possibly affecting other virtual machines along the way. Investing time in creating and managing a matrix of possible VM-to-datastore placements is not the way to go in this age of rapidly expanding datacenters.
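To make the share mechanism concrete, here is a hedged pyVmomi sketch that raises the disk shares of a business-critical VM. The storageIOAllocation property on VirtualDisk follows my reading of the vSphere API, so treat this as a sketch to verify rather than a definitive implementation.

from pyVmomi import vim

def set_disk_shares(vm, level=vim.SharesInfo.Level.high):
    """Set the I/O shares of every virtual disk of a VM."""
    changes = []
    for dev in vm.config.hardware.device:
        if isinstance(dev, vim.vm.device.VirtualDisk):
            # Higher shares give this disk priority in the queue
            # when SIOC detects contention.
            dev.storageIOAllocation = \
                vim.StorageResourceManager.IOAllocationInfo(
                    shares=vim.SharesInfo(level=level))
            changes.append(vim.vm.device.VirtualDeviceSpec(
                operation=vim.vm.device.VirtualDeviceSpec.Operation.edit,
                device=dev))
    return vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=changes))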

Because SIOC is a datastore-wide scheduler, it determines the queue depth of the ESXi hosts running virtual machines on the datastore. Hosts with higher-priority virtual machines get “deeper” queue depths to the datastore, while hosts with lower-priority virtual machines receive shorter queue depths. To be more precise, SIOC calculates the datastore-wide latency, and each local host scheduler determines the queue depth for its queues to the datastore.

But remember that queue depth changes only occur when there is contention, when the datastore exceeds the SIOC latency threshold. For more info about the SIOC latency, read “To which host-level latency statistic is the SIOC congestion threshold related?”.
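The proportional idea behind this is simple enough to sketch. The following is a conceptual illustration, not VMware's actual scheduler code: when the measured latency exceeds the threshold, the device queue depth is divided across hosts in proportion to the aggregate shares of their virtual machines.

def host_queue_depths(shares_per_host, device_queue_depth,
                      measured_latency_ms, threshold_ms):
    # No contention: SIOC leaves the queue depths alone.
    if measured_latency_ms <= threshold_ms:
        return None
    total = sum(shares_per_host.values())
    return {host: max(1, round(device_queue_depth * s / total))
            for host, s in shares_per_host.items()}

# Host esx-a runs high-priority VMs (2000 shares), esx-b low-priority (1000).
print(host_queue_depths({'esx-a': 2000, 'esx-b': 1000},
                        device_queue_depth=64,
                        measured_latency_ms=42, threshold_ms=30))
# -> {'esx-a': 43, 'esx-b': 21}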

Coming back to the argument, I firmly believe that SIOC has benefits in a shared diskpool structure; between the VMM and the datastore a lot of queues exist.

[Image: vSphere 5.1 VMObservedLatency]

Because SIOC takes the average device latency of all hosts connected to the datastore into account, it understands the overall picture when determining the correct queue depth for the virtual machines. Keep in mind that queue depth changes occur only during contention. Now the best part of SIOC in 5.1 is the automatic latency threshold computation. By leveraging the SIOC injector, it understands the peak value of a datastore and adjusts the SIOC threshold accordingly. The SIOC threshold is set to 90% of the peak value, giving it an excellent understanding of the performance capability of the datastore. This is done on a regular basis, so it keeps the actual workload in mind. This dynamic system will give you far more performance benefit than statically setting the queue depth and DSNRO (Disk.SchedNumReqOutstanding) for each host.
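The threshold computation itself is easy to picture. A conceptual sketch, not the actual SIOC injector: given latency/throughput samples taken at increasing load, pick the latency at which the device first reaches 90% of its peak throughput.

def automatic_threshold(samples):
    """samples: (latency_ms, throughput_iops) pairs at increasing load."""
    peak = max(t for _, t in samples)
    for latency_ms, throughput in samples:
        if throughput >= 0.9 * peak:
            return latency_ms
    return samples[-1][0]

# Throughput levels out around 9000 IOPS while latency keeps climbing.
samples = [(5, 2000), (10, 5200), (15, 7600), (20, 8800), (30, 9000), (50, 9050)]
print(automatic_threshold(samples))  # -> 20 (8800 >= 0.9 * 9050)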

One of the main reasons for creating multiple datastores backed by a single datapool is to create a multi-path environment. Together with advanced multi-pathing policies and LUN-to-controller-port mappings, you can get the most out of your storage subsystem. With SIOC you can manage your queue depths dynamically and automatically, by understanding actual performance levels, while having the ability to prioritize at the virtual machine level.

vSphere 5.1 Storage DRS load balancing and SIOC threshold enhancements

Lately I have been receiving questions about best practices and considerations for aligning the Storage DRS latency and the Storage I/O Control (SIOC) latency: how they are correlated and how to configure them to work optimally together. Let’s start with identifying the purpose of each setting, review the enhancements vSphere 5.1 has introduced and discover the impact of misaligning both thresholds in vSphere 5.0.

Purpose of the SIOC threshold
The main goal of the SIOC latency threshold is to give fair access to the datastores, throttling the outstanding virtual machine I/O to the datastores across multiple hosts to keep the measured latency below the threshold.

[Image: SIOC threshold violation host queue throttled]

It can have a restrictive effect on the I/O flow of virtual machines.

Purpose of the Storage DRS latency threshold
The Storage DRS latency is a threshold to trigger virtual machine migrations. To be more precise: if the average latency (VMObservedLatency) of the virtual machines on a particular datastore is higher than the Storage DRS threshold, then Storage DRS marks that datastore as a “source” for load balancing migrations. In other words, that datastore provides the candidates (virtual machines) for Storage DRS to move around to solve the imbalance of the datastore cluster.

[Image: Storage DRS load balancing between source and destination datastores]

This means that the Storage DRS threshold imposes no “restrictive” access limitations. It does not limit the ability of the virtual machines to send I/O to the datastore. It is just an indicator that tells Storage DRS which datastore to pick for load balancing operations.

SIOC throttling behavior
When the average device latency detected by SIOC is above the threshold, SIOC throttles the outstanding I/O of the virtual machines on the hosts connected to that datastore. However, due to the different numbers of shares, various I/O sizes, random versus sequential workloads and the spatial locality of the changed blocks on the array, it is almost certain that no two virtual machines will experience the same performance. Some virtual machines will experience a higher latency than other virtual machines running on that datastore. Remember that SIOC is driven by shares, not reservations; we cannot guarantee I/O slots (reservations). Long story short: when the datastore is experiencing latency, the VMkernel manages the outbound queue, resulting in a buildup of I/O somewhere higher up in the stack. The SIOC latency threshold is compared against the weighted average of DAVG per host, where the weight is the number of IOPS on that host. For more information on how SIOC calculates the device average latency, please read the article “To which host-level latency statistic is the SIOC congestion threshold related?”.
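As a small illustration of that weighted average, consider two hosts with different loads; the busier host dominates the datastore-wide value:

def datastore_wide_latency(per_host):
    """per_host: {host: (davg_ms, iops)} for one datastore."""
    total_iops = sum(iops for _, iops in per_host.values())
    return sum(davg * iops
               for davg, iops in per_host.values()) / total_iops

print(datastore_wide_latency({'esx-a': (25.0, 3000),
                              'esx-b': (10.0, 1000)}))  # -> 21.25 ms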

Has SIOC throttling any effect on Storage DRS load balancing?
Whether SIOC throttling has an impact on Storage DRS load balancing depends on which vSphere version you run. As stated in the previous paragraph, if SIOC throttles the queues, the virtual machine I/O does not disappear; vSphere always allows the virtual machine to generate I/O to the datastore, it just builds up somewhere higher in the stack between the virtual machine and the HBA queue.

In vSphere 5.0, Storage DRS measures latency by averaging the device latency of the hosts running VMs on that datastore. This is almost the same metric as the SIOC latency. This means that when you set the SIOC latency equal to the Storage DRS latency, the latency builds up in the stack above the Storage DRS measuring point. In the worst-case scenario, SIOC throttles the I/O and keeps the latency above the measuring point of Storage DRS, which makes the latency invisible to Storage DRS and therefore does not trigger the load balancing operation for that datastore.

Introducing vSphere 5.1 VMObservedLatency
To avoid this scenario, Storage DRS in vSphere 5.1 uses the metric VMObservedLatency. This metric measures the round trip of I/O from the moment the VMkernel receives the I/O from the Virtual Machine Monitor (VMM) to the datastore, and all the way back to the VMM.

[Image: vSphere 5.1 VMObservedLatency]

This means that even when you set the SIOC latency to a lower threshold than the Storage DRS latency, Storage DRS still observes the latency building up in the kernel layer.

vSphere 5.1 Automatic latency SIOC
To help you avoid building up I/O in the host queue, vSphere 5.1 offers automatic threshold computation for SIOC. SIOC sets the latency threshold to the latency measured at 90% of the throughput level of the device. To determine this, SIOC derives a latency setting after a series of tests, mapping maximum throughput to a latency value. During the tests, SIOC detects where the throughput of the device levels out while the latency keeps on increasing. To be conservative, SIOC derives a latency value that allows the host to generate up to 90% of that throughput, leaving a burst space of 10%. This provides the best performance of the devices, avoiding unnecessary restrictions caused by building up latency in the queues. In my opinion, this feature alone warrants the upgrade to vSphere 5.1.
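For reference, the automatic mode can presumably also be enabled through the API. A hedged pyVmomi sketch follows; the congestionThresholdMode and percentOfPeakThroughput properties are my assumptions based on the vSphere 5.1 API reference, so verify them before use.

from pyVmomi import vim

def enable_automatic_threshold(si, ds, percent_of_peak=90):
    spec = vim.StorageResourceManager.IORMConfigSpec()
    spec.enabled = True
    # Assumption: 'automatic' lets the injector derive the threshold,
    # at the given percentage of the device's peak throughput.
    spec.congestionThresholdMode = 'automatic'
    spec.percentOfPeakThroughput = percent_of_peak
    srm = si.RetrieveContent().storageResourceManager
    return srm.ConfigureDatastoreIORM_Task(ds, spec)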

How to set the two thresholds to work optimally together
SIOC in vSphere 5.1 allows the host to go up to 90% of the throughput before adjusting the queue length of each host, generating queuing in the kernel instead of queuing on the storage array. As Storage DRS uses VMObservedLatency, it monitors the complete stack. It observes the overall latency, disregarding the location of the latency in the stack, and tries to move VMs to other datastores to level out the overall experienced latency in the datastore cluster. Therefore you do not need to worry about misaligning the SIOC latency and the Storage DRS I/O latency.

If you are running vSphere 5.0, it is recommended to set the SIOC threshold to a higher value than the Storage DRS I/O latency threshold. Please refer to your storage vendor for the appropriate SIOC latency threshold.

Get notified of these blog postings and more DRS and Storage DRS information by following me on Twitter: @frankdenneman
