<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>frankdenneman.nl &#187; Storage</title>
	<atom:link href="http://frankdenneman.nl/category/storage/feed/" rel="self" type="application/rss+xml" />
	<link>http://frankdenneman.nl</link>
	<description></description>
	<lastBuildDate>Tue, 24 Jan 2012 15:49:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>ESX4 ALUA, TPGS and HP CA</title>
		<link>http://frankdenneman.nl/2010/03/esx4-alua-and-hp-continuous-access/</link>
		<comments>http://frankdenneman.nl/2010/03/esx4-alua-and-hp-continuous-access/#comments</comments>
		<pubDate>Thu, 25 Mar 2010 09:30:41 +0000</pubDate>
		<dc:creator>Frank Denneman</dc:creator>
				<category><![CDATA[Storage]]></category>
		<category><![CDATA[ALUA]]></category>
		<category><![CDATA[Continuous Access]]></category>
		<category><![CDATA[ESX4]]></category>
		<category><![CDATA[tpgs]]></category>
		<category><![CDATA[VMware]]></category>

		<guid isPermaLink="false">http://frankdenneman.nl/?p=824</guid>
		<description><![CDATA[In my blog post: &#8220;HP CA and the use of LUN balancing scripts&#8221; I tried to cover the possible impact of using HP continuous Access EVA on the LUN path load balancing scheme in ESX 3.x. I received a lot of questions about this and wanted to address some issues again and try to clarify [...]<p><a href="http://frankdenneman.nl/2010/03/esx4-alua-and-hp-continuous-access/">ESX4 ALUA, TPGS and HP CA</a> is a post from: <a href="http://frankdenneman.nl">frankdenneman.nl</a></p>
]]></description>
			<content:encoded><![CDATA[<p>In my blog post &#8220;<a href="http://frankdenneman.nl/2009/02/hp-continuous-access-and-the-use-of-lun-balancing-scripts/">HP CA and the use of LUN balancing scripts</a>&#8221; I tried to cover the possible impact of using HP Continuous Access EVA on the LUN path load-balancing scheme in ESX 3.x. I received a lot of questions about this and want to address some of the issues again and try to clarify them. Let&#8217;s begin with a recap of the HP CA article: CA affects the load-balancing scheme because an EVA is an asymmetric Active-Active array that uses the Asymmetric Logical Unit Access protocol (ALUA). ESX3 is not ALUA aware and does not recognize the different access characteristics of the array&#8217;s target ports. VMware addressed this shortcoming and added ALUA support in the new storage stack of ESX4. ALUA support is a great feature of the new storage architecture; it removes a lot of the manual steps needed to create a properly load-balanced environment. But how exactly does ALUA identify which path is optimized, and will HP Continuous Access still have an impact on ESX4 environments?<br />
<span id="more-824"></span><br />
<strong>Asymmetric Logical Unit Access</strong><br />
Asymmetric Logical Unit Access occurs when the access characteristics of one storage processor port relative to a LUN differ from those of another port in the array. This behavior occurs on Asymmetric Active-Active (AAA) arrays. Two well-known AAA arrays are the EMC CX and the HP EVA. In an AAA array both controllers can receive IO commands (active-active), but only one controller can issue IO to the LUN. This is the asymmetric part. The opposite of an AAA array is a symmetric Active-Active array, such as the EMC Symmetrix DMX; such an array can issue IO commands to the LUN via both controllers. But for now let&#8217;s concentrate on AAA arrays and the HP EVA specifically.</p>
<p>The controller in an AAA array that can issue commands to the LUN is called the managing controller; paths to the LUN via ports of this controller are called <em>optimized paths</em>. IO sent to a port of the non-managing controller must be transferred internally to the managing controller, which increases latency and affects the performance of the array. For this reason, paths leading to the LUN via the non-managing controller are called <em>non-optimized paths</em>. </p>
<p><strong>Pluggable Storage Architecture</strong><br />
The default Native Multipathing Plugin (NMP) used by the storage stack in ESX4 uses two types of sub-plugins: Storage Array Type Plugins (SATPs) and Path Selection Plugins (PSPs). The SATP is used for path handling and the PSP is used for path selection.<br />
It is interesting to know that the SATP is associated with all the physical paths to the array and is configured globally per array, but the PSP can be configured per LUN. </p>
<p><strong>Storage Array Type Plugins</strong><br />
A default Storage Array Type Plugin (SATP) is included for each supported array, as well as a generic SATP for non-specified storage arrays. The two SATPs available for the HP EVA are:</p>
<p>•	VMW_SATP_EVA 	(array specific)<br />
•	VMW_SATP_ALUA 	(generic SATP)</p>
<p>The problem with the array-specific VMW_SATP_EVA is that it doesn&#8217;t use Target Port Group Support (TPGS). The funny thing is that TPGS is critical when it comes to determining optimized and non-optimized paths. Therefore the VMware HCL lists the generic VMW_SATP_ALUA as the supported SATP for EVA Active-Active arrays. </p>
<p><strong>Target Port Group Support</strong><br />
ALUA uses Target Port Group Support to determine the access characteristics of the path to a LUN. A port on a storage processor is called a target port and belongs to a target port group (TPG). All target ports belonging to the same TPG are always in the same Asymmetric Access State (AAS) relative to a LUN. A target port can be in one of several access states:</p>
<p>•	Active/Optimized<br />
•	Active/Non-optimized<br />
•	Standby<br />
•	Unavailable<br />
•	Transitioning</p>
<p>Note that the grouping of target ports per TPG per controller is vendor specific!</p>
<p><em>Active/Optimized</em><br />
A target port reporting the active/optimized state belongs to the managing controller of the LUN and communicates directly with the LUN.</p>
<p><em>Active/Non-Optimized</em><br />
A target port reporting the active/non-optimized state belongs to the non-managing controller of the LUN; this controller must send the IO via the multilink ports to the managing controller of the LUN.</p>
<p><em>Transitioning</em><br />
If a target port is in the Transitioning state, ownership of the LUN is being transferred between controllers. This can occur when a hardware failure happens on the managing controller or when the threshold of proxy IOs for the specific LUN is reached; the latter is called an Implicit LUN Transfer (ILT).</p>
<p><a href="http://frankdenneman.nl/wp-content/uploads/2010/03/AAS.png"><img src="http://frankdenneman.nl/wp-content/uploads/2010/03/AAS.png" alt="Asymmetric Access State" title="Asymmetric Access State" width="447" height="264" class="aligncenter size-full wp-image-865" /></a></p>
<p>For example, take an ESX host with two HBAs connected to an EVA 8100, where SP A is the managing controller of LUN1. TPGS will report HBA1:1:1 and HBA2:1:1 as the Active/Optimized paths, and HBA1:2:1 and HBA2:2:1 as the non-optimized paths.</p>
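<p>The example above can be modelled in a few lines of Python. This is only a sketch of the reporting logic, not of the actual TPGS implementation: it assumes one target port group per controller (the grouping is vendor specific), and the <em>Path</em> class and <em>report_states</em> function are names made up for this illustration.</p>

```python
from dataclasses import dataclass
from enum import Enum


class AccessState(Enum):
    ACTIVE_OPTIMIZED = "active/optimized"
    ACTIVE_NON_OPTIMIZED = "active/non-optimized"
    STANDBY = "standby"
    UNAVAILABLE = "unavailable"
    TRANSITIONING = "transitioning"


@dataclass(frozen=True)
class Path:
    name: str        # path name as used in the example, e.g. "HBA1:1:1"
    controller: str  # hypothetical label for the storage processor: "A" or "B"


def report_states(paths, managing_controller):
    """Return {path name: AccessState} for a single LUN.

    Assumption: one target port group per controller, so all ports of a
    controller share the same access state relative to the LUN.
    """
    states = {}
    for path in paths:
        if path.controller == managing_controller:
            states[path.name] = AccessState.ACTIVE_OPTIMIZED
        else:
            states[path.name] = AccessState.ACTIVE_NON_OPTIMIZED
    return states


paths = [
    Path("HBA1:1:1", "A"),
    Path("HBA2:1:1", "A"),
    Path("HBA1:2:1", "B"),
    Path("HBA2:2:1", "B"),
]

# SP A is the managing controller of LUN1.
lun1_states = report_states(paths, managing_controller="A")
optimized = sorted(
    name for name, state in lun1_states.items()
    if state is AccessState.ACTIVE_OPTIMIZED
)
print(optimized)
```

<p>If ownership of LUN1 moved to SP B, calling <em>report_states</em> with <em>managing_controller="B"</em> would flip the two groups of paths.</p>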
<p><a href="http://frankdenneman.nl/wp-content/uploads/2010/03/optimized-paths.png"><img src="http://frankdenneman.nl/wp-content/uploads/2010/03/optimized-paths-300x281.png" alt="Optimized Paths, TPG AAS pictured" title="optimized paths" width="300" height="281" class="aligncenter size-medium wp-image-868" /></a></p>
<p><strong>Path Selection Plugin</strong><br />
It&#8217;s up to the Path Selection Plugin (PSP) to actually select the path to be used by the ESX host to communicate with the LUN. Three PSPs are available:</p>
<p>•	<em>VMW_PSP_MRU Most Recently Used Path Selection</em><br />
•	<em>VMW_PSP_RR Round Robin Path Selection</em><br />
•	<em>VMW_PSP_FIXED Fixed Path Selection</em></p>
<p>The <a href="http://www.vmware.com/pdf/vsphere4/r40_u1/vsp_40_u1_san_cfg.pdf">Fibre Channel SAN Configuration Guide</a> lists the following: </p>
<blockquote><p>
<strong>MRU:</strong>	Selects the path the ESX/ESXi host used most recently to access the given device.<br />
If this path becomes unavailable, the host switches to an alternative path and<br />
continues to use the new path while it is available.</p>
<p><strong>Fixed:</strong> Uses the designated preferred path, if it has been configured. Otherwise, it uses<br />
the first working path discovered at system boot time. If the host cannot use<br />
the preferred path, it selects a random alternative available path. The host<br />
automatically reverts back to the preferred path as soon as that path becomes<br />
available.</p>
<p><strong>Round Robin (RR):</strong> Uses a path selection algorithm that rotates through all available paths enabling load balancing across the paths.</p></blockquote>
<p><strong>ALUA aware PSPs</strong><br />
Both the MRU and Round Robin PSPs are ALUA aware. MRU selects the first working optimized path discovered at system boot time and uses non-optimized paths only when all optimized paths are dead. When an optimized path becomes available again, MRU switches back to it. This differs from traditional MRU behavior, where an administrator must manually fail back to a path. By default, Round Robin issues IO across all optimized paths and uses non-optimized paths only if no optimized paths are available.</p>
<p>Selecting the Fixed PSP on an ALUA-enabled array defeats the purpose of the ALUA support in the NMP architecture. If the preferred path is configured to use a non-optimized path, ESX will use this path to issue IO, even though optimized paths might be available. My advice is to stick with MRU or Round Robin when using an ALUA-aware SATP.</p>
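<p>The difference between the three PSPs on an ALUA array can be sketched as follows. This is a simplified model of the behavior described above, not the NMP code itself; the state strings, function names and the fallback used by <em>select_fixed</em> are assumptions made for the illustration.</p>

```python
OPTIMIZED = "active/optimized"
NON_OPTIMIZED = "active/non-optimized"
DEAD = "dead"  # hypothetical marker for a path that is not usable


def alua_candidates(path_states):
    """Paths an ALUA-aware PSP would use: all live optimized paths,
    or the non-optimized ones when no optimized path is left."""
    optimized = [p for p, s in path_states.items() if s == OPTIMIZED]
    if optimized:
        return optimized
    return [p for p, s in path_states.items() if s == NON_OPTIMIZED]


def select_mru(path_states):
    """ALUA-aware MRU: the first working optimized path, if there is one."""
    candidates = alua_candidates(path_states)
    return candidates[0] if candidates else None


def select_fixed(path_states, preferred):
    """Fixed: use the preferred path whenever it is alive, optimized or not."""
    if path_states.get(preferred) in (OPTIMIZED, NON_OPTIMIZED):
        return preferred
    # Simplification: fall back to the MRU choice when the preferred path is dead.
    return select_mru(path_states)


states = {
    "HBA1:1:1": OPTIMIZED,
    "HBA2:1:1": OPTIMIZED,
    "HBA1:2:1": NON_OPTIMIZED,
    "HBA2:2:1": NON_OPTIMIZED,
}

mru_choice = select_mru(states)                  # one optimized path
rr_choices = alua_candidates(states)             # Round Robin rotates over these
fixed_choice = select_fixed(states, "HBA1:2:1")  # stays on a non-optimized path
print(mru_choice, rr_choices, fixed_choice)
```

<p>Note how <em>select_fixed</em> happily returns a non-optimized path while two optimized paths are available, which is exactly the situation to avoid.</p>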
<p>Note! If you have a mixed environment of ESX 3.5 and ESX 4, set the preferred paths to the LUNs according to the ALUA optimized paths listed in ESX4. If preferred paths are configured to use non-optimized paths, repeated Implicit LUN Transfers can occur.</p>
<p>VMware lists MRU as the default supported PSP in the HCL. I asked VMware Global Support Services whether Round Robin is supported even though it is not listed in the HCL. Their answer:</p>
<blockquote><p>
If the partner recommends RR for their certified arrays on ESX 4.0, then they provide their customers with the directions for configuring it and they support them. We will support it but not list it in the HCL</p></blockquote>
<p><strong>MRU or RR</strong><br />
So the question arises, which PSP should I select? </p>
<p>If you use <a href="http://www.vmware.com/pdf/vsphere4/r40/vsp_40_mscs.pdf">MSCS </a>configurations, using Round Robin is unsupported on the LUNs that are part of MSCS VMs. </p>
<p>With MRU, ESX uses only the first optimized path discovered at boot for each LUN and does not load balance across HBAs; each LUN is bound to a single path, e.g. HBA1->LUN1, HBA2->LUN2, HBA1->LUN3. Round Robin, in contrast, uses all optimized paths of both HBAs to send IO. Almost every vendor lists Round Robin as best practice when using an ALUA array, because it provides better load balancing and aggregation of the available uplinks.</p>
<p>Another point of interest is the IOOperationLimit setting, which specifies how many IO commands are sent across one path before switching over to the next available optimized path. Before following the best practice of your vendor, my advice is to read Chad&#8217;s post first: <a href="http://virtualgeek.typepad.com/virtual_geek/2009/12/vsphere-4-nmp-rr-iooperationslimit-bug-and-workaround.html">vSphere 4 NMP RR IOoperationsLimit bug and workaround</a>.</p>
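<p>To make the effect of this setting concrete, here is a minimal Python sketch of the rotation it controls. It is an illustration only, not VMware&#8217;s implementation; the function name, the path names and the limit of 3 are assumptions chosen for the example.</p>

```python
from itertools import cycle


def round_robin_schedule(optimized_paths, io_operation_limit, total_ios):
    """Return the path used for each IO when the PSP switches to the next
    optimized path after io_operation_limit IOs on the current one.

    Assumes a non-empty list of paths and a limit of at least 1.
    """
    schedule = []
    path_cycle = cycle(optimized_paths)
    current = next(path_cycle)
    sent_on_current = 0
    for _ in range(total_ios):
        if sent_on_current == io_operation_limit:
            # Limit reached: move on to the next optimized path.
            current = next(path_cycle)
            sent_on_current = 0
        schedule.append(current)
        sent_on_current += 1
    return schedule


schedule = round_robin_schedule(
    ["HBA1:1:1", "HBA2:1:1"], io_operation_limit=3, total_ios=8
)
print(schedule)
```

<p>With a limit of 3, the first three IOs go down HBA1:1:1, the next three down HBA2:1:1, and the remaining two return to HBA1:1:1. A lower limit spreads IO more evenly over short bursts, at the cost of switching paths more often.</p>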
<p><strong>Continuous Access</strong><br />
So how about the impact of Continuous Access on the path load balancing in ESX 4?<br />
When a LUN is being replicated it becomes a member of a CA Data Replication Group  (DR Group). The DR Group is a Virtual Controller Software (VCS) construct and enforces the member LUNs  to have the following behavior:</p>
<p>•	<em>Fail over together</em><br />
•	<em>Preserve write order within the DR Group</em><br />
•	<em>Share a log disk</em><br />
•	<em>Replicate to the same destination array</em></p>
<p>To meet these requirements, CA assigns one controller as the managing controller for all member LUNs in the DR Group. Because all target ports of a controller belong to the same TPG, these target ports are reported to ESX as the Active/Optimized paths to the LUNs of the DR Group. This behavior affects the design of the DR Groups: how many LUNs should become part of one DR Group? The answer determines the load across both controllers, and an imbalance can easily occur when the managing-controller affiliation of each DR Group is not taken into consideration. </p>
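<p>A quick way to spot such an imbalance is to count the LUNs per managing controller. The sketch below does this in Python for a made-up layout; the DR Group names, LUN names and the dictionary structure are hypothetical and would in practice have to come from your SAN administrators or the array management tooling.</p>

```python
from collections import Counter


def luns_per_controller(dr_groups):
    """Count how many LUNs each controller manages, given that all LUNs
    of a DR Group share one managing controller."""
    load = Counter()
    for group in dr_groups:
        load[group["managing_controller"]] += len(group["luns"])
    return load


# Hypothetical layout: one large DR Group on controller A, one small on B.
dr_groups = [
    {
        "name": "DRG-Prod",
        "managing_controller": "A",
        "luns": ["LUN1", "LUN2", "LUN3", "LUN4", "LUN5", "LUN6"],
    },
    {
        "name": "DRG-Test",
        "managing_controller": "B",
        "luns": ["LUN7", "LUN8"],
    },
]

load = luns_per_controller(dr_groups)
print(dict(load))
```

<p>Here controller A carries the optimized paths for six LUNs and controller B for only two. Counting LUNs is of course a rough measure; the actual IO load per LUN matters as well.</p>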
<p>Please be aware of the limit on the number of DR Groups and of the use of DR Tunnels. The path between the source and destination controller is called a DR Tunnel (DRT); to ensure IO consistency, CA uses only one path per DR Group. A DR Group is allocated a set of write and copy resources per DR Tunnel, and these resources are not infinite. When heavy traffic occurs, DR Groups can temporarily run out of allocated write resources, which affects the write performance of the virtual machines. </p>
<p>For these reasons, designing balanced DR Groups is something you must do together with your SAN administrators, because the ratio of DR Groups to DR Tunnels can have an impact on your virtual environment as well.</p>
<p>Please read the following great posts as well:<br />
&#8220;<a href="http://www.yellow-bricks.com/2009/09/29/whats-that-alua-exactly/">what is ALUA exactly</a>&#8221; and &#8220;<a href="http://virtualgeek.typepad.com/virtual_geek/2009/09/a-couple-important-alua-and-srm-notes.html">a couple of important (ALUA and SRM) Notes</a>&#8221;</p>
<p><a href="http://frankdenneman.nl/2010/03/esx4-alua-and-hp-continuous-access/">ESX4 ALUA, TPGS and HP CA</a> is a post from: <a href="http://frankdenneman.nl">frankdenneman.nl</a></p>
]]></content:encoded>
			<wfw:commentRss>http://frankdenneman.nl/2010/03/esx4-alua-and-hp-continuous-access/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Identify storage performance issues</title>
		<link>http://frankdenneman.nl/2010/03/identify-storage-performance-issues/</link>
		<comments>http://frankdenneman.nl/2010/03/identify-storage-performance-issues/#comments</comments>
		<pubDate>Thu, 25 Mar 2010 10:10:19 +0000</pubDate>
		<dc:creator>Frank Denneman</dc:creator>
				<category><![CDATA[Storage]]></category>
		<category><![CDATA[DAVG/CMD]]></category>
		<category><![CDATA[ESXtop]]></category>
		<category><![CDATA[VMware]]></category>

		<guid isPermaLink="false">http://frankdenneman.nl/?p=872</guid>
		<description><![CDATA[VMware has recently updated the KB article &#8220;Using esxtop to identify storage performance issues&#8221; (KB1008205). The KB article provides information about how to use esxtop to determine the latency statistics across various devices. The article contains easy-to-follow, step-by-step instructions on how to set up ESXtop to monitor storage performance per HBA, LUN [...]<p><a href="http://frankdenneman.nl/2010/03/identify-storage-performance-issues/">Identify storage performance issues</a> is a post from: <a href="http://frankdenneman.nl">frankdenneman.nl</a></p>
]]></description>
			<content:encoded><![CDATA[<p>VMware has recently updated the KB article &#8220;<a href="http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&#038;cmd=displayKC&#038;externalId=1008205">Using esxtop to identify storage performance issues</a>&#8221; (KB1008205).<br />
The KB article provides information about how to use esxtop to determine the latency statistics across various devices. The article contains easy-to-follow, step-by-step instructions on how to set up ESXtop to monitor storage performance per HBA, LUN and virtual machine. It also lists generic acceptable values to put your measured values in perspective. It&#8217;s a great article; bookmark it for future reference.</p>
<p>If you want to learn about the thresholds of certain metrics in ESXtop, please check out the <a href="http://www.yellow-bricks.com/esxtop/">ESXtop metric bible</a> featured on Yellow-bricks.com.</p>
<p>ESXtop is a great tool to view and measure certain criteria in real time, but sometimes you want to collect metrics for later reference. If this is the case, the tool vscsiStats might be helpful. vscsiStats profiles your storage environment and collects info such as outstanding IO, seek distance and much more.  Check out Duncan&#8217;s <a href="http://www.yellow-bricks.com/2009/12/17/vscsistats/">excellent article</a> on how to use vscsiStats.</p>
<p>Because vscsiStats collects data in a .csv file, you can create diagrams from it. Gabe has written an <a href="http://www.gabesvirtualworld.com/?p=1022">article</a> on how to convert the vscsiStats data into Excel charts. </p>
<p><a href="http://frankdenneman.nl/2010/03/identify-storage-performance-issues/">Identify storage performance issues</a> is a post from: <a href="http://frankdenneman.nl">frankdenneman.nl</a></p>
]]></content:encoded>
			<wfw:commentRss>http://frankdenneman.nl/2010/03/identify-storage-performance-issues/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Lefthand SAN – Lessons learned</title>
		<link>http://frankdenneman.nl/2009/10/lefthand-san-lessons-learned/</link>
		<comments>http://frankdenneman.nl/2009/10/lefthand-san-lessons-learned/#comments</comments>
		<pubDate>Sun, 11 Oct 2009 09:41:10 +0000</pubDate>
		<dc:creator>Frank Denneman</dc:creator>
				<category><![CDATA[Storage]]></category>
		<category><![CDATA[ESX]]></category>
		<category><![CDATA[Lefthand]]></category>

		<guid isPermaLink="false">http://frankdenneman.wordpress.com/?p=280</guid>
		<description><![CDATA[I recently had the opportunity to deliver a virtual infrastructure which uses the HP Lefthand SAN solution.  Setting up a Lefthand SAN is not that difficult, but there are some factors to take into consideration when planning and designing a Lefthand SAN properly. These are my lessons learned. Lefthand, not the traditional Head-Shelf configuration HP lefthand [...]<p><a href="http://frankdenneman.nl/2009/10/lefthand-san-lessons-learned/">Lefthand SAN – Lessons learned</a> is a post from: <a href="http://frankdenneman.nl">frankdenneman.nl</a></p>
]]></description>
			<content:encoded><![CDATA[<p>I recently had the opportunity to deliver a virtual infrastructure which uses the HP Lefthand SAN solution.  Setting up a Lefthand SAN is not that difficult, but there are some factors to take into consideration when planning and designing a Lefthand SAN properly. These are my lessons learned.</p>
<p><span id="more-280"></span></p>
<p><strong>Lefthand, not the traditional Head-Shelf configuration</strong><br />
HP Lefthand SANs are based on iSCSI and are formed by storage nodes. In traditional storage architectures, a controller manages arrays of disk drives. A Lefthand SAN is composed of storage modules. A Network Storage Module 2120 G2 (NSM node) is basically an HP DL185 server with 12 SAS or SATA drives running SAN/iQ software.  </p>
<p>This architecture enables the aggregation of multiple storage nodes to create a storage cluster and this solves one of the toughest design questions when sizing a SAN.  Instead of estimating growth and buying a storage array to “grow into”, you can add storage nodes to the cluster when needed. This concludes the sales pitch.</p>
<p>But this technique of aggregating separate NSM nodes into a cluster raises some questions, such as:</p>
<ul>
<li>Where will the blocks of a single LUN be stored; all on one node, or across nodes?</li>
<li>How are LUNs managed?</li>
<li>How is datapath load managed?</li>
<li>What is the impact of the failure of an NSM node?</li>
</ul>
<p><strong>Block placement and Replication level</strong><br />
The placement of the blocks of a LUN depends on the configured replication level. The replication level is a feature called Network RAID Level. Network RAID stripes and mirrors multiple copies of data across a cluster of storage nodes. Up to four levels of synchronous replication can be configured at LUN level:</p>
<ul>
<li>None</li>
<li>2-way</li>
<li>3-way</li>
<li>4-way</li>
</ul>
<p>Blocks are stored on storage nodes according to the replication level. If a LUN is created with the default replication level of 2-way, two authoritative blocks are written at the same time to two different nodes. If a 3-way replication level is configured, blocks are stored on three nodes; with 4-way, on four nodes. (The replication level cannot exceed the number of nodes in the cluster.)<br />
SAN/iQ always starts writing the next block on the second node that contains the previous block. A picture is worth a thousand words.<br />
<a href="http://frankdenneman.nl/wp-content/uploads/2009/10/replication.jpg"><img class="aligncenter size-full wp-image-274" title="replication" src="http://frankdenneman.nl/wp-content/uploads/2009/10/replication.jpg" border="0" alt="replication" width="500" height="213" /></a><br />
<strong>Node order</strong><br />
The order in which blocks are written across the nodes is determined not by node hostname but by the order in which the nodes are added to the cluster.  The order of the nodes is extremely important if the SAN will span two locations. More information on this design issue later.</p>
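<p>The placement rule can be sketched in Python as follows. This is my reading of the behavior described above, not SAN/iQ code: it assumes that each block starts one node further along the cluster order than the previous block and that the order wraps around at the last node. The function and node names are made up for the illustration.</p>

```python
def replica_nodes(block_index, node_order, replication_level):
    """Return the nodes that hold the copies of one block.

    node_order is the order in which nodes were added to the cluster.
    Assumption: block N starts on node N (modulo the cluster size) and
    its copies land on the following consecutive nodes.
    """
    node_count = len(node_order)
    if replication_level > node_count:
        raise ValueError("replication level cannot exceed the number of nodes")
    start = block_index % node_count
    return [node_order[(start + i) % node_count] for i in range(replication_level)]


nodes = ["node1", "node2", "node3", "node4"]  # hypothetical cluster order

# Print the placement of the first four blocks with 2-way replication.
for block in range(4):
    print(block, replica_nodes(block, nodes, replication_level=2))
```

<p>With 2-way replication, block 0 lands on node1 and node2, block 1 on node2 and node3, and so on, so the second node of each block is the first node of the next one.</p>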
<p><strong>Virtual IP and VIP Load Balancing</strong><br />
When setting up a Lefthand cluster, a Virtual IP (VIP) needs to be configured. A VIP is required for iSCSI load balancing and fault tolerance. One NSM node hosts the VIP for the cluster; if this node fails, the VIP function automatically fails over to another node in the cluster.</p>
<p>The VIP functions as the iSCSI portal; ESX servers use the VIP for discovery and to log in to the volumes.  ESX servers can connect to volumes in two ways: using the VIP, or using the VIP with the load balancing option (VIPLB) enabled.  When VIPLB is enabled on LUNs, the SAN/iQ software balances connections across the different nodes of the cluster.</p>
<p>Configure the ESX iSCSI initiator with the VIP as destination address.  The VIP will supply the ESX servers with a target address for each LUN.  VIPLB will transfer initial communication to the gateway connection of the LUN.  Running the vmkiscsi-util command shows the VIP as the portal and another IP address as the target address of the LUN:</p>
<p>[root@esxhost00 vmfs]# vmkiscsi-util -i -t 58 -l<br />
***************************************************************************<br />
      Cisco iSCSI Driver Version &#8230; 3.6.3 (27-Jun-2005 )<br />
***************************************************************************<br />
TARGET NAME             : iqn.2003-10.com.lefthandnetworks:lefthandcluster:1542:volume3<br />
TARGET ALIAS            :<br />
HOST NO                 : 0<br />
BUS NO                  : 0<br />
TARGET ID               : 58<br />
TARGET ADDRESS          : 148.11.18.60:3260<br />
SESSION STATUS          : ESTABLISHED AT Fri Sep 11 14:51:13 2009<br />
NUMBER OF PORTALS          : 1<br />
PORTAL ADDRESS 1        : 148.11.18.9:3260,1<br />
SESSION ID              : ISID 00023d000001 TSIH 3b1</p>
<p><strong>Gateway Connection</strong><br />
This target address is what Lefthand calls a gateway connection. The gateway connection is described in the Lefthand SAN User Manual (page 561) as follows:</p>
<blockquote><p><em><span style="color:#000080;">Use iSCSI load balancing to improve iSCSI performance and scalability by distributing iSCSI sessions for different volumes evenly across storage nodes in a cluster. ISCSI load balancing uses iSCSI Login-Redirect.  Only initiators that support Login-Redirect should be used. When using VIP and load balancing, one iSCSI session acts as the gateway session. All I/O goes through this iSCSI session. You can determine which iSCSI session is the gateway by selecting the cluster, then clicking the iSCSI Sessions tab. The Gateway Connection column displays the IP address of the storage node hosting the load balancing iSCSI session. </span></em></p></blockquote>
<p>SAN/iQ designates a node to act as the gateway connection for the LUN, and the VIP sends the IP address of this node as the target address to all ESX hosts. This means every host that uses the LUN connects to that specific node, and this storage node handles all IO for this LUN.</p>
<p><a href="http://frankdenneman.nl/wp-content/uploads/2009/10/gateway-connection.jpg"><img class="aligncenter size-full wp-image-273" title="gateway connection" src="http://frankdenneman.nl/wp-content/uploads/2009/10/gateway-connection.jpg" border="0" alt="gateway connection" width="342" height="198" /></a></p>
<p>This leads to the question: how will the gateway connection (GC) handle IO for blocks not stored locally on that node?  All nodes are aware of which block is stored on which node. When a requested block is stored on another node, the GC node fetches it from one of the nodes on which it is stored and sends the result back to the ESX host.</p>
<p><a href="http://frankdenneman.nl/wp-content/uploads/2009/10/fetch.jpg"><img class="aligncenter size-full wp-image-272" title="fetch" src="http://frankdenneman.nl/wp-content/uploads/2009/10/fetch.jpg" border="0" alt="fetch" width="459" height="211" /></a></p>
<p><strong>Gateway Connection failover</strong><br />
Most clusters host more LUNs than they have nodes. This means that each node hosts the gateway connection role for multiple LUNs. If a node fails, its GC roles are transferred to the other nodes in the cluster. But when the NSM node comes back online, the VIP will not fail the GC roles back. This creates an imbalance in the cluster, which needs to be solved as quickly as possible.  This can be done by issuing the RebalanceVIP command for the volume from the CLI.</p>
<p><a href="http://kensvirtualreality.wordpress.com">Ken Cline</a> asked me the question:</p>
<blockquote><p>How do I know when I need to use this command? Is there a status indicator to tell me?</p></blockquote>
<p>Well actually there isn&#8217;t and that is exactly the problem!<br />
After a node failure, you need to be aware of this behaviour and you will have to rebalance a volume yourself by running the RebalanceVIP command.<br />
The Lefthand CMC does not offer this option or some sort of alert.<br />
 <br />
<strong>Network Interface Bonds</strong><br />
How about the available bandwidth? Lefthand nodes come standard with two 1Gb NICs. The two NICs can be placed in a bond. An NSM node supports three NIC bond configurations:</p>
<ol>
<li>Active &#8211; Passive</li>
<li>Link Aggregation (802.3ad)</li>
<li>Adaptive Load Balancing</li>
</ol>
<p>The most interesting is Adaptive Load Balancing (ALB).  ALB combines the benefits of the increased bandwidth of 802.3ad with the network redundancy of Active-Passive. Both NICs are made active and they can be connected to different switches; no additional configuration at the physical switch level is needed.  </p>
<p>When an ALB bond is configured, it creates an interface. This interface balances traffic through both NICs. But how will this work with the iSCSI protocol? According to RFC 3720 (<a href="http://www.ietf.org/rfc/rfc3720.txt">http://www.ietf.org/rfc/rfc3720.txt</a>), iSCSI uses command connection allegiance:</p>
<blockquote><p><span style="color:#000080;"><em>For any iSCSI request issued over a TCP connection, the corresponding response and/or other related PDU(s) MUST be sent over the same connection.  We call this &#8220;connection allegiance&#8221;.</em> </span></p></blockquote>
<p>This means that the NSM node must use the same MAC address to send the IO back. How will this affect the bandwidth? As stated in the iSCSI SAN Configuration Guide: “<em>ESX Server-based iSCSI initiators establish only one connection to each target.</em>”</p>
<p>It looks like ESX will communicate with the gateway connection of the LUN over only one NIC. I asked Calvin Zito (<a href="http://twitter.com/HPstorageGuy" target="_blank">http://twitter.com/HPstorageGuy</a>) to educate me on ALB and how it handles connection allegiance.</p>
<blockquote><p><em><span style="color:#000080;">When you create a bond on an NSM, the bond becomes the &#8216;interface&#8217; and the MAC address of one of the NICs becomes the MAC address for the bond.  The individual NICs become &#8216;slaves&#8217; at that point.  I/O will be sent to and from the &#8216;interface&#8217; which is the bond and the bonding logic figures out how to manage the 2 slaves behind the scenes.  So with ALB, for transmitting packets, it will use both NICs or slaves, but they will be associated with the MAC of the bond interface, not the slave device.</span></em></p></blockquote>
<p>The bond uses the IP and MAC address of the first onboard NIC. This means the node will use both interfaces to transmit data, but only one to receive.</p>
<p><em>Chad Sakac (EMC), Andy Banta (VMware) and various other folks have written a multivendor post explaining how ESX and vSphere handle iSCSI traffic. A must read!</em><br />
<em><a href="http://virtualgeek.typepad.com/virtual_geek/2009/01/a-multivendor-post-to-help-our-mutual-iscsi-customers-using-vmware.html#more">http://virtualgeek.typepad.com/virtual_geek/2009/01/a-multivendor-post-to-help-our-mutual-iscsi-customers-using-vmware.html#more</a></em></p>
<p><strong>Design issues;</strong><br />
When designing a Lefthand SAN, these points are worth considering;</p>
<p><strong>Network RAID level Write performance </strong><br />
When 2-way replication is selected, blocks are written to two nodes simultaneously; if a LUN is configured with 3-way replication, blocks must be replicated to three nodes. Acknowledgements are given when the blocks are written in cache on all participating nodes. When selecting the replication level, keep in mind that higher protection levels lead to lower write performance.</p>
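<p>Why more copies cost write performance follows directly from the acknowledgement rule: the write is only as fast as the slowest participating node. A tiny Python sketch with made-up latency numbers illustrates this; the function name and the values are hypothetical, and real write latency depends on much more than this.</p>

```python
def write_ack_latency_ms(node_cache_latencies_ms, replication_level):
    """Time until a write is acknowledged: all participating nodes must
    have the block in cache, so the slowest one decides."""
    participants = node_cache_latencies_ms[:replication_level]
    return max(participants)


# Hypothetical per-node latencies (ms) to get a block into cache.
latencies = [0.4, 0.7, 1.1, 0.5]

two_way = write_ack_latency_ms(latencies, replication_level=2)
three_way = write_ack_latency_ms(latencies, replication_level=3)
print(two_way, three_way)
```

<p>Adding a third copy can never make the acknowledgement faster; at best it stays the same, and every extra copy also consumes network bandwidth between the nodes.</p>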
<p><strong>Raid Levels NSM</strong><br />
Network RAID offers protection against storage node failure, but it does not protect against disk failure within a storage node. Disk RAID levels need to be configured at storage node level, unlike most traditional arrays where the RAID level can be configured per LUN. It is possible to mix storage nodes with different RAID configurations within a cluster, but this can lead to a lower usable capacity.</p>
<p>For example, a cluster consists of 12 TB nodes running RAID 10. Each node provides 6 TB of usable storage. When adding two 12 TB nodes running RAID 5, each provides 10 TB of usable storage. However, due to the restrictions on how the cluster uses capacity, the NSM nodes running RAID 5 will still be limited to 6 TB per storage node. This restriction exists because the cluster operates at the smallest usable per-storage-node capacity.<br />
The RAID level of a storage node must be set before it can join a cluster. Check the RAID level of the existing cluster nodes before configuring the new node, because you cannot change the RAID configuration without deleting data.</p>
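<p>The capacity rule from the example can be written down as a small calculation. The sketch below assumes, purely for illustration, that the existing cluster has two RAID 10 nodes (the number is not given above), and it computes raw usable capacity before any Network RAID replication is applied.</p>

```python
def cluster_usable_tb(node_usable_tb):
    """Usable cluster capacity when every node is limited to the
    smallest usable per-node capacity in the cluster."""
    return min(node_usable_tb) * len(node_usable_tb)


def stranded_tb(node_usable_tb):
    """Capacity that the larger nodes offer but the cluster cannot use."""
    return sum(node_usable_tb) - cluster_usable_tb(node_usable_tb)


# Assumed: two existing RAID 10 nodes (6 TB usable each),
# plus the two added RAID 5 nodes (10 TB usable each).
nodes_tb = [6, 6, 10, 10]

usable = cluster_usable_tb(nodes_tb)
stranded = stranded_tb(nodes_tb)
print(usable, stranded)
```

<p>Under these assumptions the cluster offers 4 x 6 TB = 24 TB, and 8 TB on the RAID 5 nodes remains unused.</p>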
<p><strong>Combining Replication Levels with RAID levels</strong><br />
RAID levels ensure data redundancy inside the storage node, while replication levels ensure data redundancy across storage nodes. Higher RAID levels and higher replication levels both offer greater data redundancy, but at a cost in capacity and performance. RAID 5 with 2-way replication seems to be the sweet spot for most implementations, but when highly available data protection is needed, Lefthand recommends 3-way replication with RAID 5, ensuring triple mirroring with 3 parity blocks available.</p>
<p>I would not suggest RAID 0 with replication, because rebuilding a RAID set will always be quicker than copying an entire storage node over the network.</p>
<p><strong>Node placement</strong><br />
As mentioned previously, the order in which blocks are written to the LUN is determined by the order in which the nodes were added to the cluster.  When using 2-way replication, blocks are written to two consecutive nodes.  When designing a cluster, the order of the nodes is extremely important if the SAN will be placed in two separate racks or, even better, will span two locations.</p>
<p>Because the 2-way replication writes blocks on two consecutive nodes, adding the storage nodes to the cluster in alternating order will ensure that data is written to each rack or site.</p>
<p> <a href="http://frankdenneman.nl/wp-content/uploads/2009/10/addnode.jpg"><img class="aligncenter size-full wp-image-271" title="addnode" src="http://frankdenneman.nl/wp-content/uploads/2009/10/addnode.jpg" border="0" alt="addnode" width="500" height="360" /></a><br />
When nodes are added in the wrong order, or when a node is replaced, the general setting tab of the cluster properties allows you to “promote” or “demote” a storage node in the logical order. This list determines the write order of the nodes.</p>
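<p>To see why the alternating order matters, here is a small Python sketch. The node names and helper functions are hypothetical, and the wrap-around from the last node back to the first is an assumption made for the illustration; it is not taken from Lefthand documentation.</p>
<pre>
```python
from itertools import chain, zip_longest


def alternate(site_a, site_b):
    """Interleave two lists of node names: A1, B1, A2, B2, ..."""
    pairs = zip_longest(site_a, site_b)
    return [node for node in chain.from_iterable(pairs) if node is not None]


def replica_pairs(order):
    """2-way replication: each block lands on two consecutive nodes.
    Assumption: the last node pairs with the first one."""
    count = len(order)
    return [(order[i], order[(i + 1) % count]) for i in range(count)]


def site_of(node):
    """Hypothetical naming convention: 'A-1' lives in site A."""
    return node.split("-")[0]


def spans_both_sites(order):
    """True when every replica pair has one copy in each site."""
    return all(site_of(x) != site_of(y) for x, y in replica_pairs(order))


good_order = alternate(["A-1", "A-2"], ["B-1", "B-2"])
bad_order = ["A-1", "A-2", "B-1", "B-2"]

print(good_order, spans_both_sites(good_order))
print(bad_order, spans_both_sites(bad_order))
```
</pre>
<p>In the second order, the pair A-1/A-2 keeps both copies of a block in the same rack or site, which is exactly what the alternating order avoids.</p>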
<p><strong>Management Group and Managers</strong><br />
In addition to setting up data replication, it is important to set up managers. Managers play an important role in controlling the data flow and access of clusters. Managers run inside a management group, and several storage nodes must be designated to run the manager service. Because managers use a voting algorithm, a majority of the managers needs to be active for the management group to function.  This majority is called a quorum. If quorum is lost, <span style="text-decoration:underline;">access to data is lost</span>. Be aware that access to the data is lost, not the data itself.<br />
An odd number of managers is recommended, as a (low) even number of managers can end up in a state where no majority can be determined.  The maximum number of managers is 5.</p>
<p><strong>Failover manager</strong><br />
A failover manager is a special edition of a manager. Instead of running on a storage node, the failover manager runs as a virtual appliance. A failover manager’s only function is maintaining quorum. When designing a SAN spanning two sites, running a failover manager is recommended. The optimum placement of the failover manager is a third site: place an equal number of managers in both sites and run the failover manager at an independent site.<br />
<a href="http://frankdenneman.nl/wp-content/uploads/2009/10/managers1.jpg"><img class="aligncenter size-full wp-image-278" title="managers" src="http://frankdenneman.nl/wp-content/uploads/2009/10/managers1.jpg" border="0" alt="managers" width="345" height="279" /></a><br />
If a third site is not available, run the failover manager locally on a server, creating a logically separated site.<br />
<a href="http://frankdenneman.nl/wp-content/uploads/2009/10/logicalsite.jpg"><img class="aligncenter size-full wp-image-279" title="logicalsite" src="http://frankdenneman.nl/wp-content/uploads/2009/10/logicalsite.jpg" border="0" alt="logicalsite" width="338" height="344" /></a></p>
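<p>The quorum rule and the effect of a failover manager can be checked with a minimal Python sketch. The function names are made up, and it assumes a simple majority vote as described above.</p>
<pre>
```python
def quorum_size(managers):
    """Smallest number of managers that forms a majority."""
    return managers // 2 + 1


def has_quorum(managers, reachable):
    """True when the reachable managers still form a majority."""
    return reachable >= quorum_size(managers)


# Two sites with 2 managers each and no failover manager:
# losing one site leaves 2 of 4 managers, which is not a majority.
print(has_quorum(managers=4, reachable=2))

# Same layout plus a failover manager in a third site:
# losing one site leaves 2 + 1 = 3 of 5 managers, a majority.
print(has_quorum(managers=5, reachable=3))
```
</pre>
<p>This is why an even split across two sites needs the extra vote: without it, neither half can reach a majority after a site failure.</p>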
<p><strong>Volume names</strong><br />
And the last design issue: volume names cannot be changed.  The volume name is the only setting that can’t be edited after creation. Plan your naming convention carefully, otherwise you will end up recreating volumes and restoring data. If someone at HP is reading this, please change this behaviour!</p>
<p><a href="http://frankdenneman.nl/2009/10/lefthand-san-lessons-learned/">Lefthand SAN – Lessons learned</a> is a post from: <a href="http://frankdenneman.nl">frankdenneman.nl</a></p>
]]></content:encoded>
			<wfw:commentRss>http://frankdenneman.nl/2009/10/lefthand-san-lessons-learned/feed/</wfw:commentRss>
		<slash:comments>35</slash:comments>
		</item>
		<item>
		<title>Deploying Volumes with Lefthand CLIQ</title>
		<link>http://frankdenneman.nl/2009/08/deploying-volumes-with-command-line-interface-lefthand-cliq-2/</link>
		<comments>http://frankdenneman.nl/2009/08/deploying-volumes-with-command-line-interface-lefthand-cliq-2/#comments</comments>
		<pubDate>Tue, 25 Aug 2009 11:51:54 +0000</pubDate>
		<dc:creator>Frank Denneman</dc:creator>
				<category><![CDATA[Storage]]></category>
		<category><![CDATA[CLIQ]]></category>
		<category><![CDATA[Lefthand]]></category>
		<category><![CDATA[SAN]]></category>
		<category><![CDATA[SAN/IQ8]]></category>

		<guid isPermaLink="false">http://frankdenneman.wordpress.com/?p=256</guid>
		<description><![CDATA[Due to my extremely busy schedule I haven’t blogged for a while. Besides studying for VCDX and preparing for VMworld I’m also involved in a couple of projects. One project is designing and implementing a vSphere 4 virtual infrastructure. The VI will host an Exchange 2010 environment. Due to the size of my client’s environment, [...]<p><a href="http://frankdenneman.nl/2009/08/deploying-volumes-with-command-line-interface-lefthand-cliq-2/">Deploying Volumes with Lefthand CLIQ</a> is a post from: <a href="http://frankdenneman.nl">frankdenneman.nl</a></p>
]]></description>
			<content:encoded><![CDATA[<p>Due to my extremely busy schedule I haven’t blogged for a while. Besides studying for VCDX and preparing for VMworld I’m also involved in a couple of projects. One project is designing and implementing a vSphere 4 virtual infrastructure. The VI will host an Exchange 2010 environment. Due to the size of my client’s environment, 192 TB is used for hosting mailboxes. These datastores will be made available through RDMs, which means creating 192 1 TB volumes and assigning them to every ESX host in the cluster.</p>
<p>I’ve tried to use the Centralized Management Console, but it’s tedious and error-prone work. Mind-numbing repetitive exercises make me ask really dumb questions on Twitter, such as where did SCSI ID 7 go? &lt;homer&gt;D’oh!&lt;/homer&gt;. So to protect myself from further bashing and ridicule, I started searching for the Lefthand CLI to automate the creation and assignment of volumes on a Lefthand SAN.</p>
<p><span id="more-597"></span></p>
<p><strong>CLIQ</strong> <br />
A CLI is available for the HP EVA series, but not much information has been published about the Lefthand Command Line Interface (CLIQ). Deep down in the bowels of the HP site, however, a document about the CLIQ can be found:</p>
<p>The “User Manuals API CLI User Manual” <a href="http://h20000.www2.hp.com/bc/docs/support/SupportManual/c01806770/c01806770.pdf?jumpid=reg_R1002_USEN">http://h20000.www2.hp.com/bc/docs/support/SupportManual/c01806770/c01806770.pdf?jumpid=reg_R1002_USEN</a></p>
<p>(But help in the SSH session will offer almost the same amount of info.)</p>
<p><strong>Log in info</strong><br />
Storage nodes configured with SAN/IQ 8.0 come equipped with the CLIQ as standard, and it can be accessed over SSH. Log in to the CLIQ via SSH using the node IP address, but use port 16022 instead of the default port 22. When nodes are part of a management group, any node can be used to access the CLIQ. Open an SSH session to any node and use the management group user name and password.</p>
<p><strong>Use of CLIQ</strong><br />
The CLIQ is not case sensitive and the order of the parameters does not matter. Any order will do.</p>
<p><strong>Let’s create a volume</strong><br />
<em>createVolume volumeName=VMFS001 clusterName=ESX-CLUSTER01 size=1TB Replication=2 thinProvision=1 description="VI Datastore VMFS001"</em></p>
<p>Size:<br />
The following size units can be used: MB, GB and TB.</p>
<p>Replication:<br />
The replication level for the volume: 1 = none, 2 = 2-way, 3 = 3-way, 4 = 4-way.</p>
<p>ThinProvision:<br />
• 0 – Full-provisioning<br />
• 1 – Thin-provisioning</p>
<p><strong>Assigning the volume</strong><br />
<em>assignvolume volumeName=VMFS001 initiator=iqn.1998-01.com.vmware:esx001.acme.com;iqn.1998-01.com.vmware:esx002.acme.com</em></p>
<p>An IQN is used to assign the server to the volume. If a server item is preconfigured in the CMC, the CMC will list the server item on the Assigned Server tab of the volume.</p>
<p>Assigning multiple servers to a volume must be done with one command. If two separate commands are used, the last command overwrites the first. Use a ; to delimit IQNs.</p>
<p>Those two commands saved me a lot of trouble and lots of unnecessary tedious work. Hopefully you can benefit from them as well.</p>
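<p>With the two commands above, the 192 volumes can be generated instead of typed. The Python sketch below only builds the command strings, using the parameters shown in this post; the volume naming scheme, cluster name and IQNs are example values, and how the lines are fed into the SSH session on port 16022 is left open.</p>
<pre>
```python
def create_volume_cmd(name, cluster, size="1TB", replication=2, thin=True):
    """Build a createVolume line with the parameters used in this post."""
    return (
        f"createVolume volumeName={name} clusterName={cluster} size={size} "
        f"Replication={replication} thinProvision={int(thin)} "
        f'description="VI Datastore {name}"'
    )


def assign_volume_cmd(name, iqns):
    """Build one assignVolume line; all IQNs go into a single command,
    because a second command would overwrite the first."""
    initiators = ";".join(iqns)
    return f"assignVolume volumeName={name} initiator={initiators}"


# Example values only: adjust names, cluster and IQNs to your environment.
cluster = "ESX-CLUSTER01"
hosts = [
    "iqn.1998-01.com.vmware:esx001.acme.com",
    "iqn.1998-01.com.vmware:esx002.acme.com",
]

commands = []
for number in range(1, 193):  # 192 volumes of 1 TB each
    volume = f"VMFS{number:03d}"
    commands.append(create_volume_cmd(volume, cluster))
    commands.append(assign_volume_cmd(volume, hosts))

print("\n".join(commands[:2]))
```
</pre>
<p>Generating the lines first makes it easy to review them before anything is created on the SAN, which matters because volume names cannot be changed afterwards.</p>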
<p><a href="http://frankdenneman.nl/2009/08/deploying-volumes-with-command-line-interface-lefthand-cliq-2/">Deploying Volumes with Lefthand CLIQ</a> is a post from: <a href="http://frankdenneman.nl">frankdenneman.nl</a></p>
]]></content:encoded>
			<wfw:commentRss>http://frankdenneman.nl/2009/08/deploying-volumes-with-command-line-interface-lefthand-cliq-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>My first lefthand ISCSI VI architecture</title>
		<link>http://frankdenneman.nl/2009/03/my-first-lefthand-iscsi-vi-architecture/</link>
		<comments>http://frankdenneman.nl/2009/03/my-first-lefthand-iscsi-vi-architecture/#comments</comments>
		<pubDate>Mon, 23 Mar 2009 20:40:41 +0000</pubDate>
		<dc:creator>Frank Denneman</dc:creator>
				<category><![CDATA[Storage]]></category>
		<category><![CDATA[ESX]]></category>
		<category><![CDATA[ISCSI]]></category>
		<category><![CDATA[Lefthand]]></category>

		<guid isPermaLink="false">http://frankdenneman.wordpress.com/?p=120</guid>
		<description><![CDATA[I’m currently reviewing a design of a new virtual infrastructure. The VI uses multiple 10Gb links to connect to a very large HP Lefthand SAN. I’m more a Fibre Channel guy, but I believe that this solution will smoke most mid-range FC SANs. I cannot wait to deploy the VI on the SAN. But I need [...]<p><a href="http://frankdenneman.nl/2009/03/my-first-lefthand-iscsi-vi-architecture/">My first lefthand ISCSI VI architecture</a> is a post from: <a href="http://frankdenneman.nl">frankdenneman.nl</a></p>
]]></description>
			<content:encoded><![CDATA[<p>I’m currently reviewing a design of a new virtual infrastructure. The VI uses multiple 10Gb links to connect to a very large HP Lefthand SAN. I’m more a Fibre Channel guy, but I believe that this solution will smoke most mid-range FC SANs. I cannot wait to deploy the VI on the SAN. But I need to get used to some differences between iSCSI and Fibre Channel configurations.<br />
<span id="more-120"></span><br />
The “problem”, or my latest challenge, is creating a LUN provisioning scheme where multiple clusters can connect to all the LUNs when a disaster occurs and a cluster has failed. Lefthand presents each LUN as a separate target instead of using the LUN ID as the unique identifier. I’m used to designing a LUN ID scheme per cluster; this way, if a cluster fails, the “destination” cluster can connect to the LUNs of the failed cluster with the same LUN IDs as the original cluster. </p>
<p>But when a (Lefthand) LUN is presented to the ESX server, it gets a unique target ID instead of a unique LUN ID (vmhba1:<span style="text-decoration:underline;">2</span>:0). I have done some testing and discovered that the assigned target ID can differ from ESX server to ESX server.</p>
<p>I’m curious whether the target ID is used when creating the UUID of the VMFS datastore. And I&#8217;m especially interested in what will happen when multiple ESX hosts communicate with the LUN while each ESX host uses a different “path”.</p>
<p>Maybe there isn’t a problem at all and different targets will work well, but it seems that I need to stop thinking in FC solutions and get used to iSCSI Lefthand “quirks”.</p>
<p>I’ve read the field guide for VMware infrastructures and I googled terms like “iscsi lun schemes”, but I cannot seem to find any real-life scenarios.<br />
Maybe my Google skills are pitiful at the moment, and maybe someone can shed some light on this and on how they solved this “problem”.</p>
<p><a href="http://frankdenneman.nl/2009/03/my-first-lefthand-iscsi-vi-architecture/">My first lefthand ISCSI VI architecture</a> is a post from: <a href="http://frankdenneman.nl">frankdenneman.nl</a></p>
]]></content:encoded>
			<wfw:commentRss>http://frankdenneman.nl/2009/03/my-first-lefthand-iscsi-vi-architecture/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Increasing the queue depth?</title>
		<link>http://frankdenneman.nl/2009/03/increasing-the-queue-depth/</link>
		<comments>http://frankdenneman.nl/2009/03/increasing-the-queue-depth/#comments</comments>
		<pubDate>Wed, 04 Mar 2009 20:45:04 +0000</pubDate>
		<dc:creator>Frank Denneman</dc:creator>
				<category><![CDATA[Storage]]></category>
		<category><![CDATA[Disk.SchedNumReqOutstanding]]></category>
		<category><![CDATA[Queue Depth]]></category>
		<category><![CDATA[VMware]]></category>

		<guid isPermaLink="false">http://frankdenneman.wordpress.com/?p=101</guid>
		<description><![CDATA[When it comes to IO performance in the virtual infrastructure one of the most recommended “tweaks” is changing the Queue Depth (QD). But most forget that the QD parameter is just a small part of the IO path. The IO path consists of layers of hardware and software components, each of these components can have [...]<p><a href="http://frankdenneman.nl/2009/03/increasing-the-queue-depth/">Increasing the queue depth?</a> is a post from: <a href="http://frankdenneman.nl">frankdenneman.nl</a></p>
]]></description>
			<content:encoded><![CDATA[<p>When it comes to IO performance in the virtual infrastructure, one of the most recommended “tweaks” is changing the Queue Depth <em>(QD)</em>. But most forget that the QD parameter is just a small part of the IO path. The IO path consists of layers of hardware and software components, and each of these components can have a huge impact on IO performance. The best results are achieved when the whole system is analysed and not just the ESX host alone.<br />
 <br />
To be honest I believe that most environments will profit more from a balanced storage design than adjusting the default values. But if the workload is balanced between the storage controllers and IO queuing still occurs, adjusting some parameters might increase IO performance.<br />
Merely increasing the parameters can cause high latency, up to the point of major slowdowns. Some factors need to be taken into consideration.<br />
 <br />
<strong>LUN queue depth</strong><br />
The LUN queue depth determines how many commands the HBA is willing to accept and process <strong>per LUN</strong>. If a single virtual machine is issuing IO, the QD setting applies, but when multiple VMs are simultaneously issuing IO to the LUN, the <em>Disk.SchedNumReqOutstanding</em> (DSNRO) value becomes the leading parameter.<br />
 <br />
Increasing the QD value without changing the Disk.SchedNumReqOutstanding setting will only be beneficial when one VM is issuing commands. It is considered best practise to use the same value for the QD and DSNRO parameters!<br />
Read Duncan&#8217;s excellent <a href="http://www.yellow-bricks.com/2008/07/21/queuedepth-and-whats-next/">article</a> about the DSNRO setting.</p>
<p> <br />
<strong>Qlogic Execution Throttle</strong><br />
Qlogic has a firmware setting called &#8220;Execution Throttle&#8221;, which specifies the maximum number of simultaneous commands the adapter will send. The default value is 16. Increasing the value above 64 has little to no effect, because the maximum parallel execution of SCSI operations is 64.<br />
(Page 170 of <a href="http://www.vmware.com/pdf/vi3_san_design_deploy.pdf">ESX 3.5 VMware SAN System Design and Deployment Guide</a>)<br />
 <br />
If the QD is increased, execution throttle and the DSNRO must be set with similar values, but to calculate the proper QD the fan-in ratio of the storage port needs to be calculated.<br />
 <br />
<strong>Target Port Queue Depth</strong><br />
A queue exists on the storage array controller port as well; this is called the &#8220;<em>Target Port Queue Depth</em>&#8221;. Modern midrange storage arrays, like most EMC and HP arrays, can handle around 2048 outstanding IOs. 2048 IOs sounds like a lot, but most of the time multiple servers communicate with the storage controller at the same time. Because a port can only service one request at a time, additional requests are placed in the queue, and when the storage controller port receives more than 2048 IO requests, the queue is flooded. When the queue is full <em>(QFULL)</em>, the storage controller issues an IO throttling command to the host to suspend further requests until space in the queue becomes available. The ESX host accepts the IO throttling command and decreases the LUN queue depth to the minimum value, which is 1!<br />
 <br />
The VMkernel checks every 2 seconds whether the QFULL condition is resolved. If it is, the VMkernel slowly increases the LUN queue depth back to its normal value; this can take up to 60 seconds.<br />
 <br />
<strong>Calculating the queue depth/Execution Throttle value</strong><br />
To prevent flooding the target port queue, the product of the number of host paths, the execution throttle (queue depth) value and the number of LUNs presented through the host port must not exceed the target port queue depth. In short: <em>T &gt;= P * Q * L</em><br />
<em><br />
T = Target Port Queue Depth<br />
P = Paths connected to the target port<br />
Q = Queue depth<br />
L = number of LUNs presented to the host through this port<br />
</em></p>
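<p>The inequality can be turned into a quick check. The Python sketch below is only an illustration of the formula; the function names and the example values (a queue depth of 32 and the LUN counts) are assumptions, not recommendations.</p>
<pre>
```python
def port_load(paths, queue_depth, luns):
    """Worst-case number of outstanding IOs one host can put on a
    target port: P * Q * L."""
    return paths * queue_depth * luns


def fits_target_port(target_port_qd, paths, queue_depth, luns):
    """True when T >= P * Q * L, i.e. the host cannot flood the port."""
    return target_port_qd >= port_load(paths, queue_depth, luns)


# Example values (assumptions): T = 2048, one active path, Q = 32.
print(port_load(1, 32, 15), fits_target_port(2048, 1, 32, 15))
print(port_load(1, 32, 65), fits_target_port(2048, 1, 32, 65))
```
</pre>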
<p><a href="http://frankdenneman.nl/wp-content/uploads/2009/03/Calc-QD-1.png"><img src="http://frankdenneman.nl/wp-content/uploads/2009/03/Calc-QD-1-235x300.png" alt="Location of TPQL" title="Calc QD-1" width="235" height="300" class="aligncenter size-medium wp-image-906" /></a></p>
<p>Despite having four paths to the LUN, ESX can only utilize one (active) path for sending IO.  As a result, when calculating the appropriate queue depth, you use only the active path for <em>&#8220;Paths connected to the target port (P)&#8221;</em> in the calculation, i.e. <em>P=1</em>.</p>
<p>But in a virtual infrastructure environment, multiple ESX hosts communicate with the storage port, therefore the QD should be calculated by the following formula:<br />
 <br />
<em>T &gt;= ESX Host 1 (P * Q * L) + ESX Host 2 (P * Q * L) + &#8230; + ESX Host n (P * Q * L)</em><br />
 <br />
For example, an 8-host ESX cluster connects to 15 LUNs (L) presented by an EVA8000 (4 target ports). Each ESX server issues IO through one active path (P), so P=1 and L=15.<br />
 <br />
With T = 2048, the execution throttle/queue depth could be set as high as 136.5: 2048 &gt;= 1 * Q * 15, so Q = 2048 / 15 = 136.5.<br />
But with this setting a single ESX host can fill the entire target port queue by itself, and the environment consists of 8 ESX hosts: 136.5 / 8 = 17.06.<br />
 <br />
This calculation assumes that all the ESX hosts communicate with all the LUNs through a single port, which rarely happens if a proper load-balancing design is applied. Most arrays have two controllers and every controller has at least two ports. In the case of a controller failure, at least two ports are still available to accept IO requests.<br />
 <br />
It is possible to calculate the queue depth conservatively to ensure a minimum decrease of performance when losing a controller during a failure, but this will lead to underutilizing the storage array during normal operation, which will hopefully be 99.999% of the time. It is better to calculate a value which utilizes the array properly without flooding the target port queue.<br />
 <br />
If you assume that multiple ports are available and that all LUNs are balanced across the available ports on the controllers, the available target port queue depth is effectively quadrupled, which raises the execution throttle value in the example above to 68. Apart from the fact that raising this value above 64 has little to no effect, it is wise to stay below the maximum value to create a safety buffer.</p>
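<p>The worked example can be reproduced with a short Python sketch. The function names are made up for this illustration, and it assumes that every host uses the same P, Q and L and that LUNs are spread evenly over the ports; the cap of 64 is the limit mentioned in the Execution Throttle section.</p>
<pre>
```python
def raw_queue_depth(target_port_qd, hosts, luns, paths=1, ports=1):
    """Solve T * ports >= hosts * (P * Q * L) for Q."""
    return (target_port_qd * ports) / (hosts * paths * luns)


def usable_queue_depth(target_port_qd, hosts, luns, paths=1, ports=1, cap=64):
    """Round down to a whole number and apply the cap of 64."""
    raw = raw_queue_depth(target_port_qd, hosts, luns, paths, ports)
    return min(int(raw), cap)


# One host, 15 LUNs, one port: about 136.5
print(round(raw_queue_depth(2048, hosts=1, luns=15), 1))

# Eight hosts sharing one port: about 17
print(usable_queue_depth(2048, hosts=8, luns=15))

# Eight hosts, LUNs balanced over four ports: about 68, capped at 64
print(round(raw_queue_depth(2048, hosts=8, luns=15, ports=4), 1))
print(usable_queue_depth(2048, hosts=8, luns=15, ports=4))
```
</pre>
<p>Treat the outcome as an upper bound for a worst-case scenario, not as a value to configure blindly.</p>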
<p><strong>What’s the Best Setting for Queue Depth?</strong><br />
The examples mentioned are pure worst-case scenarios; it is highly unlikely that all hosts perform at their maximum level at the same time. Changing the defaults can improve throughput, but most of the time it is just a shot in the dark. Although you configure your ESX hosts with the same values, not every ESX server carries the same load. Every environment is different, so the optimal queue depth differs as well. You need to test and analyse your own environment. Please do not increase the QD without analysing the environment; this can be more harmful than useful.</p>
<p><a href="http://frankdenneman.nl/2009/03/increasing-the-queue-depth/">Increasing the queue depth?</a> is a post from: <a href="http://frankdenneman.nl">frankdenneman.nl</a></p>
]]></content:encoded>
			<wfw:commentRss>http://frankdenneman.nl/2009/03/increasing-the-queue-depth/feed/</wfw:commentRss>
		<slash:comments>25</slash:comments>
		</item>
		<item>
		<title>HP CA and the use of LUN balancing scripts</title>
		<link>http://frankdenneman.nl/2009/02/hp-continuous-access-and-the-use-of-lun-balancing-scripts/</link>
		<comments>http://frankdenneman.nl/2009/02/hp-continuous-access-and-the-use-of-lun-balancing-scripts/#comments</comments>
		<pubDate>Mon, 09 Feb 2009 17:43:58 +0000</pubDate>
		<dc:creator>Frank Denneman</dc:creator>
				<category><![CDATA[Storage]]></category>
		<category><![CDATA[ALUA]]></category>
		<category><![CDATA[ESX3.x]]></category>
		<category><![CDATA[HP Continuous Access]]></category>
		<category><![CDATA[VMware]]></category>

		<guid isPermaLink="false">http://frankdenneman.wordpress.com/?p=5</guid>
		<description><![CDATA[Some of my customers use HP Continuous Access to replicate VM data between storage arrays. Lately a couple of LUN balancing powershell- and Perl scripts were introduced in the VMware community. First of all, there is nothing wrong with those scripts. For example, Justin Emerson wrote an excellent script that balances the active paths to [...]<p><a href="http://frankdenneman.nl/2009/02/hp-continuous-access-and-the-use-of-lun-balancing-scripts/">HP CA and the use of LUN balancing scripts</a> is a post from: <a href="http://frankdenneman.nl">frankdenneman.nl</a></p>
]]></description>
			<content:encoded><![CDATA[<p>Some of my customers use HP Continuous Access to replicate VM data between storage arrays. Lately a couple of LUN balancing PowerShell and Perl scripts were introduced in the VMware community. First of all, there is nothing wrong with those scripts. For example, Justin Emerson wrote an excellent script that balances the active paths to an active/active SAN. But using an auto-balance script when Continuous Access is used in the Virtual Infrastructure can result in added IO latency and unnecessary storage processor load. Here’s why: <span id="more-133"></span></p>
<p><strong>AA type</strong><br />
The arrays in the HP EVA range (4x, 6x, 8x) are categorized by VMware as active-active arrays. Active/active storage arrays are further divided into two categories:</p>
<p>•	Symmetrical Active-Active (SAA)<br />
•	Asymmetric Active-Active (AAA)</p>
<p>SAA arrays are considered by many to be true active-active arrays. IO requests can be issued over all paths and every controller in the array can accept and send IO to the LUN. EMC DMXs and HP XPs are SAA arrays.</p>
<p><strong>Asymmetric Active-Active and Asymmetric Logical Unit Access (ALUA) compliance</strong><br />
The arrays from the EVA family are dual controller AAA arrays and are compliant with the SCSI Asymmetric Logical Unit Access (ALUA) standard for LUN access/failover and I/O processing. </p>
<p>In an Asymmetric Active-Active Array both controllers are online and both can accept IO, but only one controller is assigned as the owning controller of the LUN. The owning controller can issue IO commands directly to the LUN; this is called an optimized path. The non-owning controller, or proxy controller, can accept IO commands, but cannot communicate directly with the LUN. This is called a non-optimized path. If a read request reaches the array through the proxy controller, it will be forwarded to the owning controller of the LUN. This behavior is called a proxy read. </p>
<p>It looks like I’m describing an active/passive array, but the main difference is that an active/passive array transfers LUN ownership as soon as it receives IO on the non-owning controller. In an AAA array, IO requests are transferred between the controllers on the back-end, making the process transparent to the ESX host. In an Asymmetric Active-Active Array, storage processor ports have a certain port state with respect to a given LUN: </p>
<p><strong><br />
•	Active/Optimized<br />
•	Active/Non-Optimized<br />
•	Standby<br />
•	Unavailable<br />
•	Transitioning<br />
</strong><br />
If the access characteristics of one port differ from those of another port, asymmetric logical unit access occurs. ALUA provides a way for the array to report the port states to the host, and the host can use these states to prioritize paths. Unfortunately, ESX 3.x does not support ALUA, which means that ESX 3.x is not able to identify the LUN controller ownership or, to put it more precisely, to distinguish active/optimized from active/non-optimized paths. ALUA support is implemented in the PSA (Pluggable Storage Architecture) of ESX4. </p>
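<p>How a host could use the reported port states to prioritize paths is sketched below in Python. This is a conceptual illustration only, not the actual logic of the ESX4 PSA; the path names and the ranking function are hypothetical.</p>
<pre>
```python
# Port states from the list above, best first.
ALUA_STATES = [
    "active/optimized",
    "active/non-optimized",
    "standby",
    "unavailable",
    "transitioning",
]
RANK = {state: position for position, state in enumerate(ALUA_STATES)}

# Only the two active states are treated as usable for IO in this sketch.
USABLE = ("active/optimized", "active/non-optimized")


def preferred_path(paths):
    """Pick the best usable path from (name, state) tuples,
    or None when no path can carry IO."""
    usable = [(name, state) for name, state in paths if state in USABLE]
    if not usable:
        return None
    return min(usable, key=lambda path: RANK[path[1]])[0]


# Hypothetical paths to one LUN.
paths = [
    ("vmhba1:0:1", "active/non-optimized"),  # via the proxy controller
    ("vmhba1:1:1", "active/optimized"),      # via the owning controller
    ("vmhba2:0:1", "standby"),
]
print(preferred_path(paths))
```
</pre>
<p>Without the state information, as in ESX 3.x, both active paths look identical to the host, so nothing stops it from selecting the path through the proxy controller.</p>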
<p><strong>Proxy reads</strong><br />
Read IO requests received by the proxy controller (1) are sent to the owning controller (2), which retrieves the data from disk (3), caches the read data (4) and mirrors the data to the cache of the proxy controller (5). The proxy controller satisfies the host read request (6), making this process transparent to the ESX host. Proxy reads add unnecessary latency to the IO request. It also creates higher mirror port utilization. IO-Write commands to the proxy controller suffer less performance impact. Due to fault protection all writes are mirrored in both controllers’ caches, but the owning controller is still responsible for flushing the data to disk.</p>
<p><a href="http://frankdenneman.nl/wp-content/uploads/2009/02/proxy-io.png"><img src="http://frankdenneman.nl/wp-content/uploads/2009/02/proxy-io-300x248.png" alt="Proxy IO path" title="proxy io" width="300" height="248" class="aligncenter size-medium wp-image-809" /></a></p>
<p><strong>Mirror Port</strong><br />
The controller mirror ports are used for cache writes and proxy reads. If you set up your environment with correct multipathing, the mirror ports only have to handle the write mirroring.</p>
<p><a href="http://frankdenneman.nl/wp-content/uploads/2009/02/optimized.png"><img src="http://frankdenneman.nl/wp-content/uploads/2009/02/optimized-300x245.png" alt="" title="optimized" width="300" height="245" class="aligncenter size-medium wp-image-810" /></a></p>
<p><strong>Implicit LUN transition</strong><br />
If the EVA array detects, in a period of 60 minutes, that at least 66% of the total read requests to a LUN are proxy reads, ownership is transitioned to the non-owning proxy controller, making it the owning controller. Justin’s PowerShell script assigns the same path on every server. This way the EVA should switch the managing controller within the hour (if you have multiple ESX hosts running multiple VMs on the LUN, of course).</p>
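<p>The transition rule can be sketched in a few lines of Python. This is only a model of the rule as described above, not EVA firmware logic; the function names and the controller labels A and B are assumptions.</p>

```python
ILT_WINDOW_MINUTES = 60       # observation period from the text
ILT_THRESHOLD_PERCENT = 66    # share of proxy reads that triggers a transition


def should_transition(proxy_reads, total_reads):
    """True if proxy reads make up at least 66% of all reads in one window."""
    if total_reads == 0:
        return False
    # Integer arithmetic avoids floating point rounding at the threshold.
    return proxy_reads * 100 >= ILT_THRESHOLD_PERCENT * total_reads


def owner_after_window(owner, proxy_reads, total_reads):
    """Return the owning controller ("A" or "B") after one 60 minute window."""
    other = {"A": "B", "B": "A"}
    return other[owner] if should_transition(proxy_reads, total_reads) else owner


print(owner_after_window("A", 700, 1000))  # 70% proxy reads
print(owner_after_window("A", 500, 1000))  # 50% proxy reads
```

<p>With 70% proxy reads ownership moves to controller B; with 50% it stays on controller A.</p>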
<p><strong>Continuous Access DR Groups</strong><br />
If HP Continuous Access (CA) is used to replicate LUNs between two arrays, extra care must be taken when planning to use a LUN balancing script. CA sets up replication relationships between LUNs on different arrays; this is called a Data Replication (DR) group. A DR group can be considered a consistency group: all LUNs in a DR group fail over together, share a log and preserve write order within the group. Because of these requirements, one controller is assigned as managing (owning) controller for all member LUNs. </p>
<p><strong>Implicit LUN transition</strong><br />
Because one controller manages a group of LUNs, CA disables implicit LUN transitions for all DR group members. Mixing Implicit LUN transition (ILT) with a large DR group can be a recipe for disaster. Imagine ILT was switched on and the array detects too many proxy reads on a LUN in the group. </p>
<p><a href="http://frankdenneman.nl/wp-content/uploads/2009/02/ILT-1.png"><img src="http://frankdenneman.nl/wp-content/uploads/2009/02/ILT-1-300x168.png" alt="" title="ILT-1" width="300" height="168" class="aligncenter size-medium wp-image-815" /></a></p>
<p>The array will invoke an ILT for that LUN. Because all LUNs in the DR group must use the same controller to meet the consistency requirement, all members are transitioned together. </p>
<p><a href="http://frankdenneman.nl/wp-content/uploads/2009/02/ILT-22.png"><img src="http://frankdenneman.nl/wp-content/uploads/2009/02/ILT-22-300x167.png" alt="" title="ILT-2" width="300" height="167" class="aligncenter size-medium wp-image-818" /></a></p>
<p>So far so good, but you can bet that a new proxy read situation will appear, because multiple hosts are communicating with the disks. This will trigger another Implicit LUN transition. </p>
<p><a href="http://frankdenneman.nl/wp-content/uploads/2009/02/ILT-31.png"><img src="http://frankdenneman.nl/wp-content/uploads/2009/02/ILT-31-300x159.png" alt="" title="ILT-3" width="300" height="159" class="aligncenter size-medium wp-image-820" /></a></p>
<p>And now we are back where we started. Enabling Implicit LUN transition can create a group of LUNs that keeps bouncing between controllers. This is a sure way of giving your SAN administrator a small stroke. The downside of disabling ILT on a DR group is the possibility that LUNs may experience excessive proxy reads.</p>
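<p>The bouncing effect is easy to reproduce in a toy simulation. The sketch below assumes that ILT is enabled on a DR group, that the whole group follows whenever a single member LUN crosses the 66% proxy read threshold, and that the read pattern stays the same in every window. The LUN names and read counts are made up.</p>

```python
THRESHOLD_PERCENT = 66


def proxy_share_exceeded(reads_via, owner):
    """reads_via maps controller -> read count for one LUN in one window."""
    total = sum(reads_via.values())
    proxy = total - reads_via.get(owner, 0)
    return total > 0 and proxy * 100 >= THRESHOLD_PERCENT * total


def simulate_group(owner, lun_reads, windows):
    """Return the owning controller of the DR group at the start and after each window."""
    history = [owner]
    for _ in range(windows):
        # One member LUN over the threshold drags the whole group along.
        if any(proxy_share_exceeded(reads, owner) for reads in lun_reads.values()):
            owner = "B" if owner == "A" else "A"
        history.append(owner)
    return history


# Hypothetical workload: hosts read LUN1 mostly via B and LUN2 mostly via A.
lun_reads = {
    "LUN1": {"A": 100, "B": 900},
    "LUN2": {"A": 900, "B": 100},
}
print(simulate_group("A", lun_reads, 4))
```

<p>Whichever controller owns the group, one of the two LUNs sees mostly proxy reads, so the group changes owner in every window.</p>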
<p><strong>Using auto balance script on CA managed LUNs</strong><br />
Because ESX does not inquire about the status of the port when testing the path, the auto balance scripts cannot discover the optimized path. When paths are initialized during the boot of the ESX host, it simply enumerates the paths available to it, starting with the first controller and the lowest device number. The first path found is not necessarily the path to the managing controller. Because all paths are active, the script will simply select the next path for a LUN. This can lead to IO requests arriving at the proxy controller, and because implicit LUN transition is disabled, proxy reads will keep occurring.</p>
<p><strong>Custom load balancing</strong><br />
If you have a high workload intensity, you might experience a negative impact on IO performance when running a generic LUN balancing script. If you implement CA in your virtual infrastructure, it is better to take some time to design a custom load balancing script. Using a well designed load balancing script along with a fixed multipathing policy does not eliminate proxy reads, but they should only occur as a temporary condition during failures. </p>
<p><a href="http://frankdenneman.nl/wp-content/uploads/2009/02/replicated-luns-lb.png"><img src="http://frankdenneman.nl/wp-content/uploads/2009/02/replicated-luns-lb-300x212.png" alt="Load Balanced replicated LUNs" title="replicated luns lb" width="300" height="212" class="aligncenter size-medium wp-image-814" /></a></p>
<p>In the example, each DR group contains 5 LUNs. Because every LUN in a DR group shares the same managing controller, the optimized path for LUN 1-5 is through Storage Processor A. Alternating HBAs are used to load balance on the ESX side. </p>
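<p>The planning logic behind such a custom script could look like the Python sketch below. It only builds the plan (which HBA and which storage processor to prefer per LUN) and does not touch ESX. The group names, HBA names and the assumption that the second DR group is managed by Storage Processor B are mine.</p>

```python
from itertools import cycle


def plan_preferred_paths(dr_groups, owners, hbas):
    """Map each LUN to (hba, managing controller of its DR group).

    dr_groups: DR group name -> list of LUN names
    owners:    DR group name -> managing controller
    hbas:      list of HBA names, used alternately across all LUNs
    """
    hba_cycle = cycle(hbas)
    plan = {}
    for group, luns in dr_groups.items():
        for lun in luns:
            plan[lun] = (next(hba_cycle), owners[group])
    return plan


# Hypothetical layout: two DR groups of 5 LUNs each.
dr_groups = {
    "DR1": ["LUN%d" % n for n in range(1, 6)],
    "DR2": ["LUN%d" % n for n in range(6, 11)],
}
owners = {"DR1": "SP-A", "DR2": "SP-B"}  # SP-B for DR2 is an assumption

plan = plan_preferred_paths(dr_groups, owners, ["vmhba1", "vmhba2"])
for lun, (hba, sp) in plan.items():
    print(lun, hba, sp)
```

<p>Every LUN keeps its preferred path on the managing controller of its DR group, while the HBAs alternate from LUN to LUN.</p>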
<p><strong>Determining the managing controller</strong><br />
To correctly load balance on EVA arrays you must know which controller owns the LUN. Command View EVA shows the managing controller of the LUN, but checking several LUNs via Command View EVA can be a lengthy process.<br />
Using Command View EVA (CVE), the managing controller is shown:</p>
<p>•	On the presentation tab of the LUN (vdisk) properties<br />
•	On the general tab of the DR Group properties</p>
<p>HP Storage System Scripting Utility (SSSU) can help you to speed up discovering the managing controller. Using SSSU: </p>
<p>LS VDISK &lt;disk name&gt; (Name will suffice, no need for complete path)<br />
The owning controller is listed as online controller (controller name)</p>
<p>I’m not aware of any PowerShell tools to manage an EVA storage array.<br />
My contacts at HP cannot confirm whether any new tooling besides SSSU and CVE will appear soon.
<p><a href="http://frankdenneman.nl/2009/02/hp-continuous-access-and-the-use-of-lun-balancing-scripts/">HP CA and the use of LUN balancing scripts</a> is a post from: <a href="http://frankdenneman.nl">frankdenneman.nl</a></p>
]]></content:encoded>
			<wfw:commentRss>http://frankdenneman.nl/2009/02/hp-continuous-access-and-the-use-of-lun-balancing-scripts/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>

