VSPHERE STORAGE AREA NETWORK TRAFFIC SYSTEM NETWORK RESOURCE POOL -NETIOC

After posting the Network I/O Control primer I received a couple of questions about the "vSphere Storage Area Network Traffic" system network resource pool, such as: what is this pool for? I tried to investigate further by searching practically everywhere, but I didn't manage to find any detailed description. The vSphere Storage Area Network Traffic pool is a system network resource pool designed for a future vSphere storage feature that has not been released yet. Unfortunately Network I/O Control already exposes this system network resource pool in vSphere 5.1. Although it is defined as a system network resource pool, the vSphere client lists it as user-defined, giving the impression that the pool can be assigned to other streams of traffic. Unfortunately this is not possible. The pool is a system network resource pool and therefore only available to traffic that is specifically tagged by the VMkernel. I also received the question whether this network pool could be assigned to a third-party NIC or an FCoE card. As mentioned, network pools only manage traffic that carries the appropriate tag. Tagging of traffic is done exclusively by the VMkernel and this functionality is not exposed to the user. Although the pool is visible in the user interface, it has no function and will not have any effect on other network streams. It can be happily ignored.

ERROR -1 IN OPENING & READING THE SLOT FILE ERROR IN STORAGERM.LOG (SIOC)

The problem

Recently I noticed that my datastore cluster was not providing latency statistics during initial placement. The datastore recommendation during initial placement displayed space utilization statistics, but displayed 0 in the "I/O Latency Before" column. The performance statistics of my datastores showed that there was I/O activity on the datastores, yet the SIOC statistics all showed no I/O activity. The SIOC log file (storagerm.log) showed the following errors:

Open /vmfs/volumes/ /.iorm.sf/slotsfile (0x10000042, 0x0) failed: permission denied
Giving UP Permission denied
Error -1 opening SLOT file /vmfs/volumes/datastore/.iorm.sf/slotsfile
Error -1 in opening & reading the slot file
Couldn't get a slot
Successfully closed file 6
Error in opening stat file for device: datastore. Ignoring this device

The following permissions were applied on the slotsfile:

The solution

Engineering explained to me that these permissions were not the defaults: the default permissions are read and execute access for everyone, and write access for the owner of the file. The following command sets the correct permissions on the slotsfile:

chmod 755 /vmfs/volumes/datastore/.iorm.sf/slotsfile

Checking the permissions shows that they are applied, and the SIOC statistics started to show the I/O activity on the datastore. Before changing the permissions on the slotsfile I stopped the SIOC service on the host with the command:

/etc/init.d/storageRM stop

However, I believe this isn't necessary; changing the permissions without stopping SIOC on the host should work.

Cause

We are not sure what causes this problem; support and engineering are troubleshooting the error. In my case I believe it has to do with the frequent restructuring of my lab. vCenter and ESXi servers are reinstalled regularly, but I have never reformatted my datastores. I do not expect to see this error appear in stable production environments.
Please check the current permissions on the slotsfile if Storage DRS does not show I/O utilization on the datastore. (VMs must be running and the I/O metric on the datastore cluster must be enabled, of course.) I expect the knowledge base article to be available soon.
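The permission check and fix described above can be sketched in a few lines of Python. This is a minimal illustration of the same chmod 755 step, assuming you run it with access to the datastore path; the datastore name in the commented example is a placeholder, just as in the article.

```python
import os
import stat

def fix_slotsfile_permissions(slotsfile):
    """Check the slotsfile permissions and reset them to the defaults
    (read/execute for everyone, write for the owner, i.e. 755) if they
    deviate. Returns True if a change was made."""
    mode = stat.S_IMODE(os.stat(slotsfile).st_mode)
    if mode != 0o755:
        os.chmod(slotsfile, 0o755)
        return True
    return False

# Example (placeholder datastore name, run on the ESXi host):
# fix_slotsfile_permissions("/vmfs/volumes/datastore/.iorm.sf/slotsfile")
```

As the article notes, stopping the storageRM service first is probably not required, but it is the cautious order of operations.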

HOW TO ENABLE SIOC STATS ONLY MODE?

Today on Twitter, David Chadwick, Cormac Hogan and I were discussing SIOC stats only mode. SIOC stats only mode gathers statistics to provide you insight into the I/O utilization of the datastore. Please note that stats only mode does not enable the datastore-wide scheduler and will not enforce throttling. Stats only mode is disabled by default due to the (significant) increase of log data in the vCenter database. SIOC stats only mode is available from vSphere 5.1 and can be enabled via the web client. To enable SIOC stats only mode go to:

WHY IS VMOTION USING THE MANAGEMENT NETWORK INSTEAD OF THE VMOTION NETWORK?

On the community forums I've seen some questions about the use of the management network by vMotion operations. The two most common scenarios are explained below; please let me know if you notice this behavior in other scenarios.

Scenario 1: Cross-host and non-shared datastore migration

vSphere 5.1 provides the ability to migrate a virtual machine between hosts and non-shared datastores simultaneously. If the virtual machine is stored on a local or non-shared datastore, vMotion uses the vMotion network to transfer the data to the destination datastore. When monitoring the VMkernel NICs, however, some traffic can be seen flowing over the management NIC instead of the VMkernel NIC enabled for vMotion. When migrating a virtual machine, vMotion distinguishes between hot data and cold data. Virtual disks or snapshots that are actively used are considered hot data, while the underlying snapshots and base disk are cold data. Let's use a virtual machine with 5 snapshots as an example. The active data is the most recent snapshot; this is sent across the vMotion network, while the base disk and the 4 older snapshots are migrated via a network file copy (NFC) operation across the first VMkernel NIC (vmk0).
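The hot/cold split described above can be expressed as a small model. This is purely an illustration of the behavior, not VMware's implementation; the disk names are made up.

```python
def classify_disk_chain(chain):
    """Illustrative model of vMotion's hot/cold data split.

    chain: list of disk names ordered from the base disk up to the most
    recent snapshot. The actively written disk (the last element) is hot
    and travels over the vMotion network; the base disk and all older
    snapshots are cold and are copied via NFC over the first VMkernel
    NIC (vmk0)."""
    hot = chain[-1:]    # the active, most recent snapshot
    cold = chain[:-1]   # base disk + older snapshots
    return {"vmotion_network": hot, "vmk0_nfc": cold}

# The VM from the example: a base disk plus 5 snapshots.
routes = classify_disk_chain(
    ["base.vmdk", "snap1.vmdk", "snap2.vmdk",
     "snap3.vmdk", "snap4.vmdk", "snap5.vmdk"])
```

This is why, even with a correctly configured vMotion network, you still see traffic on vmk0 during such a migration: the cold data deliberately takes the NFC path.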

PLEASE HELP VMWARE BRING PROJECT NEE DOWN TO ITS (K)NEES

Folks, we have been testing the HOL platform for a few weeks using automated scripts and thought it would be great if we could do a real-time stress test of our environment. The goal of this test is to put a massive load on our infrastructure and see how fast we can bring the service to its knees. We understand that this is not a very scientific approach, but we think collecting real user data will help us prepare for massive loads like Partner Exchange and VMworld. Currently we have close to 10,000 users in the Beta, so we expect the application/infrastructure to keel over right after we start. We want to use this test as a way to learn what happens and where the smoke is coming from. If you registered for the Beta and you do not have an account, please check your inbox for an email from admin@projectnee.com to verify your account. If you have not registered, it's time to do so… REGISTER FOR BETA. Here is what we need you to do:

STORAGE DRS INITIAL PLACEMENT WORKFLOW

Last week I received the question how exactly Storage DRS picks a datastore: "On SDRS the initial placement of a VM is done on the weight calculated based on the free storage and I/O. My question is: when I have a similar weight between all the datastores in the cluster, which datastore is chosen for the initial placement?" Storage DRS takes into account the virtual machine configuration, the platform and user-defined constraints, and the resource utilization of the datastores within the cluster. Let's take a closer look at the Storage DRS initial placement workflow.

User-defined constraint

When selecting the datastore cluster as a storage destination, the default datastore cluster affinity rule is applied to the virtual machine configuration. The datastore cluster can be configured with a VMDK affinity rule (keep files together) or a VMDK anti-affinity rule (keep files separated). Storage DRS obeys the affinity rule and is forced to find a datastore that is big enough to store the entire virtual machine or the individual VMDK files. The affinity rule is considered a user-defined constraint.

Platform constraints

The next step in the process is to present a list of valid datastores to the Storage DRS initial placement algorithm. The Storage DRS placement engine checks for platform constraints. The first platform constraint is the check of the connectivity state of the datastores. Fully connected datastores (datastores connected to all hosts in the compute cluster) are preferred over partially connected datastores (datastores that are not connected to all hosts in the cluster) because of the impact on the mobility of the virtual machine in the compute cluster. The second platform constraint is applicable to thin-provisioned LUNs. If the datastore exceeds the thin-provisioning threshold of 75 percent, the VASA provider (if installed) triggers the thin-provisioning alarm.
In response to this alarm Storage DRS removes the datastore from the list of valid destination datastores, in order to prevent virtual machine placement on low-capacity datastores.

Resource utilization

After the constraint handling, Storage DRS sorts the valid datastores in order of combined resource utilization rate. The combined resource utilization rate consists of the space utilization and the I/O utilization of a datastore. The best combined resource utilization rate belongs to a datastore that has a high level of free capacity and low I/O utilization. Storage DRS selects the datastore with the best combined utilization rate and attempts to place the virtual machine. If the virtual machine is configured with a VMDK anti-affinity rule, Storage DRS places the biggest VMDK first.
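The workflow above can be summarized in a short sketch: filter on constraints first, then rank the survivors by combined utilization. This is a simplified illustration of the ordering logic described in the post, not VMware's actual algorithm; the field names and the equal weighting of space and I/O utilization are assumptions.

```python
def pick_datastore(datastores, required_space_gb, thin_threshold=0.75):
    """Simplified sketch of Storage DRS initial placement ranking.

    Each datastore is a dict with: name, free_gb, capacity_gb,
    io_utilization (0-1), fully_connected (bool), thin_alarm (bool)."""
    # Platform constraints: drop datastores whose thin-provisioning
    # alarm fired (backing LUN past the 75% threshold) or that cannot
    # hold the virtual machine at all.
    valid = [d for d in datastores
             if not d["thin_alarm"] and d["free_gb"] >= required_space_gb]

    def rank(d):
        # Prefer fully connected datastores, then the best combined
        # utilization: high free capacity and low I/O load.
        space_util = 1 - d["free_gb"] / d["capacity_gb"]
        return (not d["fully_connected"], space_util + d["io_utilization"])

    return min(valid, key=rank) if valid else None
```

Under this model, two datastores with similar free space are separated by their I/O utilization, which answers the reader's question: with near-identical weights, the tie is effectively broken by whichever datastore ranks marginally better on the combined metric.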

10 GUIDELINES FOR CREATING GOOD LOOKING DIAGRAMS

Frequently I receive the question which application I use to create my diagrams. I used to use Microsoft Visio but started using OmniGraffle a year ago. However, I feel it's not the program that makes these diagrams. Although it's true that some functionality helps me create the diagrams more easily, it's more about following some basic guidelines. I've picked up these guidelines along the way; they work for me and hopefully they can help you too.

1: Find a suitable color scheme

A color scheme plays a very important role in a diagram. Colors have various functions within a diagram. I like to use various tints of a color to indicate a relation between objects, whether it is a relation within the same structure layer or with the same consumer or provider. For example, all storage-related functions or objects get different shades of blue, or the resource pool structure of customer A gets different shades of green. Picking the correct colors for a diagram is very difficult, and trying to select the perfect collection of colors wasted (I should say invested) many hours of my life. During that time I learned a lot; here are a few tips:

HOW TO SETUP MULTI-NIC VMOTION ON A DISTRIBUTED VSWITCH

This article provides an overview of the steps required to set up a Multi-NIC vMotion configuration on an existing distributed switch with the vSphere 5.1 web client. This article is created to act as reference material for the designing your vMotion network series. Configuring Multi-NIC vMotion is done at two layers: first the distributed switch layer, where we create two distributed port groups, and second the host layer, where we configure two VMkernel NICs and connect them to the appropriate distributed port groups. Before you start, have two IP addresses ready for the VMkernel NICs, along with their respective subnets and VLAN IDs.

Distributed switch level

The first two steps are done at the distributed switch level. Click on the networking icon in the home screen and select the distributed switch.

Step 1: Create the vMotion distributed port groups on the distributed switch

The initial configuration is basic; just provide a name and use the defaults:

1: Select the distributed switch, right-click and select "New Distributed Port Group".
2: Provide a name, call it "vMotion-01", and confirm it's the correct distributed switch.
3: Keep the defaults at Configure settings and click Next.
4: Review the settings and click Finish.

Do the same for the second distributed port group and name it vMotion-02.

Step 2: Configure the vMotion distributed port groups

Configuring the vMotion distributed port groups consists of two changes: entering the VLAN ID and setting the correct failover order.

1: Select distributed port group vMotion-01 in the left side of your screen, right-click and select Edit settings.
2: Go to VLAN, select VLAN as the VLAN type and enter the VLAN ID used by the first VMkernel NIC.
3: Select "Teaming and failover" and move the second dvUplink down to mark it as a "Standby uplink". Verify that load balancing is set to "Route based on originating virtual port".
4: Click OK.

Repeat the instructions of step 2 for distributed port group vMotion-02, but use the VLAN ID matching the IP address of the second VMkernel NIC. Go to Teaming and failover and configure the uplinks in the alternate order, ensuring that the second vMotion VMkernel NIC uses dvUplink2.

Host level

We are done at the distributed switch level; the distributed switch now updates all connected hosts and each host has access to the distributed port groups. Two vMotion-enabled VMkernel NICs are configured at the host level. Go to the Hosts and Clusters view.

Step 3: Create vMotion-enabled VMkernel NICs

1: Select the first host in the cluster, go to Manage, Networking and "Add host networking".
2: Select VMkernel Network Adapter.
3: Select an existing distributed port group, click Browse and select distributed port group "vMotion-01". Click OK and click Next.
4: Select vMotion traffic and click Next.
5: Select static IPv4 settings and enter the IP address of the first VMkernel NIC, corresponding with the VLAN ID set on distributed port group vMotion-01.
6: Click Next and review the settings.

Create the second vMotion-enabled VMkernel NIC. Configure it identically except:

1: Select the vMotion-02 port group.
2: Enter the IP address corresponding with the VLAN ID on distributed port group vMotion-02.

The setup of a Multi-NIC vMotion configuration on a single host is complete. Repeat step 3 on each host in the cluster.
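The essential point of step 2 is the alternating failover order, which is easy to get wrong when configured by hand. A small sketch makes the intended end state explicit; the uplink names and VLAN IDs are placeholders for your environment.

```python
def build_vmotion_portgroups(uplinks=("dvUplink1", "dvUplink2"),
                             vlans=(201, 202)):
    """Sketch of the two vMotion port groups with the alternating
    active/standby failover order configured in step 2. Uplink names
    and VLAN IDs are placeholders."""
    a, b = uplinks
    return {
        "vMotion-01": {"vlan": vlans[0], "active": [a], "standby": [b]},
        "vMotion-02": {"vlan": vlans[1], "active": [b], "standby": [a]},
    }

def failover_order_is_alternating(portgroups):
    """Each uplink must be active in exactly one of the two port groups,
    so both NICs carry vMotion traffic while each keeps a standby path."""
    active = [u for pg in portgroups.values() for u in pg["active"]]
    return len(active) == 2 and len(set(active)) == 2

pgs = build_vmotion_portgroups()
```

If both port groups ended up with the same active uplink, all vMotion traffic would funnel through a single NIC, defeating the purpose of the Multi-NIC setup; a consistency check like the one above mirrors what you should verify in the UI.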

DESIGNING YOUR VMOTION NETWORK - 3 REASONS WHY I USE A DISTRIBUTED SWITCH FOR VMOTION NETWORKS

If your environment is licensed with the Enterprise Plus license you can choose between a standard vSwitch and a distributed switch for your vMotion network. A Multi-NIC vMotion network is a complex configuration that consists of many different components, and each component needs to be configured identically on each host in the cluster. Distributed switches can help you with that and, in addition, provide you with tools to prioritize traffic and allow other network streams to utilize available bandwidth when no vMotion traffic is active.

1. Use distributed port groups for a consistent configuration across the cluster

Consistently configuring two port groups on each host in the cluster with alternating vmnic failover orders is a challenging task. It's a mere fact that humans are not good at performing a repetitive task consistently; many virtual infrastructure health checks at various sites confirm that fact. The beauty of the distributed switch (VDS) is that it acts as a configuration profile: configure the port group once and the distributed switch propagates the settings to all hosts connected to that distributed switch. A Multi-NIC vMotion configuration is a perfect use case to leverage the advantages of the distributed switch. As mentioned, a Multi-NIC vMotion configuration is a complex configuration consisting of two port groups with their own unique settings. By using the distributed switch, only two distributed port groups need to be configured and the VDS distributes the port groups and their settings to each host connected to the VDS. This saves a lot of work and ensures that each host uses the same configuration. Consistency in your cluster is important for reliable operations and consistent performance.

2. Set traffic priority with Network I/O Control

Network I/O Control can help you consolidate the network connections into a single manageable switch, allowing you to utilize all the available bandwidth while still respecting requirements such as traffic isolation or traffic prioritization. This is applicable both to configurations containing a small number of 10GbE uplinks and to configurations that contain a high number of 1GbE ports. vMotion has a high bandwidth usage and performs optimally in high-bandwidth environments. However, vMotion traffic is not always present. Isolating NICs in order to protect other network traffic streams or to provide a particular level of bandwidth can be uneconomical and may leave bandwidth idle and unused. By using Network I/O Control, you can control the priority of network traffic during contention. This allows you to specify the relative importance of traffic streams and provide bandwidth to particular traffic streams when other traffic competes for bandwidth.

3. Use Load Based Teaming to balance all traffic across uplinks

Load based teaming, identified in the user interface as "Route based on physical NIC load", allows for ingress and egress traffic balancing. When consolidating all uplinks in one distributed switch, load based teaming (LBT) distributes the traffic streams across the available uplinks by taking the utilization into account. Please note: use the "Route based on originating virtual port" load balancing policy for the two vMotion port groups, but configure the VM network port groups with the load based teaming policy. The "Route based on originating virtual port" policy creates a vNIC-to-pNIC relation during the boot of a virtual machine; that vNIC is "bound" to that pNIC until the pNIC fails or the virtual machine is shut down. When using a converged network or allowing all network traffic to use each uplink, a virtual machine could experience link saturation or latency due to vMotion using the same uplink.
With LBT the virtual machine vNIC can be dynamically bound to a different pNIC with lower utilization, providing better network performance. LBT monitors the utilization of each uplink and, when the utilization is greater than 75 percent for a sustained period of time, moves traffic to other, underutilized uplinks.

The benefits of a distributed vSwitch

Consistent configuration across hosts saves a lot of effort, both during configuration and during troubleshooting. Consistent configuration is key to providing a stable and performant environment. Multi-NIC vMotion allows you to use as much bandwidth as possible, benefitting DRS load balancing and maintenance mode operations. LBT and Network I/O Control allow other network traffic streams to consume as much bandwidth as possible. Load based teaming is a perfect partner for Network I/O Control: LBT attempts to balance the network utilization across all available uplinks, while Network I/O Control dynamically distributes network bandwidth during contention.

Back to a standard vSwitch when uplink isolation is necessary?

Is this Multi-NIC vMotion/NetIOC/LBT configuration applicable to every customer? Unfortunately it isn't. Converging all network uplinks into a single distributed switch and allowing all port groups to utilize the uplinks requires the VLANs to be available on every uplink. Some customers want to isolate vMotion or other traffic and use dedicated links. For that scenario I would still use a distributed switch and create one for the vMotion configuration. In this particular scenario you do not leverage LBT and Network I/O Control, but you still benefit from the consistent configuration of distributed port groups.

Part 1 - Designing your vMotion network
Part 2 - Multi-NIC vMotion failover order configuration
Part 3 - Multi-NIC vMotion and NetIOC
Part 4 - Choose link aggregation over Multi-NIC vMotion?
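As a closing footnote, the LBT trigger described in this post (move traffic only when an uplink stays above 75 percent utilization for a sustained period) can be modeled in a few lines. The sample count standing in for "a sustained period" is an assumption for illustration; LBT's actual evaluation interval is not specified here.

```python
def lbt_should_move(utilization_samples, threshold=0.75, sustained=3):
    """Illustrative model of the Load Based Teaming trigger: return True
    only when the last `sustained` utilization samples (0-1) of an
    uplink all exceed the 75% threshold. The number of samples that
    counts as 'sustained' is an assumption."""
    if len(utilization_samples) < sustained:
        return False
    return all(u > threshold for u in utilization_samples[-sustained:])
```

The sustained-period requirement is what keeps LBT from flapping traffic between uplinks on every short burst; a single spike above 75 percent does not trigger a move.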