Flex-10 lessons learned
One of my clients recently bought a couple of HP c7000 blade enclosures, including the new dual-port Flex-10 mezzanine cards (NC532m) and Flex-10 Virtual Connect modules. Because this technology is quite new, not much inside information can be found on the web. I’ve had lots of discussions with Ken Cline and Scott Lowe, who will publish a Flex-10 article of his own pretty soon. This write-up is a quick overview of the lessons I learned, but even more a call for answers.
My client purchased the Flex-10 technology to use for iSCSI network traffic. Two uplinks are connected to the storage network. I’m aware of the iSCSI limitations in ESX, but this write-up will not cover the software iSCSI initiator. Some excellent articles have been written about the behavior of the software iSCSI initiator and the ESX network stack, which I encourage you to read:
• The legendary multi-vendor iSCSI post by some of the major players;
• Ken Cline and his great vSwitch debate series;
• And Scott Lowe’s Understanding NIC Utilization in VMware ESX.
When configuring HP blades, there are two areas to pay attention to: the uplink configuration and the server profile configuration. Uplinks can be used by multiple blades in the enclosure; configuring the blade NICs to use the uplinks is done in the server profile.
Due to my client’s redundancy requirements, two separate external switches are used for iSCSI traffic. A simple point-to-point topology is used: each Flex-10 module is connected via one uplink port to its own switch. The two uplink ports are placed into one “Shared Uplink Set” and assigned to a virtual Ethernet network in Virtual Connect Manager.
In theory, two connections of 10 Gb each become available for iSCSI traffic. However, because two separate switches are used and Virtual Connect has a loop prevention mechanism, only one uplink port of the Shared Uplink Set is used as the active port. The other port is placed in standby (blocked) mode.
Loop prevention mechanism
The connection mode of a shared uplink set determines the assignment of the uplinks. When multiple uplinks are assigned to a Shared Uplink Set, the default connection mode (auto) attempts to negotiate a port channel using LACP. If the LACP negotiation fails, auto connection mode places all uplink ports, except one, into standby (blocked) mode.
The physical switches currently used in my client’s environment do not support an EtherChannel spanned across switches, so the LACP negotiation fails. As a result, one uplink port is assigned the active role and the uplink of the second Virtual Connect module is placed in standby (blocked) mode, which reduces the usable bandwidth to 1 x 10 Gb.
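The auto connection mode behavior described above can be modeled in a few lines. This is only a sketch of the logic as I understand it, not anything taken from Virtual Connect itself: the names Uplink, resolve_auto_mode and usable_bandwidth_gbit are made up for illustration, and picking the first uplink in the list as the active one is an assumption (the text does not say how Virtual Connect chooses).

```python
from dataclasses import dataclass


@dataclass
class Uplink:
    name: str              # hypothetical label, e.g. "vc1-uplink"
    speed_gbit: int = 10
    state: str = "unknown"


def resolve_auto_mode(uplinks, lacp_negotiated):
    """Assign uplink states for a Shared Uplink Set in 'auto' mode.

    If LACP negotiation succeeds, all uplinks form one port channel and
    are active. If it fails, all uplinks except one go to standby.
    Assumption: the first uplink in the list is the one that stays active.
    """
    for index, uplink in enumerate(uplinks):
        if lacp_negotiated or index == 0:
            uplink.state = "active"
        else:
            uplink.state = "standby"
    return uplinks


def usable_bandwidth_gbit(uplinks):
    """Sum the speed of the uplinks that are actually forwarding traffic."""
    return sum(u.speed_gbit for u in uplinks if u.state == "active")


# Two uplinks to two separate switches: LACP cannot be negotiated.
uplink_set = resolve_auto_mode(
    [Uplink("vc1-uplink"), Uplink("vc2-uplink")], lacp_negotiated=False
)
for uplink in uplink_set:
    print(uplink.name, uplink.state)
print("usable bandwidth (Gb):", usable_bandwidth_gbit(uplink_set))
```

With lacp_negotiated=True the same two uplinks would both be active, which is the 2 x 10 Gb my client expected to get.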
Besides the bandwidth reduction, the assignment of active and standby uplinks has an impact on the vSwitch configuration.
The behavior of a Flex-10 NIC depends on which Virtual Connect module it’s connected to.
When a Flex-10 NIC is directly connected to a Virtual Connect Flex-10 module, it enumerates four FlexNICs per port. When connected to other Virtual Connect modules, the Flex-10 NIC enumerates only two FlexNICs, one per port. The funny thing is that when a Flex-10 mezzanine card is “connected” to two empty interconnect bays, the Flex-10 NIC also enumerates eight FlexNICs, because HP expects that the card will eventually be connected to a Flex-10 Virtual Connect module. As a result of the internal port mapping, the dual-port Flex-10 mezzanine cards in my client’s blade servers are connected to the Flex-10 Virtual Connect modules and therefore enumerate a total of eight FlexNICs. But what exactly will be presented to the ESX host?
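To keep the three cases apart, here is a small sketch of the enumeration rules as described above. It is a model for illustration only: the function names and the strings "flex10" and "other" are hypothetical, and the letter suffixes c and d are my assumption for the third and fourth FlexNIC of a port.

```python
def flexnics_for_port(port_number, interconnect):
    """Return the FlexNIC labels one Flex-10 port enumerates.

    interconnect is "flex10" for a Flex-10 Virtual Connect module,
    "other" for any other Virtual Connect module, or None for an
    empty interconnect bay.
    """
    if interconnect in ("flex10", None):
        count = 4   # Flex-10 module, or empty bay (treated like Flex-10)
    else:
        count = 1   # other modules: one FlexNIC per port
    return [f"{port_number}-{letter}" for letter in "abcd"[:count]]


def enumerate_card(interconnects):
    """Enumerate all FlexNICs of a card, given the interconnect per port."""
    nics = []
    for port_number, interconnect in enumerate(interconnects, start=1):
        nics.extend(flexnics_for_port(port_number, interconnect))
    return nics


print(enumerate_card(["flex10", "flex10"]))  # dual-port card on two Flex-10 modules
print(enumerate_card(["other", "other"]))    # dual-port card on other VC modules
print(enumerate_card([None, None]))          # dual-port card facing two empty bays
```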
The ESX host sees a FlexNIC as a device with a unique PCI device ID that is connected to a 10 Gb port. Because of this unique ID, each FlexNIC appears in ESX as a separate NIC with its own Broadcom 57711 driver instance. When Ethernet networks are mapped to a Flex-10 NIC inside Virtual Connect Manager, eight FlexNICs are presented to the ESX server.
Inside the server profile in Virtual Connect Manager, NICs can be mapped to Ethernet networks. Because multiple FlexNICs are enumerated, the process of mapping Ethernet networks to FlexNICs differs from that for dual-port mezzanine or onboard 1 Gb NICs.
Each Flex-10 port presents its FlexNICs as four sub-devices. Flex-10 port 1 presents the FlexNICs as 1-a, 1-b, 1-c and 1-d.
Flex-10 port 2 presents the FlexNICs as 2-a, 2-b, 2-c and 2-d.
When assigning an Ethernet network to a Flex-10 NIC, Virtual Connect Manager alternates between the two Flex-10 ports. It starts by displaying the first FlexNIC of the first Flex-10 port (1-a), then the first FlexNIC of the second Flex-10 port (2-a), and so on. It is possible to map the same Ethernet network to two FlexNICs on separate Flex-10 ports (for example, Ethernet network iscsi to 1-a and 2-a), but it isn’t possible to map a single Ethernet network to multiple FlexNICs on the same Flex-10 port (for example, Ethernet network iscsi to 1-a and 1-b).
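The alternating order and the one-network-per-port rule are easy to get wrong when planning a server profile, so a quick sanity check can help. The sketch below is a planning aid I would write for myself, not a Virtual Connect API: assignment_order and validate_mapping are hypothetical names, and the labels beyond 1-a, 1-b and 2-a are assumed to follow the same pattern.

```python
def assignment_order(ports=2, per_port=4):
    """Order in which FlexNICs are offered: 1-a, 2-a, 1-b, 2-b, ..."""
    return [
        f"{port}-{letter}"
        for letter in "abcd"[:per_port]
        for port in range(1, ports + 1)
    ]


def validate_mapping(mapping):
    """Check a {FlexNIC label: network name or None} plan.

    Raises ValueError if the same Ethernet network is mapped to more
    than one FlexNIC on the same Flex-10 port.
    """
    seen = set()
    for flexnic, network in mapping.items():
        if network is None:
            continue  # unmapped FlexNIC
        port = flexnic.split("-")[0]
        if (port, network) in seen:
            raise ValueError(
                f"network {network!r} is mapped twice on Flex-10 port {port}"
            )
        seen.add((port, network))


print(assignment_order())

# Allowed: same network on two different Flex-10 ports.
validate_mapping({"1-a": "iscsi", "2-a": "iscsi"})

# Not allowed: same network twice on Flex-10 port 1.
try:
    validate_mapping({"1-a": "iscsi", "1-b": "iscsi"})
except ValueError as error:
    print("rejected:", error)
```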
Because a single VLAN is used for iSCSI traffic, only the first FlexNIC of each Flex-10 port (1-a and 2-a) is mapped to the iSCSI Ethernet network.
Because Flex-10 is used only for iSCSI traffic, the other FlexNICs are not mapped to an Ethernet network. As a consequence, the unmapped FlexNICs appear as unconnected uplinks in ESX.
It is possible to throttle the bandwidth per FlexNIC. Because only one FlexNIC per Flex-10 port is used, it is configured with the maximum bandwidth.
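For setups that do split a Flex-10 port over several FlexNICs, a simple budget check shows how much is left to hand out. This is a sketch under stated assumptions, not something from the Virtual Connect documentation quoted here: I assume the FlexNICs of one port share that port's 10 Gb and that bandwidth is set in steps of 100 Mb, and the function name check_port_allocation is made up.

```python
PORT_CAPACITY_MBIT = 10_000  # assumption: FlexNICs of one port share 10 Gb
STEP_MBIT = 100              # assumption: bandwidth is set in 100 Mb steps


def check_port_allocation(allocations_mbit):
    """Validate per-FlexNIC bandwidth for ONE Flex-10 port.

    allocations_mbit maps a FlexNIC label to its bandwidth in Mb.
    Returns the unallocated bandwidth in Mb, or raises ValueError.
    """
    for label, mbit in allocations_mbit.items():
        if mbit < 0 or mbit % STEP_MBIT != 0:
            raise ValueError(f"{label}: {mbit} Mb is not a valid setting")
    total = sum(allocations_mbit.values())
    if total > PORT_CAPACITY_MBIT:
        raise ValueError(f"port oversubscribed: {total} Mb allocated")
    return PORT_CAPACITY_MBIT - total


# The situation in this write-up: one FlexNIC per port, maximum bandwidth.
print(check_port_allocation({"1-a": 10_000}))

# A split configuration for comparison.
print(check_port_allocation({"1-a": 4_000, "1-b": 2_000}))
```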
The order in which the NICs are mapped to Ethernet networks in the server profile in Virtual Connect Manager determines the assignment of vmnic labels. FlexNIC 1-a is assigned the label vmnic4 and FlexNIC 2-a the label vmnic5. Both uplinks have the UP status.
Due to the internal port mapping of Virtual Connect, FlexNIC 2-a is mapped to the second Flex-10 VC module, whose uplink has the standby (blocked) status in the Shared Uplink Set.
This situation raised some questions, which I haven’t found any answers to (yet).
What will happen if the VMkernel decides to use that nic to send IO?
Is the FlexNIC aware of the standby status of its “native” uplink? Will it send data to the uplink of the VC module it’s connected to, or will it send data to the active uplink?
How is this done? Will it send the IO through the midplane or the CX-4 cable to the VC module with the active uplink? And if so, how much latency does this add?
HP describes the standby status as blocked; what does this mean? Will Virtual Connect discard IO sent to the standby uplink, or will it refuse to accept IO, and how will it indicate this?
The described situation can have an impact on the vSwitch design. Just to be on the safe side, FlexNIC 2-a is configured as a standby adapter in the iSCSI vSwitch.
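The idea behind the active/standby order can be written down as a simple model. This is not how the VMkernel is implemented, just a sketch of an explicit failover order in which a standby adapter is only used when no active adapter has link; pick_uplink is a hypothetical name, and the vmnic labels are the ones from this setup.

```python
def pick_uplink(active, standby, link_up):
    """Return the first adapter with link, trying active before standby.

    active and standby are lists of vmnic names in failover order;
    link_up maps a vmnic name to True if its link status is UP.
    Returns None when no adapter has link.
    """
    for nic in [*active, *standby]:
        if link_up.get(nic, False):
            return nic
    return None


active_adapters = ["vmnic4"]    # FlexNIC 1-a, behind the active VC uplink
standby_adapters = ["vmnic5"]   # FlexNIC 2-a, behind the standby VC uplink

# Normal situation: both vmnics report UP, so vmnic4 carries the iSCSI traffic.
print(pick_uplink(active_adapters, standby_adapters,
                  {"vmnic4": True, "vmnic5": True}))

# vmnic4 loses link: traffic moves to vmnic5.
print(pick_uplink(active_adapters, standby_adapters,
                  {"vmnic4": False, "vmnic5": True}))
```

Note that this only keeps traffic away from vmnic5 as long as vmnic4 has link. It does not answer what happens to IO on vmnic5 after a failover, which is exactly the open question above.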
Must read documents about Virtual Connect and Flex-10 technology: