frankdenneman Frank Denneman is the Machine Learning Chief Technologist at VMware. He is an author of the vSphere host and clustering deep dive series, as well as podcast host for the Unexplored Territory podcast. You can follow him on Twitter @frankdenneman

Flex-10 lessons learned


One of my clients recently bought a couple of HP BladeSystem c7000 enclosures, including the new dual-port Flex-10 mezzanine cards (NC532m) and Flex-10 Virtual Connect modules. Because this technology is quite new, not much inside information can be found on the web. I’ve had lots of discussions with Ken Cline and Scott Lowe, who will publish a Flex-10 article of his own pretty soon. This write-up is a quick overview of my lessons learned, but even more a call for answers.

My client purchased the Flex-10 technology to use for iSCSI network traffic. Two uplinks are connected to the storage network. I’m aware of the iSCSI limitations in ESX, but this write-up will not cover the software iSCSI initiator. Some excellent articles have been written about the behavior of the software iSCSI initiator and the ESX network stack, which I encourage you to read:
• The legendary multi-vendor iSCSI post by some of the major players:
http://virtualgeek.typepad.com/virtual_geek/2009/01/a-multivendor-post-to-help-our-mutual-iscsi-customers-using-vmware.html#more
• Ken Cline and his great vSwitch debate series:
http://kensvirtualreality.wordpress.com/2009/04/05/the-great-vswitch-debate%e2%80%93part-3/
• And Scott Lowe’s Understanding NIC Utilization in VMware ESX:
http://blog.scottlowe.org/2008/07/16/understanding-nic-utilization-in-vmware-esx/
When configuring HP blades, there are two areas to pay attention to: the uplink configuration and the server profile configuration. Uplinks can be used by multiple blades in the enclosure; configuring the blade NICs to use the uplinks is done in the server profile.
Uplink
Due to my client’s redundancy requirements, two separate external switches are used for iSCSI traffic. A simple point-to-point topology is used: each Flex-10 module is connected via one uplink port to its own switch. The two uplink ports are placed into one “Shared Uplink Set” and assigned to a virtual Ethernet network in Virtual Connect Manager.
In theory, two connections of 10 Gb each become available for iSCSI traffic, but because two separate switches are used and because of the loop prevention mechanism of Virtual Connect, only one uplink port of the Shared Uplink Set is used as the active port. The other port is placed in standby (blocked) mode.
Loop prevention mechanism
The connection mode of a shared uplink set determines the assignment of the uplinks. When multiple uplinks are assigned to a Shared Uplink Set, the default connection mode (auto) attempts to negotiate a port channel using LACP. If the LACP negotiation fails, auto connection mode places all uplink ports, except one, into standby (blocked) mode.
The physical switches currently used in my client’s environment do not support an EtherChannel that spans both switches, and therefore the LACP negotiation fails. As a result, one uplink port is assigned the active role and the uplink of the second Virtual Connect module is placed in standby (blocked) mode, which reduces the usable bandwidth to 1 x 10 Gb.
Besides the bandwidth reduction, the assignment of active and standby uplinks has an impact on the vSwitch configuration.
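The behavior of the auto connection mode described above can be sketched in a few lines of Python. This is only a model of the rule as stated in this post, not HP's implementation: the `Uplink` class, the bay and port labels, and the choice of the first uplink as the active one are all assumptions made for illustration (Virtual Connect applies its own criteria when electing the active port).

```python
from dataclasses import dataclass

LINK_SPEED_GBPS = 10  # each Flex-10 uplink in this setup is a 10 Gb port


@dataclass
class Uplink:
    module: str              # hypothetical label for the VC module, e.g. "bay1"
    port: str                # hypothetical label for the uplink port, e.g. "X1"
    state: str = "unassigned"


def assign_uplinks(uplinks, lacp_negotiated):
    """Model the 'auto' connection mode of a Shared Uplink Set.

    If LACP negotiation succeeds, all uplinks join the port channel and are
    active. If it fails, one uplink stays active and all others are placed
    in standby (blocked) mode. Picking index 0 as the active uplink is an
    assumption of this sketch.
    """
    for index, uplink in enumerate(uplinks):
        if lacp_negotiated or index == 0:
            uplink.state = "active"
        else:
            uplink.state = "standby"
    return uplinks


def usable_bandwidth_gbps(uplinks):
    """Only active uplinks contribute to the usable bandwidth."""
    return sum(LINK_SPEED_GBPS for u in uplinks if u.state == "active")


# Two uplinks to two separate switches: LACP cannot be negotiated.
sus = assign_uplinks([Uplink("bay1", "X1"), Uplink("bay2", "X1")],
                     lacp_negotiated=False)
print([(u.module, u.state) for u in sus])
print(usable_bandwidth_gbps(sus), "Gb usable")
```

With `lacp_negotiated=False` the model leaves one active uplink, which matches the 1 x 10 Gb observed at my client's site.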
Flex-10
The behavior of a Flex-10 NIC depends on which Virtual Connect module it is connected to.
When a Flex-10 NIC is directly connected to a Flex-10 Virtual Connect module, it enumerates four Flexnics per port. When connected to other Virtual Connect modules, the Flex-10 NIC enumerates only two Flexnics, one per port. The funny thing is that when a Flex-10 mezzanine card is “connected” to two empty interconnect bays, it also enumerates eight Flexnics, because HP expects that the card will eventually be connected to a Flex-10 Virtual Connect module. As a result of the internal port mapping, the dual-port Flex-10 mezzanine cards in my client’s blade servers are connected to the Flex-10 Virtual Connect modules and therefore enumerate a total of eight Flexnics. But what exactly will be presented to the ESX host?
Flexnic
The ESX host sees a Flexnic as a device with a unique PCI device ID that is connected to a 10 Gb port. Because of this unique ID, each Flexnic appears in ESX as a separate NIC with its own Broadcom 57711 driver instance. When Ethernet networks are mapped to a Flex-10 NIC inside Virtual Connect Manager, eight Flexnics are presented to the ESX server.
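To make the enumeration rules concrete, here is a small sketch that returns the Flexnics a dual-port Flex-10 card would enumerate for each kind of interconnect. The keys `"flex10"`, `"empty"` and `"other"` are labels invented for this example, and naming the single Flexnic per port `1a`/`2a` in the non-Flex-10 case is an assumption.

```python
# Flexnics enumerated per Flex-10 port, by what sits in the interconnect bay.
FLEXNICS_PER_PORT = {
    "flex10": 4,  # Flex-10 Virtual Connect module
    "empty": 4,   # empty bay: a Flex-10 module is expected to follow
    "other": 1,   # any other Virtual Connect module
}


def enumerate_flexnics(interconnect, ports=2):
    """Return the Flexnic names, port by port, e.g. ['1a', '1b', ...]."""
    per_port = FLEXNICS_PER_PORT[interconnect]
    names = []
    for port in range(1, ports + 1):
        for suffix in "abcd"[:per_port]:
            names.append(f"{port}{suffix}")
    return names


for kind in ("flex10", "empty", "other"):
    flexnics = enumerate_flexnics(kind)
    print(kind, len(flexnics), flexnics)
```

Each name returned for the `"flex10"` case corresponds to one NIC that ESX sees as a separate device.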
Server profile
Inside the server profile in Virtual Connect Manager, NICs can be mapped to Ethernet networks. Because multiple Flexnics are enumerated, the process of mapping Ethernet networks to Flexnics differs from that of dual-port mezzanine or onboard 1 Gb NICs.
Each Flex-10 port presents its Flexnics as four sub-devices. Flex-10 port 1 presents the Flexnics as:
1a
1b
1c
1d
Flex-10 port 2 presents the Flexnics as:
2a
2b
2c
2d
When assigning Ethernet networks to a Flex-10 NIC, the server profile alternates between the two Flex-10 ports. It starts by displaying the first Flexnic of the first Flex-10 port (1a), then the first Flexnic of the second Flex-10 port (2a), and so on. It is possible to map the same Ethernet network to two Flexnics on separate Flex-10 ports (for example, Ethernet network iscsi to 1a and 2a), but it isn’t possible to map a single Ethernet network to multiple Flexnics on the same Flex-10 port (for example, Ethernet network iscsi to 1a and 1b).
Because one VLAN is used for iSCSI traffic, only the first Flexnic of each Flex-10 port (1a and 2a) is mapped to the iSCSI Ethernet network.
Because Flex-10 is used only for iSCSI traffic, the other Flexnics are not mapped to an Ethernet network. As a result, the unmapped Flexnics appear as unconnected uplinks in ESX.
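The two mapping rules above (the alternating order, and one Ethernet network at most once per Flex-10 port) can be expressed as a short check. This is a sketch of the rules as I observed them, not something exported from Virtual Connect Manager; the function names and the dictionary layout of the profile are hypothetical.

```python
from collections import defaultdict


def assignment_order(ports=2, flexnics_per_port=4):
    """Order in which the server profile offers Flexnics: 1a, 2a, 1b, 2b, ..."""
    return [f"{port}{suffix}"
            for suffix in "abcd"[:flexnics_per_port]
            for port in range(1, ports + 1)]


def validate_profile(mapping):
    """Check a profile given as {Flexnic name: network name or None}.

    Returns a list of violations of the rule that an Ethernet network
    may be mapped only once per Flex-10 port. None means unmapped.
    """
    seen = defaultdict(set)  # port number -> networks already mapped on it
    problems = []
    for flexnic in sorted(mapping):
        network = mapping[flexnic]
        if network is None:
            continue
        port = flexnic[0]
        if network in seen[port]:
            problems.append(f"{network} mapped twice on port {port} ({flexnic})")
        seen[port].add(network)
    return problems


print(assignment_order())
# My client's profile: iscsi on 1a and 2a, everything else unmapped.
profile = {name: None for name in assignment_order()}
profile["1a"] = profile["2a"] = "iscsi"
print(validate_profile(profile))                           # no violations
print(validate_profile({"1a": "iscsi", "1b": "iscsi"}))    # rejected mapping
```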
Bandwidth throttle
It is possible to throttle the bandwidth per Flexnic. Because only one Flexnic per Flex-10 port is used, it is configured with the maximum bandwidth.
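As a rough illustration of what throttling means for a port, the sketch below checks a set of per-Flexnic allocations against the capacity of one Flex-10 port. It assumes that the Flexnics of a port share that port's 10 Gb between them, which is not spelled out above; the function name and the use of Mb/s as the unit are choices made for this example only.

```python
PORT_CAPACITY_MBPS = 10_000  # one Flex-10 port is a 10 Gb link


def check_port_allocation(allocations_mbps):
    """Validate the bandwidth assigned to the Flexnics of ONE Flex-10 port.

    allocations_mbps maps a Flexnic name to its bandwidth in Mb/s.
    Returns the unallocated bandwidth, or raises ValueError if the
    Flexnics together would exceed the capacity of the port
    (assumption: all Flexnics of a port share its 10 Gb).
    """
    total = sum(allocations_mbps.values())
    if total > PORT_CAPACITY_MBPS:
        raise ValueError(
            f"{total} Mb/s allocated, port offers {PORT_CAPACITY_MBPS} Mb/s")
    return PORT_CAPACITY_MBPS - total


# The setup in this post: a single Flexnic per port at maximum bandwidth.
print(check_port_allocation({"1a": 10_000}))
```

In my client's configuration there is nothing to divide, so Flexnic 1a (and 2a on the other port) simply gets the full port.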
vSwitch
The order in which the NICs are mapped to Ethernet networks in the server profile of the Virtual Connect manager determines the assignment of vmnic labels. Flexnic 1a is assigned the label vmnic4 and Flexnic 2a is assigned label vmnic5. Both uplinks have the UP status.
Due to the internal port mapping of Virtual Connect, Flexnic 2a is mapped to the second Flex-10 VC module, whose uplink has been assigned the standby (blocked) status in the Shared Uplink Set.
This situation raised some questions, which I haven’t found any answers to (yet).
What will happen if the VMkernel decides to use that nic to send IO?
Is the Flexnic aware of the standby status of its “native” uplink? Will it send data to the uplink of the VC module it’s connected to or will it send data to the active uplink?
How is this done? Will it send the IO through the midplane or CX-4 cable to the VC module with the active uplink? And if this occurs what will be the added latency of this behavior?
HP describes the standby status as blocked; what does this mean? Will Virtual Connect discard I/O sent to the standby uplink, or will it not accept the I/O, and how will it indicate this?
The described situation can have an impact on the vSwitch design. Just to be on the safe side, Flexnic 2a is configured as a standby adapter in the iSCSI vSwitch.
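The effect of that active/standby choice can be sketched as follows. This is a simplified model of a vSwitch failover order that looks only at link status; it is not VMware code, and the function and parameter names are made up for the example. The vmnic labels are the ones from this setup.

```python
def pick_uplink(active, standby, link_up):
    """Return the vmnic a vSwitch with explicit failover order would use.

    Active adapters are tried first, in order; a standby adapter is used
    only when no active adapter has link. link_up maps vmnic -> bool.
    Returns None when every adapter is down.
    """
    for vmnic in list(active) + list(standby):
        if link_up.get(vmnic, False):
            return vmnic
    return None


# Both Flexnics report link UP, so iSCSI traffic stays on vmnic4 (Flexnic 1a).
print(pick_uplink(["vmnic4"], ["vmnic5"], {"vmnic4": True, "vmnic5": True}))
# Only when vmnic4 loses link does traffic move to vmnic5 (Flexnic 2a).
print(pick_uplink(["vmnic4"], ["vmnic5"], {"vmnic4": False, "vmnic5": True}))
```

Note that this only keeps traffic away from Flexnic 2a while vmnic4 has link; it does not answer what happens to I/O that does reach the module with the standby uplink.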
Must read documents about Virtual Connect and Flex-10 technology:
HP Virtual Connect for Cisco Network Administrators (c01386629.pdf)

HP Virtual Connect for c-Class BladeSystem User Guide (c00865618.pdf)

HP Virtual Connect Cookbook (c01471917.pdf)


13 Replies to “Flex-10 lessons learned”

  1. Thx I’m aware of using multiple SUS’s to create active active uplinks. Due to the configuration of the physical switches, multiple SUS for the use of one VLAN is not accepted by the network team.
    It seems that the link sections did not appear in the original post. I’ve added the links, including the cookbook.

  5. I’m not really sure, as I have no experience yet, but I have read through the cookbook, and as I understand it, if you use the “Smart Link” feature in VC it will disconnect the Flexnics when the VC uplink goes down. If the uplink is already in a “blocked” state, won’t the Flexnics already be disconnected then?
    Or, if they are not disconnected, I am thinking of the “Beacon Probe” setting in the vSwitch. If you enable “Beacon Probe” in the vSwitch, it sounds to me like it will correctly work out which Flexnic to use...? Anyone with experience on this?

  6. Well, you can have active/passive without any trouble as long as the two Virtual Connect modules are placed in the same row of the c3000 or c7000. In fact, Virtual Connect modules in the same row are linked together by the c7000 or c3000 internal switch.
    This means that if a packet is sent to the passive one, it will just go through the internal switch of the c7000/c3000 to the active Virtual Connect module (the one that has the active uplink).
    So ESX can load balance across all the physical interfaces configured for the VMkernel or virtual switch; all packets will finally go to the active uplink without any trouble.
    When your Virtual Connect modules are not in the same row, you can purchase dedicated 10 Gb links from HP to chain them and get the same result.

  7. We have the FC Flex10 VCs, but cannot get them to work at all. They are connected to 1GB ports on a Cisco 3560E switch. No link light. We cannot find any documentation on the Flex10 and HP support hardly knows anything about them.

  8. Frank, great post, many thanks. I’m glad I use Cisco UCS because this looks really difficult. With UCS it really is plug and play. Out of the box and up and running in 90 mins. HP VC looks like a nightmare and network admins don’t like it, for reasons that are obvious in this post. Urgh! I think VC was invented by a sadist!

  9. Hi Frank!
    I can provide some insight here:
    What will happen if the VMkernel decides to use that nic to send IO?
    All VC modules within the same domain should be connected via either the built-in midplane connection or via the CX4 interconnect cables. A VC domain can span up to 16 VC devices (interconnect modules), so it is possible to have multiple enclosures within a single VC domain, all sharing the same SUS.
    The VMkernel can ship packets to any vmnic without worrying whether there is an uplink path – as long as one path of the SUS is available, you’ll have joy.
    Is the Flexnic aware of the standby status of its “native” uplink?
    The FlexNIC doesn’t care about the status of any of the uplinks. It’s the job of the Flex-10 interconnect module to direct traffic to the correct (available) uplink.
    Will it send data to the uplink of the VC module it’s connected to or will it send data to the active uplink?
    A FlexNIC will always send data to its connected VC module. It’s then up to the VC module to figure out what to do with it.
    How is this done? Will it send the IO through the midplane or CX-4 cable to the VC module with the active uplink?
    The FlexNIC sends traffic to its hard-wired VC interconnect module. From there, it may be redirected through one (or more) additional VC interconnects to reach an available active uplink.
    And if this occurs what will be the added latency of this behavior?
    Not 100% sure, but logic would indicate that, since the data is being passed through multiple devices, there is some additional latency, even if it is minimal.
    HP describes the standby status as blocked, what does this mean?
    Inbound traffic (from the pSwitch) will be accepted; however, the assumption is being made that all uplinks within a SUS are part of the same LACP domain. This means that it is perfectly happy (and within its rights) to send response frames back on any uplink port within the SUS.
    Will Virtual Connect discard I/O sent to the standby uplink, or will it not accept the I/O, and how will it indicate this?
    Since the VC domain is responsible for directing traffic out an available active uplink port, the only way that I/O could be lost is if all uplinks are down.
    An additional note on Smart Link…the last project I worked on with VC (Jan 2010), HP strongly recommended that we NOT use Smart Link with ESX. Much better to let the VC fabric figure out what to do with the traffic internally rather than having ESX try to interpret upstream status.
    Hope this helps, and thanks for the PingBack!
    KLC

  10. Hi Ken,
    I need to have a question answered to be sure I understand this correctly.
    If you have an EtherChannel trunk of 2 ports (1 Gb each) on the pSwitch and plug the cables into the modules like this:
    One cable in enc1,bay1,x1(active).
    One cable in enc1,bay2,x1(standby).
    Then you’re able to receive a total throughput of 2 Gb of inbound traffic, and VC will then pass traffic to a given bay (server). But since outbound traffic will be redirected and sent through an active uplink, the total throughput would be 1 Gb for outbound traffic, since you only have one active uplink (bay1, x1).
    I guess I am totally wrong but I hope you can help me understand.
