Why is vMotion using the management network instead of the vMotion network?

On the community forums I’ve seen some questions about the use of the management network by vMotion operations. The two most common scenarios are explained below. Please let me know if you notice this behavior in other scenarios.

Scenario 1: Cross host and non-shared datastore migration
vSphere 5.1 provides the ability to migrate a virtual machine between hosts and non-shared datastores simultaneously. If the virtual machine is stored on a local or non-shared datastore, vMotion uses the vMotion network to transfer the data to the destination datastore. When monitoring the VMkernel NICs, some traffic can be seen flowing over the management NIC instead of the VMkernel NIC enabled for vMotion.

When migrating a virtual machine, vMotion distinguishes between hot data and cold data. Virtual disks or snapshots that are actively used are considered hot data, while the cold data consists of the underlying snapshots and the base disk. Let’s use a virtual machine with 5 snapshots as an example. The active data is the most recent snapshot, and this is sent across the vMotion network, while the base disk and the 4 older snapshots are migrated via a network file copy (NFC) operation across the first VMkernel NIC (vmk0).

The reason vMotion uses separate networks is that the vMotion network is reserved for migrating performance-related content. If the vMotion network were used for network file copies of cold data, it could saturate the network with non-performance-related content and thereby starve traffic that depends on that bandwidth. Please remember that everything sent over the vMotion network directly affects the performance of the migrating virtual machine.

During a vMotion, the VMkernel mirrors the active I/O between the source and the destination host. If vMotion pumped the entire disk hierarchy across the vMotion network, it would steal bandwidth from the I/O mirror process, and this would hurt the performance of the virtual machine.

If the virtual machine does not contain any snapshots, the VMDK is considered active and it is migrated across the vMotion network. The files in the VMDK directory are copied across the network of the first VMkernel NIC.
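The hot/cold split described above can be sketched in a few lines of Python. This is only an illustration of the rule as explained in this article, not vMotion's actual implementation; the function name `plan_transfer`, the file names, and the two path labels are hypothetical.

```python
# Illustrative model only: names and labels are assumptions, not VMware APIs.
VMOTION = "vMotion network"
NFC = "NFC over first VMkernel NIC (vmk0)"


def plan_transfer(disk_chain):
    """Map each disk layer to the network path described in the article.

    disk_chain is ordered from the base disk to the most recent snapshot.
    The last (active) layer is hot data; everything below it is cold data.
    """
    if not disk_chain:
        return {}
    *cold_layers, hot_layer = disk_chain
    plan = {layer: NFC for layer in cold_layers}
    plan[hot_layer] = VMOTION
    return plan


# Example from the article: a base disk plus 5 snapshots.
chain = ["base.vmdk"] + [f"snapshot-{i}.vmdk" for i in range(1, 6)]
for layer, path in plan_transfer(chain).items():
    print(f"{layer:16} -> {path}")
```

With no snapshots the chain holds only the base VMDK, so that single layer is the active one and is mapped to the vMotion network, which matches the behavior described above.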

Scenario 2: Management network and vMotion network sharing the same IP-range/subnet
If the management network (actually the first VMkernel NIC) and the vMotion network share the same subnet (same IP-range), vMotion sends traffic across the network attached to the first VMkernel NIC. It does not matter if you create the vMotion network on a different standard switch or distributed switch, or assign different NICs to it: vMotion will default to the first VMkernel NIC if the same IP-range/subnet is detected.

Please be aware that this behavior is only applicable to traffic that is sent by the source host. The destination host receives incoming vMotion traffic on the vMotion network!

I’ve been conducting an online poll and more than 95% of the respondents are using a dedicated IP-range for vMotion traffic. Nevertheless, I would like to remind you that it’s recommended to use a separate network for vMotion. The management network is considered to be an unsecure network and therefore vMotion traffic should not be using this network. You might see this behavior in POC environments where you use a single IP-range for virtual infrastructure management traffic.

If the host is configured with a Multi-NIC vMotion configuration using the same subnet as the management network/1st VMkernel NIC, then vMotion respects the vMotion configuration and only sends traffic through the vMotion-enabled VMkernel NICs.
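To check whether a host falls into this scenario, you can compare the subnets of the first VMkernel NIC and the vMotion-enabled VMkernel NICs. The following is a minimal sketch using Python's standard `ipaddress` module. The function names, the tuple layout, and the example addresses are assumptions for illustration; the selection logic simply encodes the behavior described in this article for the source host and is not taken from VMware code.

```python
import ipaddress


def same_subnet(addr_a, addr_b):
    """Return True if two 'ip/prefix' strings fall in the same network."""
    net_a = ipaddress.ip_interface(addr_a).network
    net_b = ipaddress.ip_interface(addr_b).network
    return net_a == net_b


def source_vmknics(first_vmk, vmotion_vmks):
    """Predict which VMkernel NICs the source host sends vMotion traffic on.

    first_vmk    -- (name, 'ip/prefix') of the first VMkernel NIC
    vmotion_vmks -- list of (name, 'ip/prefix') for vMotion-enabled NICs
    """
    if not vmotion_vmks:
        raise ValueError("no vMotion-enabled VMkernel NIC configured")
    # Multi-NIC vMotion: the vMotion configuration is respected.
    if len(vmotion_vmks) > 1:
        return [name for name, _ in vmotion_vmks]
    name, addr = vmotion_vmks[0]
    # Single vMotion NIC in the same subnet as the first VMkernel NIC:
    # traffic defaults to the first VMkernel NIC.
    if name != first_vmk[0] and same_subnet(first_vmk[1], addr):
        return [first_vmk[0]]
    return [name]


mgmt = ("vmk0", "192.168.1.10/24")
print(source_vmknics(mgmt, [("vmk1", "192.168.1.20/24")]))
print(source_vmknics(mgmt, [("vmk1", "10.0.0.20/24")]))
print(source_vmknics(mgmt, [("vmk1", "192.168.1.20/24"),
                            ("vmk2", "192.168.1.21/24")]))
```

The first call models the shared-subnet case and yields vmk0, the second models a dedicated vMotion subnet, and the third models a Multi-NIC vMotion configuration in the management subnet.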

If you have an environment that uses a single IP-range for the management network and the vMotion network, I would recommend creating a Multi-NIC vMotion configuration. If you have a limited number of NICs, you can assign the same NIC to both VMkernel NICs. Although you do not leverage the load balancing functionality, you force the VMkernel to use the vMotion-enabled networks exclusively.

Comments

  1. says

    I love reading your blog because I learn things I never really knew (or thought about). Thank you for sharing this knowledge. To clarify – when you say that vmk0 is selected to migrate “cold” data – what if vmk0 is not the management vmkernel? Perhaps it is being used for FT or another purpose. Would the cold data correctly locate the management vmk?

  2. says

    Nice post Frank, thanks. By “The files in the VMDK directory are copied across the network of the first VMkernel NIC” you mean vmx, log, etc…?

  3. says

    Hi Chris,

    Thanks for the compliment!

    When you install ESXi, the first VMkernel NIC (vmk0) is used for the management network. If you switch around, destroy and recreate a couple of VMkernel NICs, it can happen that the management network is not vmk0. Therefore I cannot state that it always uses the management NIC.

    To state it differently, it’s the VMkernel NIC configured with the default gateway. (UI, not one with a static route configured manually)

  4. André Pett says

    Thanks for this insight, very interesting.
    About: “… and the 4 older snapshots are migrated via a network file copy operation across the first VMkernel NIC (vmk0).”
    Just thinking about this. Doesn’t this introduce a security issue? Although production VMs shouldn’t run with snapshots (so this may not be a huge business case), the vMotion traffic is unencrypted, which is one of the reasons for using a separate vMotion network!?

  5. André Pett says

    Frank,
    NFC with SSL enabled by default, that explains it. Sometimes I just need some time … ;)
    Thanks.

  6. David Dominguez says

    Nice Article Frank! we experienced that behavior when we upgraded a few years ago from Classic ESX into ESXi. What about having NFS and vmotion on the same subnet? Let’s assume vmotion port is vmk1 and NFS port vmk2. Would the NFS traffic still go through vmk1 ??

    Thanks!

  7. JudgeDredd says

    Interesting article.

    If you are using 10Gb NICs you could end up with vMotion and NFC traffic travelling down the same link, although they might be on different subnets/VLANs. Might it be useful to give NFC traffic its own Network Resource Pool so that NetIORM could be used to prevent it impacting “normal” management traffic?

  8. AK says

    Agree with JudgeDredd here and will add some other things.
    Even if you have completely separate, dedicated networks (in our case we have 1 Gbit for management and 10 Gbit for vMotion), it would not be very optimal to go over the 1 Gbit link for NFC.
    It would be nice to have a choice/tunable to ‘force’ all traffic over the vMotion network regardless of ‘hot’/‘cold’ data, especially since some customers may indeed have very robust/dedicated vMotion networks. I think a combination of being able to force ALL vMotion over the defined vMotion network with NetIOC prioritizing hot/cold data would be optimal.

  9. says

    I do not know if it’s just me or if perhaps everybody else is experiencing problems with your site. It appears as though some of the text in your posts is running off the screen. Can someone else please comment and let me know if this is happening to them as well? This may be an issue with my web browser because I’ve had this happen before.
    Cheers

  10. says

    I understand from what I have read that we can now do storage vMotion between hosts that do not share any common storage. I am not clear on whether or not we can do that when the hosts are not sharing a common VLAN for vMotion or storage. I would assume that as long as the vMotion traffic is able to route between the VLANs it would still work, but that it might not be the best way to approach this. My issue is this: the current environment is basically out of IP addresses to add new hosts, and we need to retire the existing hardware and replace it with new hardware.