Hyper-V host VMs unable to communicate with VMs on other clustered host?

Jon Hall 0 Reputation points
2025-07-31T21:06:44.3333333+00:00

Hey all - I'm hitting a wall with this one and was hoping someone might spot something obvious I'm missing. Sorry if my description gets a bit dizzying...

I have 2 Server 2022 hosts in a failover cluster, each with similar networking:

  • 2x 40Gbps links for guest VM traffic (1 to each of 2 Aruba AOS-CX switches with MC-LAG configured between the switches, though no LAG/LACP is configured on these particular VM interfaces), configured in a single virtual switch with SET (port-based balancing).
  • 4x 25Gbps links to storage, via 2 separate storage VLANs via 2 independent switches
  • 2x 1Gbps links (OS-teamed) for management traffic
  • 2x 1Gbps links (OS-teamed) for migration traffic
  • Each host has a unique range for assignable hardware/MAC addresses, and I've even gone so far in troubleshooting this issue as to assign unique static MACs to each VM (did not resolve issue).

Hosts, switches, and clients elsewhere in the network all have no issues communicating with each other. The issue I find is with traffic specifically from guest VMs on one host to the other.

Host A's VMs have zero issues whatsoever pinging/communicating with VMs on Host B. They also have no issues communicating with anything else on the network.

Host B's VMs can't seem to ping Host A's VMs, UNLESS I connect to the destination VM and start a ping back... then suddenly the B VM's pings will begin getting responses, and the host B VM then has no issues pinging the Host A VM until it restarts... then the problem resumes.

Even if I ping from the Host A-side VM so that the B VM will be able to start getting replies to its own ping requests, it continues to have issues pinging any other VM on the host A-side unless I repeat the same process from every host A-side VM (with a couple dozen VMs, this is obviously a pain to deal with).

The other solution I've found is that if I live migrate the VM from Host B to Host A, then it has no problem communicating with the other existing VMs on Host A, and will continue to be able to ping them if I live migrate the VM back to host B. It will continue to be able to communicate with them until it is restarted... then the problem comes back.

I've set up Wireshark to monitor the interfaces on the Aruba switches, as well as the interfaces in the virtual switch team on the Host B server itself. When I start a ping from a Host B VM to a Host A one, none of the Wireshark captures detect any of the ping traffic whatsoever... that is, UNTIL I connect to the destination VM on the Host A side and start a ping back to the Host B VM... then the Host B VM starts getting its replies, and all ping traffic both ways instantly begins appearing in the Wireshark captures.

At no time do the Host B VMs have any issue pinging any other devices on our network, nor do they have any issue communicating with Host A's mgmt address... they only have issues with Host A-hosted VMs.

I've verified all VLANs and virtual switch settings are consistent across the switches and both hosts' configurations.

Is there anything obvious I'm overlooking that might explain this sort of behavior? Any suggestions?

Thank you!

Windows for business | Windows Server | Storage high availability | Virtualization and Hyper-V

2 answers

  1. Joseph Tran 1,400 Reputation points Independent Advisor
    2025-08-01T10:38:01.4966667+00:00

    Based on your description, this looks like a classic MAC address learning / ARP cache problem that produces unidirectional traffic flow, in a network fabric that combines SET and MC-LAG without LACP.

    Can you try the recommended steps below and let me know the result?

    1. Confirm the Switch Ports Match SET's Switch-Independent Mode

    • SET does not support LACP or static teaming; it supports switch-independent mode only.
    • Because SET is switch-independent, MC-LAG on the switch side can be problematic if the VM uplink ports are part of a LAG, since a LAG expects consistent hashing/LACP from the host side.
    • Fix: Make sure MC-LAG is not applied to these specific VM switch ports, and that Aruba is treating them as individual ports (not part of any LAG group).
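
    To see how the team is currently configured before changing anything on the switches, something like the following may help. It is only a sketch: "YourSwitch" is a placeholder for the real virtual switch name.

    ```powershell
    # Placeholder switch name (assumption); replace with the actual SET-enabled vSwitch.
    $switchName = "YourSwitch"

    # Show the teaming mode, load balancing algorithm, and member adapters.
    # TeamingMode is expected to report SwitchIndependent for a SET team.
    Get-VMSwitchTeam -Name $switchName |
        Format-List Name, TeamingMode, LoadBalancingAlgorithm, NetAdapterInterfaceDescription
    ```

    Running this on both hosts and comparing the output side by side should show whether the two teams really are configured the same way.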

    2. Check/Shorten MAC Aging Timer on Switches

    • Aruba switches may have a long aging timeout, meaning they don’t relearn MACs fast enough or persist incorrect mappings.
    • Reduce MAC address aging time (e.g., from 5 minutes down to 60 seconds) temporarily to test.

    3. Test With One Host Using Only One Uplink

    • Temporarily disable one 40Gb uplink on Host B to force all VM traffic over a single interface.
    • If the issue disappears, you've isolated it to an interaction between SET's traffic distribution across the two uplinks and MAC learning on the MC-LAG switch pair.
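
    A rough way to run this test from PowerShell on Host B is sketched below. The adapter name "VM-Uplink-2" is hypothetical; list the real names first. The host loses uplink redundancy while the adapter is disabled, so a maintenance window is probably the safer choice.

    ```powershell
    # List physical adapters to find the two 40Gb team members.
    Get-NetAdapter | Sort-Object Name |
        Format-Table Name, InterfaceDescription, Status, LinkSpeed

    # Hypothetical adapter name (assumption); replace with one of the SET members.
    $uplink = "VM-Uplink-2"

    # Take one uplink out of service so all VM traffic uses the remaining one.
    Disable-NetAdapter -Name $uplink -Confirm:$false

    # Restart a test VM on Host B and repeat the ping test here, then restore the uplink.
    Enable-NetAdapter -Name $uplink
    ```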

    4. Change SET Load Balancing Algorithm

    • Try switching from Hyper-V Port to Dynamic or vice versa.
    Get-VMSwitch "YourSwitch" | Set-VMSwitchTeam -LoadBalancingAlgorithm Dynamic
    

    5. Confirm MAC Address Table Learning on Aruba Switches

    • Check MAC tables on both switches (show mac-address-table or equivalent).
    • Confirm that the VM MAC from Host B is being learned on the correct port after initiating a ping from Host A.
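
    To know which MAC addresses to look for in the switch tables, the VM adapters can be listed from each host. This is a minimal sketch; the colon-separated lowercase format is an assumption about what is easiest to compare against the switch output, so adjust it if your switches display MACs differently.

    ```powershell
    # Run on each host. Lists every VM network adapter with its MAC address
    # converted from the 00155D010203 style to colon-separated lowercase.
    Get-VMNetworkAdapter -VMName * |
        Select-Object VMName, SwitchName,
            @{ Name = 'Mac'; Expression = { ($_.MacAddress -replace '(..)(?!$)', '$1:').ToLower() } } |
        Sort-Object VMName |
        Format-Table -AutoSize
    ```

    Checking the switch tables once before and once after the ping from Host A should show whether the Host B VM's MAC is missing, or sitting on an unexpected port, while the problem is occurring.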

    6. Enable MAC Notification / Port Fast Learning

    • Some switches support features like fast MAC learning or MAC notification which can help in dynamic environments like Hyper-V clusters.

  2. Yash Smith 0 Reputation points
    2025-08-01T11:44:14.6366667+00:00

    If VMs on one Hyper-V clustered host cannot communicate with VMs on another, the issue is typically due to network misconfiguration. Ensure that all Hyper-V hosts use identical virtual switch names and types, and verify that NIC teaming or Switch Embedded Teaming (SET) is set up the same way on every node. VLAN tagging must be consistent on all connected physical switches, and MAC address spoofing should be enabled on the VM network adapter for features that need it, such as load balancers or clustered services.
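
    These checks could be run from one place roughly as follows. This is a sketch only: "HostA" and "HostB" are hypothetical node names, and it assumes the account running it has Hyper-V administrative rights on both nodes.

    ```powershell
    # Hypothetical host names (assumption); replace with the two cluster nodes.
    $nodes = "HostA", "HostB"

    # Virtual switch name, type, and whether SET is enabled, per host.
    Get-VMSwitch -ComputerName $nodes |
        Format-Table ComputerName, Name, SwitchType, EmbeddedTeamingEnabled -AutoSize

    # VLAN configuration of every VM network adapter on both hosts.
    Get-VMNetworkAdapter -ComputerName $nodes -VMName * |
        Get-VMNetworkAdapterVlan

    # MAC address spoofing setting of every VM network adapter.
    Get-VMNetworkAdapter -ComputerName $nodes -VMName * |
        Format-Table ComputerName, VMName, MacAddressSpoofing -AutoSize
    ```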

