We are currently having a problem with our high availability environment whereby the cluster loses quorum and all the VMs reboot when the CISCO master is power cycled. The relevant components in the environment are shown in the diagram below:
Server A and Server B are both running Windows Server 2012 R2. Not shown in the diagram: both servers are also connected to a redundant SAN using MPIO.
The CISCO SG500Xs are running the latest firmware, version 1.4.2.4.
The LAG/TEAMs have the following parameters set:
CISCO SG500X LAG parameters from GUI:
- Load Balance Algorithm: IP/MAC Address
- Port Priority = 1
- LACP Timeout = Long
- Administrative Auto Negotiation: Enable
- Administrative Flow Control: Disable
- LACP: Enable
Windows Server 2012 R2 Hyper-V Cluster Team properties:
- Teaming mode: LACP
- Load balancing mode: Dynamic
- Standby adapter: None
The problem is that when the CISCO master is power cycled, the cluster shuts down and all the VMs reboot. This is a major problem. The event log on each server shows the NIC that is connected to the master going down, which is expected. However, it also shows that the TEAM is no longer operational. This causes each node to lose connectivity to the other nodes in the cluster, so the cluster shuts down and all the VMs reboot.
My understanding of LAG/TEAMS is that as long as 1 member of the LAG/TEAM is operational it should keep working. That’s the point of high availability.
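To make that expectation concrete, here is a minimal Python sketch of the behaviour I assumed. It is only a toy model of my understanding, not how the SG500X or Windows actually implements LACP. The member names are hypothetical, and I am assuming each server has one team member cabled to the master and one to the other switch in the stack.

```python
class Lag:
    """Toy model of a LAG/team: named member links, each either up or down."""

    def __init__(self, members):
        # members maps a (hypothetical) link name to True (up) or False (down)
        self.members = dict(members)

    def set_link(self, name, up):
        self.members[name] = up

    def is_operational(self):
        # Expected behaviour: the team stays up while at least one member is up.
        return any(self.members.values())


team = Lag({"NIC1 to master": True, "NIC2 to second switch": True})

# Power cycling the master should only take down the member cabled to it.
team.set_link("NIC1 to master", False)
print("Team operational:", team.is_operational())
```

What I am actually seeing corresponds to is_operational() returning False as soon as the link to the master drops, even though the other member should still be up.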
So I could use any help or comments as to what I might have configured incorrectly.