The hardware:
We have an 8-node Hyper-V 2012 R2 cluster. All servers are identical: Dell PowerEdge 630
with 4 integrated NICs, 2x 10 GbE and 2x 1 GbE (BCM57800).
They also have a converged network adapter (BCM57810) with 2x 10 GbE ports, and we use NPAR to divide it into 8 NICs.
The NICs are configured like this:
The two on-board 1 GbE NICs are disabled.
NODE_TEAM
Two on-board 10 GbE NICs teamed for the host (NODE_NIC1 + NODE_NIC2 = NODE_TEAM)
Teaming mode: LACP, Dynamic load balancing, all active, default VLAN
VM_TEAM
Two NPAR NICs (shared) 10 GbE teamed for the VM LAN (VM_NIC1 + VM_NIC2 = VM_TEAM)
Teaming mode: switch independent, Dynamic load balancing, all active, default VLAN
X_TEAM
Two NPAR NICs (shared) 10 GbE teamed (X_NIC1 + X_NIC2 = X_TEAM)
Teaming mode: switch independent, Dynamic load balancing, all active, default VLAN
Y_TEAM
Two NPAR NICs (shared) 10 GbE teamed (Y_NIC1 + Y_NIC2 = Y_TEAM)
Teaming mode: switch independent, Dynamic load balancing, all active, default VLAN
One NPAR NIC (shared) 10 GbE -> iSCSI_NIC1, no teaming, bound only to a virtual switch
One NPAR NIC (shared) 10 GbE -> iSCSI_NIC2, no teaming, bound only to a virtual switch
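For reference, the teams above were created roughly like this (a sketch of our setup, not the exact commands we ran; team and adapter names are the ones listed above):

```powershell
# Host team on the on-board 10 GbE ports: LACP with Dynamic load balancing
New-NetLbfoTeam -Name NODE_TEAM -TeamMembers NODE_NIC1, NODE_NIC2 `
    -TeamingMode Lacp -LoadBalancingAlgorithm Dynamic

# VM team on two NPAR partitions: switch independent, Dynamic, all active
New-NetLbfoTeam -Name VM_TEAM -TeamMembers VM_NIC1, VM_NIC2 `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic

# External virtual switch bound to the VM team (switch name is illustrative)
New-VMSwitch -Name VM_SWITCH -NetAdapterName VM_TEAM -AllowManagementOS $false
```

X_TEAM and Y_TEAM are built the same way as VM_TEAM, each on its own pair of NPAR partitions.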
The problem:
The virtual machines run fine, then suddenly lose network connectivity. It happens with all the teams except NODE_TEAM.
We can reproduce it by moving VMs around the cluster. If we disable one adapter in the team, or put it in standby, the problem goes away.
We have read a lot about VMQ and believe we are hitting something like that, but we have not found a proper fix.
We would like to keep the configuration above; as a workaround we currently run the teams with an active/standby NIC.
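For anyone suggesting VMQ tuning: something like the following is what we understand we could try (the processor numbers are purely illustrative for our hardware, and disabling VMQ is meant only as a diagnostic, not the end state we want):

```powershell
# Show which adapters have VMQ enabled and how queues/processors are assigned
Get-NetAdapterVmq |
    Format-Table Name, Enabled, BaseVmqProcessor, MaxProcessors, NumberOfReceiveQueues

# Temporarily disable VMQ on the team members to test whether it is the cause
Disable-NetAdapterVmq -Name VM_NIC1, VM_NIC2

# Or keep VMQ on but give each physical NIC a non-overlapping processor range
Set-NetAdapterVmq -Name VM_NIC1 -BaseProcessorNumber 2  -MaxProcessors 4
Set-NetAdapterVmq -Name VM_NIC2 -BaseProcessorNumber 10 -MaxProcessors 4
```

Is something along these lines the right direction, or is there a better way to keep VMQ enabled with this teaming layout?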
Any suggestions?
Regards, Perry