An all-hardware Exchange 2010 SP3 UR4 DAG cluster has a problem when the Microsoft Loopback Adapter is installed (from Device Manager > Add Legacy Hardware) to support Direct Server Return (DSR) operations with a hardware load balancer (HLB).
- The HLB provides an HA endpoint for RPC Client Access, SMTP, etc. DSR is required to preserve the client source IP, because the Exchange receive connectors filter on source IP for security.
- It is a 5-server DAG, with 3 x production servers at the main datacenter and 2 x DAG DR servers located at a DR site.
- Only the 3 x production servers at the main site have the loopback adapter installed.
- The loopback-DSR-specific settings such as 'weakhostreceive', etc. are in effect.
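For readers unfamiliar with the weak host settings mentioned above, here is a minimal sketch of how they are typically applied. The post does not list the exact commands used, so this assumes the set commonly documented by load balancer vendors (`netsh interface ipv4 set interface ... weakhostreceive/weakhostsend`). The helper name `dsr_netsh_commands` and the adapter names "LAN" and "Loopback" are hypothetical placeholders.

```python
# Hypothetical helper: builds the netsh command lines commonly documented for
# enabling the weak host model on a DSR real server. It only produces strings;
# it does not run anything. Adapter names are placeholders, not from the post.

def dsr_netsh_commands(lan_adapter, loopback_adapter):
    """Return the netsh command lines for the assumed DSR weak-host setup."""
    base = 'netsh interface ipv4 set interface "{name}" {setting}=enabled'
    return [
        # Weak host receive on the LAN-facing adapter
        base.format(name=lan_adapter, setting="weakhostreceive"),
        # Weak host receive and send on the loopback adapter holding the VIP
        base.format(name=loopback_adapter, setting="weakhostreceive"),
        base.format(name=loopback_adapter, setting="weakhostsend"),
    ]


if __name__ == "__main__":
    for cmd in dsr_netsh_commands("LAN", "Loopback"):
        print(cmd)
```

Each printed line would be run from an elevated prompt on the production nodes only; the actual adapter names on these servers may differ.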
The problem only involves the 3 servers in the DAG with loopback adapters.
The issue is that when a DAG member restarts, it will sometimes cause the online production cluster node that is not the Cluster Host Server to fail. Consider:
- DAGNode1, Loopback enabled, Healthy, Is Cluster Host Server
- DAGNode2, Loopback enabled, Healthy
- DAGNode3, Loopback enabled, is Restarted
In this scenario, the Cluster service on DAGNode2 will experience a loss of network connectivity when DAGNode3 rejoins the cluster (DAGNode2 reports cluster failure on all other nodes), and shortly afterwards the Cluster service on DAGNode2 will terminate. FailoverClustering event 1572 is seen on DAGNode2:
Node 'DAGNode2' failed to join the cluster because it could not send and receive failure detection network messages with other cluster nodes. Please run the Validate a Configuration wizard to ensure network settings. Also verify the Windows Firewall 'Failover Clusters' rules.
Interestingly, if you disable the Loopback on DAGNode3, DAGNode2 will immediately rejoin the cluster! Re-enable the Loopback on DAGNode3 and DAGNode2 immediately fails again! After possibly a few more server restarts, you get a stable cluster again with Loopback enabled on all production nodes. The status of the loopback (enabled or not) on the Cluster Host does not affect this issue.
As I mentioned, this occurs only on some restarts; usually there is no problem. Also note that the Loopback networks/adapters do not appear in Cluster Manager and are not listed as cluster networks by cluster.exe. The Cluster Validation Wizard passes everything except for noting that every node has a duplicate IP on an installed adapter.
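To illustrate the duplicate-IP note from validation: with DSR, every production node carries the same virtual IP on its loopback adapter, so the same address shows up on all three nodes. A minimal sketch of that check follows; the function name `find_shared_ips` and all addresses are made up for illustration (10.0.0.100 stands in for the HLB virtual IP), and this is not how the Validation Wizard itself is implemented.

```python
from collections import defaultdict


def find_shared_ips(node_addresses):
    """Map each IP found on more than one node to the sorted list of nodes holding it."""
    holders = defaultdict(set)
    for node, ips in node_addresses.items():
        for ip in ips:
            holders[ip].add(node)
    return {ip: sorted(nodes) for ip, nodes in holders.items() if len(nodes) > 1}


if __name__ == "__main__":
    # Illustrative inventory only: one unique LAN address per node plus the
    # shared virtual IP bound to each node's loopback adapter.
    inventory = {
        "DAGNode1": ["10.0.0.11", "10.0.0.100"],
        "DAGNode2": ["10.0.0.12", "10.0.0.100"],
        "DAGNode3": ["10.0.0.13", "10.0.0.100"],
    }
    print(find_shared_ips(inventory))
    # prints {'10.0.0.100': ['DAGNode1', 'DAGNode2', 'DAGNode3']}
```

Only the shared virtual IP is reported; the per-node LAN addresses are unique and are filtered out.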
Looking for others who have combined a DSR-based HLB with a CAS/Hub/MBX DAG cluster on the same Exchange computers and were able to run it reliably.
There is an unanswered thread from 2010 on this topic:
Some questions (any answers are very welcome!):
- Can I add the Loopback adapter to the cluster configuration so that I can use Cluster.exe to ignore the loopback adapter?
- Can I prevent other cluster nodes from seeing the loopback adapters on the other nodes? Is there an ‘ignore partner adapter’ setting?
John Joyner MVP-SC-CDM