Hi Team,
We have a WSFC (Windows Server Failover Cluster) spanning our primary and DR sites: two nodes in the primary site and one node in DR. Unfortunately, node 3 goes down very frequently and we have not been able to find the root cause. Even when the primary nodes are up, node 3 never rejoins by itself. We have to evict it and then add it back to the cluster.

We ran a couple of tests to observe node 3's behavior. In the first test, we disabled the NICs on node 3 for 5 minutes and then re-enabled them. Node 3 reconnected and rejoined the cluster by itself. In the second test, we disabled the NIC again and kept node 3 in DR down for 45 minutes. After re-enabling the NIC, node 3 still remains in 'Down' status.
Is there a threshold after which a down node stops trying to contact the other nodes in the cluster? We also tried starting the Cluster service on node 3 manually, but even then it never comes online.
The cluster hosts a SQL Server Availability Group (AG), but I am not sure whether SQL Server could bring the node down and prevent it from coming back online.
Any pointers would be appreciated.
Regards,