I have network issues and many failing resources in a two-node failover cluster (Windows Server 2008 R2).
Here are a few observations.
- Only one node can be active in the cluster at a time. The logs indicate that the nodes have a problem accessing the witness. The witness is online and available on both nodes (in Disk Management). However, the witness is marked as reserved/offline on the failing node (as are all other iSCSI disks). Some of the disks are (automatically) assigned a drive letter on the working node.
- I can ping and Remote Desktop to the VMs from outside the cluster, but I can NOT browse (port 80/443) or telnet to port 25 on the virtual Exchange server. From other VMs in the cluster I have full access (SMTP, 80, 443, etc.). All firewalls are turned off.
- “Validate cluster” tells me that there are a lot of problems with IP-address resources being offline. No other issues are reported.
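To make the port symptom above repeatable, the same check can be run from a machine outside the cluster and again from a VM inside it. This is only a minimal sketch: the address 10.0.0.50 is a hypothetical placeholder for the virtual Exchange server (its real address is not listed here), and the function names are my own. It tests only whether a TCP connection can be opened, nothing more.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def probe(host, ports, timeout=3.0):
    """Map each port to True (connect succeeded) or False (connect failed)."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Hypothetical address of the virtual Exchange server; replace with the real one.
    exchange_vm = "10.0.0.50"
    for port, is_open in probe(exchange_vm, [25, 80, 443]).items():
        print(f"{exchange_vm}:{port} -> {'open' if is_open else 'no connection'}")
```

If the ports show as open from inside the cluster and fail from outside while ping still works, that would match what I am seeing by hand.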
Some info:
- Two-node cluster with disk witness
- All storage is iSCSI
- The cluster also hosts a print server service and a file server service (10.0.0.115 / 10.0.0.118)
- Subnet x is for a set of our virtual servers and for management of the physical nodes
- Subnet y is for a different set of our virtual servers
- Each node has 4 NICs:
o One for management 10.0.0.21 / 10.0.0.22 (belonging to subnet x)
o One for virtual adapter x (virtual adapter x is a virtual switch for subnet x)
o One for virtual adapter y (virtual adapter y is a virtual switch for subnet y)
o One for iSCSI (10.10.120.20 / 10.10.120.21)
- The cluster itself has IP 10.0.0.10 (temporarily changed to 10.0.0.9)
Here are common errors in the cluster logs:
- Node 'Node-1' failed to form a cluster. This was because the witness was not accessible. Please ensure that the witness resource is online and available.
- Cluster resource 'Quorum' in clustered service or application 'Cluster Group' failed.
- Cluster node 'Node-2' was removed from the active failover cluster membership.
- Cluster network interface 'Node-2 - Local Area Connection 3' for cluster node 'Node-2' on network 'Cluster Network 1' is unreachable by at least one other cluster node attached to the network
ipconfig Node-1:
ipconfig Node-2:
How do I narrow this issue down?