Hello TechNet friends,
Something happened yesterday that has me stumped, and I am not sure which direction to go.
- 2-node active/passive 2008R2 file cluster (Node 1 & Node 2)
- Nodes are VM guests on vSphere 5.5
- Path selection policy is round-robin
- Quorum is Node and Disk Majority (the quorum disk is on the SAN; in fact, all drives are)
- Node 1 owns cluster resources
Our VM environment re-balanced itself in the wee hours of the morning. When it began migrating Node 1 to a different host, the VM system reported that there was no heartbeat coming from Node 1. This appears to be because the virtual switch used in VMware listed a different "Observable IP range", outside that of the heartbeat IP. We have noticed the observable IP range change before; apparently that is expected behavior caused by broadcast packets being received, and it should not be cause for alarm. The guest migration then occurred.
Seconds later, the MS cluster reported that the cluster service failed to update the cluster configuration on the witness disk. The witness disk then failed and dropped from the cluster. The cluster remained up, with no other errors being reported.
The newly migrated Node 1 showed all green for the cluster and its resources, and only showed this witness disk error in the logs. It wasn't until I was notified that the application could not reach its cluster resources that I drilled down into the cluster and noticed that the attached SAN drives showed only a unique identifier number and no longer had drive letters. The drives also showed 0 bytes. I had to reboot Node 1 in order to restore connectivity.
So, I think I have a few questions:
A) Did the intermittent loss of a heartbeat during the migration cause the cluster service to fail to update the cluster config on the witness disk?
B) Why does A matter if the original cluster config is kept in C:\Windows\Cluster?
C) You can lose the witness disk and be OK, so why did Node 1 all of a sudden think it owned the cluster resources but could not provide a drive letter?
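For what it's worth, here is how I understand the vote math behind question C. This is only a rough sketch, assuming each node and the witness disk carries one vote (3 votes total in my setup) and that quorum needs more than half of them. The function name is just something I made up for illustration; it is not anything the cluster service exposes.

```python
def has_quorum(votes_online: int, total_votes: int) -> bool:
    """Return True when strictly more than half of all configured votes are online."""
    return 2 * votes_online > total_votes


# Assumption: Node and Disk Majority with 2 nodes + 1 witness disk, one vote each.
TOTAL_VOTES = 3

# Witness disk lost, both nodes still up: 2 of 3 votes.
print(has_quorum(2, TOTAL_VOTES))  # True

# Witness disk lost and one node down: 1 of 3 votes.
print(has_quorum(1, TOTAL_VOTES))  # False
```

By that math the cluster staying up after the witness disk dropped makes sense, which is why the missing drive letters confuse me.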
Thank you.