Hi,
We have a two-node Windows Server 2008 R2 cluster that is having problems on failover. Everything is currently running on Node 2, but we can no longer fail over to Node 1. In the logs I see the following messages:
2014/01/27-23:24:29.840 INFO [RES] Network Name <SQLMASTER>: DNS name SQLMASTER.here.net Registration with LSA was successful
2014/01/27-23:25:29.000 ERR [RHS] RhsCall::DeadlockMonitor: Call ONLINERESOURCE timed out for resource 'SQLMASTER'.
2014/01/27-23:25:29.000 ERR [RHS] Resource SQLMASTER handling deadlock. Cleaning current operation.
2014/01/27-23:25:29.000 WARN [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'SQLMASTER', gen(0) result 5018.
2014/01/27-23:25:29.000 INFO [RCM] TransitionToState(SQLMASTER) OnlinePending-->ProcessingFailure.
2014/01/27-23:25:29.000 ERR [RCM] rcm::RcmResource::HandleFailure: (SQLMASTER)
2014/01/27-23:25:29.000 INFO [RCM] resource SQLMASTER: failure count: 1, restartAction: 2.
The cluster then attempts to restart the resource, brings the network name online and encounters the following errors:
2014/01/27-23:26:26.445 WARN [RES] Network Name <SQLMASTER>: WaitForTargetToComeUp: WSA_QOS_ADMISSION_FAILURE(11010)' because of '[cxl::Pinger-"SQLMASTER"] Could not send IPv4 echo.'
2014/01/27-23:26:26.445 WARN [RES] Network Name <SQLMASTER>: WaitForTargetToComeUp: WSA_QOS_ADMISSION_FAILURE(11010)' because of '[cxl::Pinger-"SQLMASTER"] Could not send IPv4 echo.'
2014/01/27-23:26:29.048 INFO [RES] Network Name <SQLMASTER>: [cxl::Pinger-"SQLMASTER"] Host registered, but no records of type 23
2014/01/27-23:26:29.048 INFO [RES] Network Name <SQLMASTER>: [cxl::Pinger-"SQLMASTER"] Host registered, but no records of type 23
2014/01/27-23:26:29.048 WARN [RES] Network Name <SQLMASTER>: [cxl::Pinger-"SQLMASTER"] Could not find any endpoints for remote target
2014/01/27-23:26:29.048 WARN [RES] Network Name <SQLMASTER>: [cxl::Pinger-"SQLMASTER"] Could not find any endpoints for remote target
2014/01/27-23:26:29.049 INFO [RES] Network Name <SQLMASTER>: Setting resource specific message to <Name Resolution Not Yet Available>.
Finally, the cluster attempts to start SQL Server, which fails, and the cluster then fails back to Node 2 successfully.
The cluster was previously running successfully on Node 1, and to my knowledge there have been no changes to the server or cluster configuration.
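In case it helps anyone pull the same kind of lines out of a full cluster log, here is a rough Python sketch that keeps only the ERR and WARN entries mentioning one resource. It assumes each line looks like the excerpts above (timestamp, level, [component], message), optionally preceded by a hex process.thread prefix followed by "::". The function name filter_cluster_log and the regex are my own illustration, not part of any cluster tooling.

```python
import re

# Assumed line shape, based on the excerpts in this post:
#   2014/01/27-23:25:29.000 ERR [RHS] message text
# An optional "pid.tid::" hex prefix is tolerated (assumption about the raw log).
LINE_RE = re.compile(
    r"^(?:[0-9a-fA-F]+\.[0-9a-fA-F]+::)?"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<component>[A-Za-z]+)\]\s+"
    r"(?P<msg>.*)$"
)


def filter_cluster_log(lines, resource, levels=("ERR", "WARN")):
    """Return (timestamp, level, component, message) tuples for lines
    at one of the given levels whose message mentions the resource name."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed shape
        if match.group("level") in levels and resource in match.group("msg"):
            hits.append(
                (
                    match.group("ts"),
                    match.group("level"),
                    match.group("component"),
                    match.group("msg"),
                )
            )
    return hits
```

To use it, read the log file into a list of lines and call filter_cluster_log(lines, "SQLMASTER"); passing levels=("ERR",) narrows it to errors only.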
Thank you in advance for any suggestions you can provide.
-Daniel