Channel: High Availability (Clustering) forum

WS2K8 R2 Cluster does not detect Generic Service failure


We have a service configured as a Generic Service cluster resource named QTrans-BPPLog. The resource is set to restart automatically in case of failure.

When this service fails or crashes, which happens occasionally, the cluster does not notice that the service is down and does not restart it. In the services.msc applet, I can see that the service is not running, and the service process is gone from Task Manager. However, the cluster administrator still shows the resource as online. To get it to restart, I have to take the resource offline and then bring it online again. Can someone help?

Here is an excerpt of the cluster log from one occasion when I brought the resource online and the service crashed right away without the cluster noticing. Note that another resource in this group is in a failed state, but there are no dependencies between that resource and QTrans-BPPLog.

00000d14.00001ea8::2015/06/24-15:26:23.248 INFO  [NM] Received request from client address NCSMCDWTST02.

00000d14.00002134::2015/06/24-15:31:23.131 INFO  [NM] Received request from client address NCSMCDWTST02.

---- I am taking QTrans-BPPLog offline; it is not really running, but the cluster thinks it is online because it did not detect the previous failure
00000d14.00002134::2015/06/24-15:31:34.706 INFO  [RCM] rcm::RcmApi::OfflineResource: (QTrans-BPPLog)
00000d14.00002134::2015/06/24-15:31:34.862 INFO  [RCM] TransitionToState(QTrans-BPPLog) Online-->OfflineCallIssued.
00000d14.00002134::2015/06/24-15:31:34.862 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (ncsmcdwTST-B, Failed --> Pending)
00000d14.00002010::2015/06/24-15:31:34.862 INFO  [RCM] HandleMonitorReply: OFFLINERESOURCE for 'QTrans-BPPLog', gen(2) result 997.
00000d14.00002010::2015/06/24-15:31:34.862 INFO  [RCM] TransitionToState(QTrans-BPPLog) OfflineCallIssued-->OfflinePending.
00000f20.000021a0::2015/06/24-15:31:34.862 INFO  [RES] Generic Service <QTrans-BPPLog>: Service died or not active any more; status = 1062.
---- Only now does the cluster realize that the service was down, and only because I took the resource offline

00000f20.000021a0::2015/06/24-15:31:34.862 INFO  [RES] Generic Service <QTrans-BPPLog>: Service is now offline.
00000f20.000021a0::2015/06/24-15:31:34.862 INFO  [RHS] Resource QTrans-BPPLog has come offline. RHS is about to report resource status to RCM.
00000d14.00002010::2015/06/24-15:31:34.862 INFO  [RCM] HandleMonitorReply: OFFLINERESOURCE for 'QTrans-BPPLog', gen(2) result 0.
00000d14.00002010::2015/06/24-15:31:34.862 INFO  [RCM] TransitionToState(QTrans-BPPLog) OfflinePending-->OfflineSavingCheckpoints.
00000d14.000008ac::2015/06/24-15:31:34.862 INFO  [RCM] TransitionToState(QTrans-BPPLog) OfflineSavingCheckpoints-->Offline.
00000d14.000008ac::2015/06/24-15:31:34.862 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (ncsmcdwTST-B, Pending --> Failed)

---- Bringing QTrans-BPPLog back online...
00000d14.00002134::2015/06/24-15:31:38.139 INFO  [RCM] rcm::RcmApi::OnlineResource: (QTrans-BPPLog)
00000d14.00002134::2015/06/24-15:31:38.201 INFO  [RCM] TransitionToState(QTrans-BPPLog) Offline-->OnlineCallIssued.
00000d14.00002134::2015/06/24-15:31:38.201 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (ncsmcdwTST-B, Failed --> Pending)
00000d14.00001e80::2015/06/24-15:31:38.217 INFO  [RCM] HandleMonitorReply: ONLINERESOURCE for 'QTrans-BPPLog', gen(2) result 997.
00000d14.00001e80::2015/06/24-15:31:38.217 INFO  [RCM] TransitionToState(QTrans-BPPLog) OnlineCallIssued-->OnlinePending.
00000f20.00002334::2015/06/24-15:31:39.745 INFO  [RES] Generic Service <QTrans-BPPLog>: Service is now running.
00000f20.00002334::2015/06/24-15:31:39.745 INFO  [RHS] Resource QTrans-BPPLog has come online. RHS is about to report status change to RCM
00000d14.00001e80::2015/06/24-15:31:39.745 INFO  [RCM] HandleMonitorReply: ONLINERESOURCE for 'QTrans-BPPLog', gen(2) result 0.
00000d14.00001e80::2015/06/24-15:31:39.745 INFO  [RCM] TransitionToState(QTrans-BPPLog) OnlinePending-->Online.
00000d14.00001e80::2015/06/24-15:31:39.745 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (ncsmcdwTST-B, Pending --> Failed)
---- QTrans-BPPLog crashed at 15:31:48, but the cluster doesn't see the failure

00000d14.00002520::2015/06/24-15:34:14.047 INFO  [NM] Received request from client address NCSMCDWTST02.
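
For anyone trying to follow what the cluster believes about the resource, the state transitions in an excerpt like the one above can be pulled out with a short script. This is only a minimal sketch in Python: it assumes the line layout seen in this excerpt (process.thread IDs, timestamp, level, component in brackets, message), and the names `LINE_RE`, `TRANSITION_RE`, and `resource_transitions` are illustrative, not part of any cluster tooling.

```python
import re

# Layout assumed from the excerpt above:
#   pid.tid::YYYY/MM/DD-HH:MM:SS.mmm LEVEL  [COMPONENT] message
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-f]{8})\.(?P<tid>[0-9a-f]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>\w+)\s+\[(?P<component>\w+)\]\s+(?P<msg>.*)$"
)

# Matches messages such as "TransitionToState(QTrans-BPPLog) OnlinePending-->Online."
TRANSITION_RE = re.compile(
    r"TransitionToState\((?P<resource>[^)]+)\)\s+(?P<old>\w+)-->(?P<new>\w+)\."
)


def resource_transitions(lines, resource):
    """Yield (timestamp, old_state, new_state) for one resource, in log order."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank lines and hand-written "----" annotations
        t = TRANSITION_RE.search(m.group("msg"))
        if t and t.group("resource") == resource:
            yield m.group("ts"), t.group("old"), t.group("new")


# Four lines copied from the excerpt above (the online attempt).
sample = [
    "00000d14.00002134::2015/06/24-15:31:38.201 INFO  [RCM] TransitionToState(QTrans-BPPLog) Offline-->OnlineCallIssued.",
    "00000d14.00001e80::2015/06/24-15:31:38.217 INFO  [RCM] TransitionToState(QTrans-BPPLog) OnlineCallIssued-->OnlinePending.",
    "00000f20.00002334::2015/06/24-15:31:39.745 INFO  [RES] Generic Service <QTrans-BPPLog>: Service is now running.",
    "00000d14.00001e80::2015/06/24-15:31:39.745 INFO  [RCM] TransitionToState(QTrans-BPPLog) OnlinePending-->Online.",
]

for ts, old, new in resource_transitions(sample, "QTrans-BPPLog"):
    print(ts, old, "->", new)
```

Run against the full excerpt, the last transition recorded for QTrans-BPPLog is OnlinePending-->Online at 15:31:39.745. Nothing follows it, even though the service crashed at 15:31:48, which is the symptom described above.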

