Channel: High Availability (Clustering) forum

Cluster File Share gives some problems

Occasionally the clustered file share stops and one of the nodes restarts.
Curiously, this always happens at around the same time, coinciding with a backup (Data Protector).
We find the following errors in the Windows event log:

6.51
A component on the server did not respond in a timely fashion. This caused the cluster resource 'DATA01' (resource type 'Physical Disk', DLL 'clusres.dll') to exceed its time-out threshold. As part of cluster health detection, recovery actions will be taken. The cluster will try to automatically recover by terminating and restarting the Resource Hosting Subsystem (RHS) process that is running this resource. Verify that the underlying infrastructure (such as storage, networking, or services) that are associated with the resource are functioning correctly

6.52
A component on the server did not respond in a timely fashion. This caused the cluster resource 'DATA01' (resource type 'Physical Disk', DLL 'clusres.dll') to exceed its time-out threshold. As part of cluster health detection, recovery actions will be taken. The cluster will try to automatically recover by terminating and restarting the Resource Hosting Subsystem (RHS) process that is running this resource. Verify that the underlying infrastructure (such as storage, networking, or services) that are associated with the resource are functioning correctly

7.23
The computer has rebooted from a bugcheck.  The bugcheck was: 0x0000009e (0xffffe001bd309600, 0x00000000000004b0, 0x0000000000000005, 0x0000000000000000). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: 022216-32593-01.
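
As a cross-check, the second bugcheck parameter can be converted to decimal and compared with the RHS termination watchdog timeout reported in the cluster log. This is only a minimal Python sketch. It assumes, based on Microsoft's description of bug check 0x9E (USER_MODE_HEALTH_MONITOR), that the second parameter is the health monitoring timeout in seconds; the variable names are illustrative and not part of any tool.

```python
# Values copied from the bugcheck event above.
bugcheck_code = 0x0000009E
params = [0xFFFFE001BD309600, 0x00000000000004B0, 0x0000000000000005, 0x0]

# Assumption: for bug check 0x9E the second parameter is a timeout in seconds.
timeout_seconds = params[1]

# Timeout from the "Enabling RHS termination watchdog" cluster log line,
# assumed here to be in milliseconds.
watchdog_timeout_ms = 1200000

print(f"bugcheck 0x{bugcheck_code:X}: timeout = {timeout_seconds} s "
      f"({timeout_seconds / 60:.0f} min)")
print("matches RHS watchdog timeout:",
      timeout_seconds * 1000 == watchdog_timeout_ms)
```

If those assumptions hold, the two values agree, which would suggest the reboot was triggered by the cluster's own watchdog after RHS could not be terminated in time, rather than by an unrelated crash.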

Cluster Log:
2016/02/22-05:51:34.750 ERR   [RHS] RhsCall::DeadlockMonitor: Call ISALIVE timed out by 16 milliseconds for resource 'DATA01'.
00000c98.00000cb4::2016/02/22-05:51:34.750 INFO  [RHS] Enabling RHS termination watchdog with timeout 1200000 and recovery action 3 from source 5.
00000c98.00000cb4::2016/02/22-05:51:34.750 ERR   [RHS] Resource DATA01 handling deadlock. Cleaning current operation and terminating RHS process.
00000c98.00000cb4::2016/02/22-05:51:34.750 ERR   [RHS] About to send WER report.
0000048c.000017b0::2016/02/22-05:51:34.781 WARN  [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'DATA01', gen(0) result 4/0.
0000048c.000017b0::2016/02/22-05:51:34.812 INFO  [RCM] rcm::RcmResource::HandleMonitorReply: Resource 'DATA01' consecutive failure count 1.
00000c64.000018f8::2016/02/22-05:51:35.062 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:08caf7d8-f3c7-4680-9624-d7eda395f231:Netbios
00000c98.00000cb4::2016/02/22-05:51:36.625 ERR   [RHS] WER report is submitted. Result : WerReportQueued.


0000048c.0000159c::2016/02/22-05:52:31.325 INFO  rcm::RcmMonitor::WaitForRhsToInitializercm::RcmMonitor::WaitForRhsToInitialize Process pid 0x13f4 started normally
0000048c.0000159c::2016/02/22-05:52:31.325 INFO  [RCM] About to initialize RPC handle
0000048c.0000159c::2016/02/22-05:52:31.325 INFO  [RCM] Initialized RPC handle to value HDL( ed119beda0 )
0000048c.00000648::2016/02/22-05:52:31.325 ERR   [RCM] rcm::RcmMonitor::RhsRpcResourceControl Error 1722 communicating with RHS process 5108/0x13f4.
0000048c.00000648::2016/02/22-05:52:31.325 INFO  [RCM] Ignoring RPC error from monitor process 5108/0x13f4 since it has just been restarted.
0000048c.0000159c::2016/02/22-05:52:31.325 INFO  rcm::RcmMonitor::RestartResources[RCM] rcm::RcmMonitor::RestartResources: Monitor restart for resource DATA01
0000048c.0000159c::2016/02/22-05:52:31.388 INFO  [RCM] rcm::RcmResource::ReattachToMonitorProcess: (DATA01, Online)
0000048c.0000159c::2016/02/22-05:52:31.388 WARN  [RCM] Canceling pending control STORAGE_GET_DISK_INFO for resource 'DATA01' due to monitor crash.
0000048c.00000648::2016/02/22-05:52:31.388 ERR   [RCM] rcm::RcmResource::Control: (1722)' because of 'result'
0000048c.00000648::2016/02/22-05:52:31.388 ERR   [RCM] rcm::RcmResControl<class rcm::RcmResource>::DoResControlAsync: (1722)' because of 'ResourceControl( STORAGE_GET_DISK_INFO ) failed for resource 'DATA01'.'


000013f4.00001458::2016/02/22-05:58:33.113 ERR   [RHS] RhsCall::DeadlockMonitor: Call ISALIVE timed out by 15 milliseconds for resource 'DATA01'.
000013f4.00001458::2016/02/22-05:58:33.113 INFO  [RHS] Enabling RHS termination watchdog with timeout 1200000 and recovery action 3 from source 5.
000013f4.00001458::2016/02/22-05:58:33.113 ERR   [RHS] Resource DATA01 handling deadlock. Cleaning current operation and terminating RHS process.
0000048c.000013b0::2016/02/22-05:58:33.113 WARN  [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'DATA01', gen(0) result 4/0.
000013f4.00001458::2016/02/22-05:58:33.113 ERR   [RHS] About to send WER report.
0000048c.000013b0::2016/02/22-05:58:33.113 INFO  [RCM] rcm::RcmResource::HandleMonitorReply: Resource 'DATA01' consecutive failure count 2.
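
To see how the two deadlocks relate, the cluster log lines can be parsed and their timestamps compared. The following is a rough Python sketch, not an official parser: the regular expression only assumes the `pid.tid::timestamp LEVEL message` layout visible in the excerpt above, and `parse_line` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Two lines copied from the cluster log excerpt above.
log_lines = [
    "00000c98.00000cb4::2016/02/22-05:51:34.750 ERR   [RHS] Resource DATA01 "
    "handling deadlock. Cleaning current operation and terminating RHS process.",
    "000013f4.00001458::2016/02/22-05:58:33.113 ERR   [RHS] Resource DATA01 "
    "handling deadlock. Cleaning current operation and terminating RHS process.",
]

# Layout assumed from the excerpt: hex pid, hex tid, timestamp, level, message.
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-f]{8})\.(?P<tid>[0-9a-f]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict for a matching log line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "pid": int(m.group("pid"), 16),
        "ts": datetime.strptime(m.group("ts"), "%Y/%m/%d-%H:%M:%S.%f"),
        "level": m.group("level"),
        "msg": m.group("msg"),
    }


# Keep only the RHS deadlock events.
events = [e for e in map(parse_line, log_lines)
          if e is not None and "handling deadlock" in e["msg"]]

gap = events[1]["ts"] - events[0]["ts"]
print("RHS pids:", [hex(e["pid"]) for e in events])
print("time between deadlocks:", gap)
```

In this excerpt the second deadlock is reported by pid 0x13f4, the RHS process that was started at 05:52:31 after the first termination, and it occurs just under seven minutes after the first one.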

