Channel: High Availability (Clustering) forum

Question about cluster node majority voting


We've been having problems with a DB instance crashing regularly. When it crashed this weekend, it seems to have taken down the node it was running on, although that may have been a separate incident.

Right now I have 3 nodes in the cluster. 2 nodes are running the 3 instances (2 on one node, 1 on the other). The 3rd node is in a state where the OS is mostly unusable and the Cluster service will not start.

Event Log:

"The failover cluster database could not be unloaded. If restarting the cluster service does not fix the problem, please restart the machine."

Cluster Log from that machine:

00003768.000067a0::2014/01/06-03:28:05.393 INFO  -----------------------------+ LOG BEGIN +-----------------------------
00003768.000067a0::2014/01/06-03:28:05.393 INFO  [CS] Starting clussvc as a service
00003768.000067a0::2014/01/06-03:28:05.394 INFO  [CS] cluster service logging level is 2
00003768.00004c30::2014/01/06-03:28:05.521 DBG   [NETFTAPI] received NsiInitialNotification
00003768.00004c30::2014/01/06-03:28:05.523 DBG   [NETFTAPI] received NsiInitialNotification
00003768.000031f4::2014/01/06-03:28:05.588 DBG   [NETFTAPI] received NsiAddInstance  for 169.254.3.47
00003768.00004eb4::2014/01/06-03:28:05.590 ERR   [DM] Error while restoring (refreshing) the hive: STATUS_INVALID_PARAMETER(c000000d
00003768.00004eb4::2014/01/06-03:28:05.592 ERR   [DM] mscs::DmAgent::Start: STATUS_INVALID_PARAMETER(c000000d' because of 'Load(NOTHROW(), securityAttributes, discardError )'
00003768.00004eb4::2014/01/06-03:28:05.592 ERR   [DM] Node 3: failed to unload cluster hive, error 87.
00003768.00004eb4::2014/01/06-03:28:05.592 ERR   Hive unload failed (status = 87)
00003768.00004eb4::2014/01/06-03:28:05.592 ERR   FatalError is Calling Exit Process.

This is a 3-node cluster set to Node Majority; I don't have an available drive letter for a witness disk. Since the cluster service won't start on the 3rd node, I'm not certain how the cluster is still running, but I'm thankful that it is.

A reboot might fix everything, but I'm very worried that if I reboot the server and the cluster service still fails to start, it may prevent the entire cluster from starting, and we won't be able to run the instances on the other 2 nodes.

Does the 3rd server still count as the odd-numbered vote even if its cluster service won't start? If I reboot it and the cluster service still fails to start, will the cluster itself stay up and keep running the DB instances on the other nodes?

I already need to open a MS Support incident on the DB instance crashing, so I'd rather not have to open a 2nd one just to answer this hopefully simple question.

As I understand it, if I reboot this server, the cluster will stop because only 2 nodes will be up in a node majority setup. My concern is that if the cluster service does not start after the reboot, the cluster will remain down. I am guessing that it's staying up now because the node voting has already occurred and things are running? I'd have thought that with the cluster service down on 1 of 3 nodes, the cluster should be down... but it's not. Phew!
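
For what it's worth, here is the vote arithmetic as a minimal sketch. It assumes Node Majority simply requires more than half of the configured voting nodes to be up, which is my assumption and not something I've confirmed for this cluster or Windows version. The function names are made up for illustration and are not part of any cluster API.

```python
def votes_needed(total_voters: int) -> int:
    """Smallest vote count that is strictly more than half of the voters."""
    return total_voters // 2 + 1


def has_quorum(total_voters: int, voters_up: int) -> bool:
    """True if the nodes currently up hold a majority (assumed rule)."""
    return voters_up >= votes_needed(total_voters)


if __name__ == "__main__":
    # Walk through a 3-node cluster with 3, 2, 1 and 0 nodes up.
    for up in range(3, -1, -1):
        print(f"3 voters, {up} up -> quorum: {has_quorum(3, up)}")
```

If that assumption holds, 2 of 3 voters is still a majority, which would match the cluster staying up right now. It doesn't tell me what happens across a reboot of the broken node, though.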

Thanks in advance!

Mark

