Channel: High Availability (Clustering) forum
Viewing all 5654 articles

Server 2012R2 Cluster will not start after failure

Hello,

We have a two-node cluster running Server 2012 R2. At the weekend we had a power failure, and one of the nodes (I presume it was the active node) has a hardware failure and won't come back up. I thought this would be fine because we have another node that would take on the work. However, Failover Cluster Manager won't connect to the cluster name, or to the local node, which is running. If I run the command Start-ClusterNode -FixQuorum, it shows the node in the "Joining" state, but it never seems to get any further than that. I can manually start the cluster service, but after a while it seems to stop.

I believe this is all because the active node is not contactable. I think what I am asking is how to make the remaining node the authoritative active node, but I don't know what to do to make that happen.

I would appreciate any help,

thank you

Steve
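
One possible reading of the symptom above, sketched as plain vote arithmetic. This assumes simple majority voting and ignores the dynamic vote adjustment that Server 2012 R2 can apply; the post does not say whether a witness is configured, so both cases are shown. The function names are hypothetical and are not part of any cluster API.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest strict majority of the configured votes (assumed model)."""
    return total_votes // 2 + 1


def has_quorum(total_votes: int, votes_online: int) -> bool:
    """True when the reachable voters form a strict majority."""
    return votes_online >= votes_needed(total_votes)


# Two nodes, no witness: the lone survivor holds 1 of 2 votes.
survivor_alone = has_quorum(total_votes=2, votes_online=1)

# Two nodes plus a witness the survivor can still reach: 2 of 3 votes.
survivor_with_witness = has_quorum(total_votes=3, votes_online=2)

print(survivor_alone, survivor_with_witness)
```

Under that assumption a single surviving node without a reachable witness cannot form a majority on its own, which is the situation a forced-quorum start is meant to override.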


Network Card Reset Triggers a Failover?

Hello,

We have a two-node active/active SQL Server 2008 Std cluster on Windows 2008 in production.

After network maintenance, the network card (not the heartbeat one) of one node has no gateway IP. We want to fix it without causing a failover.

I assume that when we set the gateway IP, the network card will be reset and the IP will be temporarily unavailable, so a failover will happen. Am I right?

Is there a way to adjust the dependencies so that fixing the network card does not initiate a failover?

Thanks,

Upgrading from Server 2008 R2 Core to 2012 R2

I am having a few problems with my 2008 R2 Core installation and am considering upgrading instead of just reinstalling, which would be easier. My question is this: I have a three-node failover cluster with Cluster Shared Volumes, where the VHDs and VM configs are stored on a NetApp SAN. I want to do a staged upgrade: take two nodes out of the cluster, install fresh 2012 R2 full installs on them, configure the servers, and set up the new cluster. Once I have done that, I can have a couple of hours of downtime while I move the SAN over to the new cluster and then import all the machines into the new cluster.

The scenario works in my head, but I was wondering if I am missing something. Also, I am no iSCSI guru, so can someone give me a step-by-step guide to setting up the MPIO and iSCSI section of the cluster?

We have a limited window to do this work, as it will have to take place over the Christmas break, which this year consists of 7 working days.

  • Can I use the same cluster name, or is it best to use a new one?
  • Should I remove the nodes properly, leaving one node, or just shut them down and let the cluster whinge for a couple of days?
  • Would I have to make any other config changes to the NetApp?

server 2012 cluster repair option

Hello

Can anyone tell me what a repair actually does to an offline cluster? I cannot find it on TechNet or MSDN.

Thanks

DNS name for sql clustering instance name

Hi all,
We are setting up SQL 2005 or SQL 2008 clustering on Windows 2008 R2. The SQL cluster instance name (DNS name) can be created either manually or automatically in DNS.

Is it a problem if the SQL instance name was created manually in DNS?

Thank you.

Error: The computer is joined to cluster when creating the Cluster

Hello Guys,

I created a cluster to configure Hyper-V for 2 nodes. Everything was great and worked perfectly. The next day the storage hung and the cluster didn't work any more. I destroyed the cluster, then removed the cluster feature from both nodes, deleted the cluster computer object from AD, and deleted the storage. After we fixed the storage, I reconnected it and installed the cluster service on both nodes. Then I validated the configuration and everything was 100% green.

While creating the cluster, I hit an "Unable to successfully cleanup" error. I kept trying, removed the anti-virus, and restarted the servers many times. I then ended up with another error: as soon as I add the server name in the Create Cluster wizard, it tells me that the computer I'm adding is joined to a cluster.

I think I need to do some cleanup of the previous cluster. Can I have some help here?

Regards..

Nour

Windows 2012 R2 Failover Cluster Hyper-V Invalid Class error when I'm trying to create VM

Hi, in my test lab environment I created a Windows 2012 R2 failover cluster with 4 servers to get Hyper-V HA.

I had no issues with Windows 2008 R2 or Windows 2012 in the same setup (2 NICs, FC HBA, SAN storage), but this time I cannot create a VM using the Failover Cluster console:

Roles - Virtual Machines - New Virtual Machine - Select Host -> <ANY HOST>

I'm using the default settings during VM creation (except for setting the VM path manually to point to the desired disk).

While it is in progress I see the disk being created, and both the VM configuration and .vhdx files appear on the target disk, but after that I see: "The Operation has failed. An error occurred creating a New Virtual Machine. Invalid Class."

In fact I see the virtual machine in Hyper-V Manager and it's fully functional, but it is not added to the Failover Cluster roles. When I use Configure Role - Virtual Machine to see eligible machines, it's not there.

Cluster validation says that everything is OK.

I wasn't able to find anything in the event log(s) or in %SYSTEMROOT%\Cluster, as I could in Win2k8.

How should I troubleshoot this issue ?

Cluster physical disk resource 'SQLVS01 Logs' cannot be brought online

Hello,

I had a SQL failover cluster farm containing 4 servers; this week I expanded the farm to 7 servers.

The farm contains 3 instances (VS01, VS02 and VS03). After adding the new servers I can move instances VS02 and VS03 to any of the new servers, but when I try to move instance VS01 (the most important instance) to any of the new servers I get the error below. I can only move this instance between the old servers (1, 2, 3 and 4); I cannot move it to 5, 6 or 7.

As I said, I can move the other instances to any of the servers, including the new ones.

I have already checked some of the topics in the forum about the same issue, but they did not help me.

Any help, please.

Error Message:

Cluster physical disk resource 'SQLVS01 Logs' cannot be brought online because the associated disk could not be found. The expected signature of the disk was 'B13B9D5A'. If the disk was replaced or restored, in the Failover Cluster Manager snap-in, you can use the Repair function (in the properties sheet for the disk) to repair the new or restored disk. If the disk will not be replaced, delete the associated disk resource.


Linux NFS share to Windows 2008 R2 cluster as a resource

Hello,

I would like to share a directory on a RHEL 5 Linux server with a two-node Windows 2008 R2 server cluster via NFS with read-only access, and make it a cluster resource accessible to cluster users.

I tried sharing it in the /etc/exports file as follows, but got "permission denied" on the Windows server node when I tried to open the folder after connecting to it.

The /etc/exports file looks like this:

/user/test_share windows_server.com(async)

Kindly let me know the best practice to accomplish this. 

Thanks in advance.



Nodes randomly losing communication with cluster

We have a 6-node production cluster. We are on Windows Server 2008 R2 and SQL Server 2008 R2. At any time, a node will lose communication with the cluster, causing every instance on that node to fail over to other nodes. The event logs are very generic: event IDs 1006 and 1335. We have disabled TCP offloading, updated the NIC drivers, and installed various patches (KB2524478, 2552040, 2685891, 2687741, 2754804), but it's still happening. If anyone has any information that can help, please let me know. Here is what is happening in the cluster log at the time of the disconnect.

00000950.00000b14::2013/02/20-12:37:09.511 WARN  [CHANNEL ~] failure, status WSAETIMEDOUT(10060)

00000950.00000ae4::2013/02/20-12:37:09.511 WARN  [CHANNEL ~] failure, status WSAECONNRESET(10054)

00000950.000009cc::2013/02/20-12:37:09.518 INFO  [ACCEPT] :::~3343~: Accepted inbound connection from remote endpoint:~51451~.

00000950.0000133c::2013/02/20-12:37:09.518 INFO  [SV] Route local (~) to remote  (:~51451~) exists. Forwarding to alternate path.

00000950.0000133c::2013/02/20-12:37:09.518 INFO  [SV] Securing route from (~) to remote  (:~51451~).

00000950.0000133c::2013/02/20-12:37:09.518 INFO  [SV] Got a new incoming stream from:~51451~

00000950.00000b14::2013/02/20-12:37:09.519 INFO  [PULLER evproddb13] Parent stream has been closed.

00000950.00000b14::2013/02/20-12:37:09.519 ERR   [NODE] Node 4: Connection to Node 7 is broken. Reason Closed(1236)' because of 'channel to remote endpoint 3343~ has failed with status WSAETIMEDOUT(10060)'

00000950.00000b14::2013/02/20-12:37:09.519 WARN  [NODE] Node 4: Initiating reconnect with n7.

00000950.00000b14::2013/02/20-12:37:09.519 INFO  [MQ-evproddb13] Pausing

00000950.00001988::2013/02/20-12:37:09.519 INFO  [Reconnector-evproddb13] Reconnector from epoch 1 to epoch 2 waited 00.000 so far.

00000950.00001988::2013/02/20-12:37:09.519 INFO  [CONNECT]:~3343~ from local ~: Established connection to remote endpoint:~3343~.

00000950.00001988::2013/02/20-12:37:09.519 INFO  [Reconnector-evproddb13] Successfully established a new connection.

00000950.00001988::2013/02/20-12:37:09.520 INFO  [SV] Route local (:~52834~) to remote evproddb13 (~) exists. Forwarding to alternate path.

00000950.00001988::2013/02/20-12:37:09.520 INFO  [SV] Securing route from (:~52834~) to remote evproddb13 (3343~).

00000950.00001988::2013/02/20-12:37:09.520 INFO  [SV] Got a new outgoing stream to evproddb13 at 3343~

00000950.00000ae4::2013/02/20-12:37:09.525 ERR   [NODE] Node 4: channel (write) to node 7 is broken. Reason Closed(1236)' because of 'channel to remote endpoint:~3343~ has failed with status WSAECONNRESET(10054)'

00000950.00000ae4::2013/02/20-12:37:09.525 WARN  [NODE] Node 4: Initiating reconnect with n7.

00000950.00000ae4::2013/02/20-12:37:09.525 INFO  [MQ-evproddb13] Pausing

00000950.00000b14::2013/02/20-12:37:09.525 INFO  [NODE] Node 4: Cancelling reconnector...

00000950.00002318::2013/02/20-12:37:09.525 INFO  [Reconnector-evproddb13] Reconnector from epoch 1 to epoch 2 waited 00.000 so far.

00000950.00000b14::2013/02/20-12:37:09.525 INFO  [CONNECT] 3343~ from local 14:~0~: Established connection to remote endpoint 3343~.

00000950.00000b14::2013/02/20-12:37:09.525 INFO  [Reconnector-evproddb13] Successfully established a new connection.

00000950.00000b14::2013/02/20-12:37:09.525 INFO  [SV] Route local (:~52836~) to remote evproddb13 (:~3343~) exists. Forwarding to alternate path.

00000950.00000b14::2013/02/20-12:37:09.526 INFO  [SV] Securing route from (:~52836~) to remote evproddb13 (:~3343~).

00000950.00000b14::2013/02/20-12:37:09.526 INFO  [SV] Got a new outgoing stream to evproddb13 at:~3343~



Huge single storage pool for lots of VMs, or separate storage pools for each VM

Background

Two identical 2008 R2 servers for running VMs with Failover Cluster (Hyper-V)
One IOmega NAS (old) with 15 VMs sharing one large (2 TB) clustered storage pool
One VNXe NAS (new), nothing set up yet

I am getting ready to start exporting all of my VMs from our old SAN to our new SAN.

Is it better to create one huge clustered storage pool that will be shared between all the VMs, or would it be better to create separate smaller clustered storage spaces for each VM?

Thanks,
Brian

Difference between Resilience and Redundancy

Hi,

Please explain the difference between resilience and redundancy in Windows, with an example.

Thanks

win server 2012 two node cluster, local "cliuser" issue

Hello,

I have a two node Windows Server 2012 STN Cluster with a few SQL instances installed inside it.  Recently in my security event log I see these errors on both nodes:

An attempt was made to reset an account's password.

Subject:
Security ID: SYSTEM
Account Name:<>$
Account Domain:<>
Logon ID: 0x3E7

Target Account:
Security ID: localmachinename\CLIUSR
Account Name: CLIUSR
Account Domain:localmachine name

==

When I look at the local account on both nodes, I see that the password is set to never expire and cannot be changed. I am quite confused, then, about how the above could happen. Any advice or ideas would be greatly appreciated.

Thank you

Virtual Machine Network Health not working

Server 2012 R2 is supposed to have a feature that detects when the public LAN connection used to reach the VMs becomes unavailable; this is supposed to kick in around 60 seconds after the connection is lost. However, I have tried to test this by physically unplugging the network cable and by disabling the network adapter in ncpa.cpl, and the virtual machine doesn't seem to live migrate to the other node. The network adapter settings in the virtual machine configuration have the "Protected network" box checked by default. Is there something else we need to check or configure here?

It looks like it tries to migrate but fails. The message in the information details states:

Live migration of 'Virtual Machine Test-TERM' failed.

'Virtual Machine Test-TERM' failed to fixup network settings. Verify VM settings and update them as necessary

thanks

Steve

Windows 2003 Cluster - Resources on Evicted Node

We had a Windows 2003 cluster environment from which we have now evicted one node (1b). When a user tries to open an RDP connection to the active node (1a), it reports a socket error. The active node (1a) was rebooted. The issue is that when the user connects to the evicted node (1b), he is able to view the Q drive and Z drive, which actually reside on the active node (1a). Could someone please let me know why this is happening?



FailoverCount is not getting reset for QuorumResource in Windows 2012 R2 failover clusters

Hi,

I have a two-node failover cluster on Windows Server 2012 R2 with a third-party resource as quorum, with type Node and Disk Majority. When the quorum resource faults, FOC does not fail over "Cluster Group" to the other cluster node. The following log lines are seen in the cluster log.

Here is the cluster log from the failed node.

00008bc.000014d8::2013/12/06-11:45:49.591 INFO  [RCM] rcm::RcmGroup::Failover:(ClusterGroup)

000008bc.000014d8::2013/12/06-11:45:49.592 WARN  [RCM] Not failing over group ClusterGroup, failoverCount 2, failoverThresholdSetting 4294967295, lastFailover 2013/12/06-03:39:54.190

000008bc.000014d8::2013/12/06-11:45:49.592 INFO  [RCM] Will retry online from long delay restart of quoDG in 3600000 milliseconds.

The quorum resource failover policy's maximum failover count is set to one.

000008bc.000014d8::2013/12/06-11:45:49.591 INFO  [RCM] resource quoDG: failure count:1, restartAction:2 persistentState:1.

Is there a way to reset this FailoverCount? When does FOC increment and reset this failover count for a resource?

Thanks in advance

Rakesh


Rakesh Agrawal
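
The numbers in the RCM lines above are easier to interpret once converted. The sketch below only does the arithmetic on values copied from the post (with the missing spaces restored); it does not claim how the cluster service uses them. Reading 4294967295 as a default or sentinel rather than a literal limit is an assumption.

```python
import re
from datetime import datetime, timedelta

# Values copied from the log lines in the post (spacing restored).
LINE = (
    "WARN  [RCM] Not failing over group ClusterGroup, failoverCount 2, "
    "failoverThresholdSetting 4294967295, lastFailover 2013/12/06-03:39:54.190"
)
LOGGED_AT = "2013/12/06-11:45:49.592"
RETRY_DELAY_MS = 3600000

fields = re.search(
    r"failoverCount (\d+), failoverThresholdSetting (\d+), lastFailover (\S+)",
    LINE,
)
failover_count = int(fields.group(1))
threshold = int(fields.group(2))

timestamp_format = "%Y/%m/%d-%H:%M:%S.%f"
since_last_failover = (
    datetime.strptime(LOGGED_AT, timestamp_format)
    - datetime.strptime(fields.group(3), timestamp_format)
)
retry_delay = timedelta(milliseconds=RETRY_DELAY_MS)

# 4294967295 is the largest unsigned 32-bit value (0xFFFFFFFF).
print("threshold is 0xFFFFFFFF:", threshold == 2**32 - 1)
print("failover count:", failover_count)
print("time since last failover:", since_last_failover)
print("long delay restart:", retry_delay)
```

So the "long delay restart" is one hour, and the refusal to fail over was logged about eight hours after the last recorded failover of the group.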

volume added to SQL cluster but could not be found

Dear all,

I added a volume to the SQL Server cluster. I can see it in the cluster storage and I moved it to the SQL Server cluster service, but I can't find it among the volumes when I try to make a backup.

Hyper-V Failover Cluster - Inconsistent Network Availability

We've got a small cluster with 7 hosts and a dozen or two VMs. For some reason I'm getting inconsistent availability with the cluster networks. The hosts seem to function fine on their own, but there are all types of issues using migration, which I'm assuming is because certain hosts think other hosts are unavailable. For example:

Cluster Network 1 - From Host 8

Cluster Network 1 - From Host 10

As far as I can tell, all of the networks are up. I can ping all hosts on all interfaces. What criteria go into determining host availability?





How to test node failover in Windows 2008 R2 Failover Cluster?

Can anyone give me advice on how to properly test a node failure with a 2-node failover cluster in Windows 2008 R2?

Failover Cluster Network Name Failed and Can't be Repaired

I have an issue that seems to be different from the problems others have encountered.

I've scoured everything I can find and nothing has fixed my problem.

It starts with the common problem of the cluster network name failing on my 2-node Server 2012 file server cluster. The computer object was still in AD and appeared to be fine, so it was not the usual case of the object getting deleted somehow. At the time, there was no other object with that name in the AD Recycle Bin, so I don't think it was mistakenly deleted and quickly recreated to cover any tracks, so to speak.

Following one guide, I tried to find the registry key that corresponded with the GUID of the object, but neither node in the cluster had it in its registry (which may be part of the problem).

Since it was in the failed state, I tried to do the repair on the object to no avail.

We run a "locked down" DC environment so all computer objects have to be pre-provisioned.  They were all pre-provisioned successfully and successfully assigned during cluster creation.  The cluster was running with no issues for a month or so before this problem came up.

When I do a repair on the object while taking diagnostic logs the following 4609 error appears:

The action 'Repair' did not complete. - System.ApplicationException: An error occurred resetting the password for 'Cluster Name'. ---> System.ComponentModel.Win32Exception: Unknown error (0x80005000)

There appears to be a corresponding 4771 error with a failure code 0x18 that comes from the security log of the DC that states there was a Kerberos pre-authentication failure for the cluster network name object (Domain\Clustername$)

I believe this is what is causing the repair failure. All the information I found related to security error 4771 was either about bad credentials given for a user account, or the fix was to rejoin the computer to the domain. I can't seem to find a way to do this with the cluster network name. If there's a way, please let me know.

I've tried a number of things, like resetting the object, disabling it, deleting and creating a new object with the same name, deleting that new object and recovering the original, etc...

Can anyone shed some light on what is going on and, hopefully, how to fix it other than by rebuilding the cluster? I'm quite close to just tearing it down and building it back up, but am hesitant because this cluster is currently in production.

Any help would be appreciated
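
For reference when reading event 4771, the failure code is a Kerberos error number in hex. Below is a minimal lookup sketch using a handful of error names from RFC 4120; the table is deliberately partial and the `describe` helper is hypothetical.

```python
# Partial table of Kerberos error codes (names as defined in RFC 4120).
KERBEROS_FAILURE_CODES = {
    0x06: "KDC_ERR_C_PRINCIPAL_UNKNOWN",
    0x12: "KDC_ERR_CLIENT_REVOKED",
    0x17: "KDC_ERR_KEY_EXPIRED",
    0x18: "KDC_ERR_PREAUTH_FAILED",
    0x25: "KRB_AP_ERR_SKEW",
}


def describe(code_text: str) -> str:
    """Translate a hex failure code string such as '0x18' into its RFC name."""
    code = int(code_text, 16)
    return KERBEROS_FAILURE_CODES.get(code, f"unlisted code {code:#x}")


print(describe("0x18"))
```

Code 0x18 maps to KDC_ERR_PREAUTH_FAILED, meaning the pre-authentication data presented for the account was not accepted by the KDC, which is consistent with the password reset failing for the cluster name object.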
