Channel: High Availability (Clustering) forum
Viewing all 5654 articles

Windows Server 2016 - Failover Cluster failed


Hi, 

I have two Windows Server 2016 VMs and installed the Failover Clustering feature on both. Both servers are fully patched and can ping each other. However, when I went to create a cluster on node A, it failed with an error:

https://imgur.com/a/M2KXipm

As soon as this error occurs, the network configuration on node B is corrupted: I can ping from node B to A, but not from node A to B. Something has gone horribly wrong. The issue I have is that these two VMs and the DC are hosted in Azure. The DC doesn't have DHCP installed; however, the Create Cluster Wizard didn't give me the opportunity to assign a static IP to the cluster. Instead it states that it will obtain one via DHCP (which doesn't exist). I'm sure this is the root of the problem:

https://imgur.com/a/z2Vc8BI

The only thing I didn't do on the nodes was enable WMI on the Windows Firewall. Should I blow them away and start over with Windows Firewall disabled as a test, or can this situation be recovered?

Thanks,




FCM sluggish, some VMs changing state rapidly


I have a Hyper-V Failover Cluster where Failover Cluster Manager (FCM) is behaving very strangely.

The FCM GUI shows several VMs changing state rapidly (Running -> Paused -> Resume -> Running in a very rapid cycle). When I say rapidly, it's happening so fast that the right-click menus are flickering. FCM responsiveness is also sluggish.

These same VMs are shown as running fine and not changing state in Hyper-V Manager.

Does anyone have any suggestions as to what to look for? I'm assuming it's some communication issue.

Error applying Replication Configuration Windows Server 2019 Hyper-V Replica Broker


Hello,

Recently we started replacing our Windows Server 2016 Hyper-V clusters with Server 2019. On each cluster we have a Hyper-V Replica Broker that allows replication from any authenticated server and stores the Replica files in a default location on one of the Cluster Shared Volumes.

With WS2019 we run into the issue where we get an error applying the Replication Configuration settings. The error is as follows:
Error applying Replication Configuration changes. Unable to open specified location for replication storage. Failed to add authorization entry. Unable to open specified location to store Replica files 'C:\ClusterStorage\volume1\'. Error: 0x80070057 (One or more arguments are invalid).

When we point the default location at a CSV whose owner node is the same as the owner node of the Broker role, we don't get this error. However, I don't expect this to work in production (where roles move to other nodes).

Has anyone run into the same issue, and what might be a solution? Did anything change between WS2016 and WS2019 that might cause this?

Kind regards,

Malcolm

Problem creating a Windows failover cluster in Windows Server 2016

Hi everyone, I have two servers joined to a domain and I need to install a SQL Server database failover cluster. Both servers already have the Failover Clustering feature installed, and the network cards are grouped with NIC Teaming using Teaming Mode "Switch Independent" and Load Balancing "Dynamic". The problem is that when I create the cluster using the wizard, I can add the first node without problems, but when adding the second node I get the following error: "The node cannot be contacted. Ensure that node is powered on and is connected to the network". Am I missing something? I would appreciate any help.

Can two servers form a cluster for Hyper-V failover?


Hello,

I have already deployed two Windows Server 2016 Standard servers for Hyper-V and file sharing. Do I need to deploy a third server, or install the Failover Clustering service on the DC?

Another question: I currently use iSCSI to map SAN storage. Should I deploy Hyper-V virtual Fibre Channel SANs?

I find this a little strange.

Error validating cluster computer resource name (Server 2016 Datacenter Cluster)


    An error occurred while executing the test.
    The operation has failed. An error occurred while checking the Active Directory organizational unit for the cluster name resource.

    The parameter is incorrect

    Interestingly enough, the cluster name was created successfully in the Computers OU, and the cluster can be taken offline and brought back online with no problem. The DNS entry is correct and the cluster name pings to the correct IP. Changing the name of the cluster updates the cluster computer name in AD with no errors.


Hyper-V 2016 Cluster Crashing


Hello 

We have a three-node Windows Server 2016 cluster on Dell PowerEdge R640 servers with Dell SC3020 storage; the CSVs are connected using iSCSI.

1. VMs randomly stop responding on the network. When we try to turn off or shut down the affected VM, it enters the Stopping-Critical state and we have to reboot that node to start the VM again.

2. Live migration by draining roles also fails, getting stuck at 80% to 84%, and we have to reboot that host.


Things we have tried:

A. Upgraded all Dell drivers, firmware, and BIOS; all are now at the latest versions.

B. The cluster nodes have 50% RAM available.

C. Ran the Cluster Validation Wizard; all tests are green.

When I checked the event logs, I found:

Event ID 1230 : A component on the server did not respond in a timely fashion. This caused the cluster resource 'Virtual Machine resource type 'Virtual Machine', DLL 'vmclusres.dll') to exceed its time-out threshold. As part of cluster health detection, recovery actions will be taken. The cluster will try to automatically recover by terminating and restarting the Resource Hosting Subsystem (RHS) process that is running this resource. Verify that the underlying infrastructure (such as storage, networking, or services) that are associated with the resource are functioning correctly.

Event ID : 5157 : Cluster Shared Volume 'Volume1' ('_CSV_01') has entered a paused state because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished. This error is usually caused by an infrastructure failure. For example, losing connectivity to storage or the node owning the Cluster Shared Volume being removed from active cluster membership.

Event ID : 5120 :

Cluster Shared Volume 'Volume1' ('_CSV_01') has entered a paused state because of 'STATUS_VOLUME_DISMOUNTED(c000026e)'. All I/O will temporarily be queued until a path to the volume is reestablished.

Event ID:1069 :

Cluster resource 'Virtual Machine Configuration PCVM01' of type 'Virtual Machine Configuration' in clustered role 'PCVM01' failed. The error code was '0x2' ('The system cannot find the file specified.').
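
As a triage aid for event text like the above, here is a minimal Python sketch (not part of the original post) that pulls the status name and code out of each CSV "paused state" message so the reasons can be tallied. The sample strings are copied from the events quoted above. The function name `csv_pause_reasons` is hypothetical, and the regular expression is an assumption about the message layout based only on these two samples.

```python
import re
from collections import Counter

# Assumed message layout, inferred only from the two samples quoted in this post:
#   ... has entered a paused state because of 'STATUS_NAME(hexcode)'.
PAUSE_REASON = re.compile(r"paused state because of '(STATUS_[A-Z_]+)\(([0-9a-fA-F]+)\)'")

def csv_pause_reasons(messages):
    """Count (status name, hex code) pairs found in CSV pause event messages."""
    counts = Counter()
    for message in messages:
        for name, code in PAUSE_REASON.findall(message):
            counts[(name, code.lower())] += 1
    return counts

# Message fragments copied from Event IDs 5157 and 5120 above.
samples = [
    "Cluster Shared Volume 'Volume1' ('_CSV_01') has entered a paused state "
    "because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'.",
    "Cluster Shared Volume 'Volume1' ('_CSV_01') has entered a paused state "
    "because of 'STATUS_VOLUME_DISMOUNTED(c000026e)'.",
]

for (name, code), count in sorted(csv_pause_reasons(samples).items()):
    print(f"{name} (0x{code}): {count}")
```

Fed a larger export of the System log, the same tally would show whether one status dominates, which may help separate a storage-path problem from a node-membership problem.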

Any help would be much appreciated.

Thanks

Prakash



Thanks, Prakash. Please note: my posts are provided “AS IS” without warranty of any kind, either expressed or implied.

Windows 2016 file server stretch cluster - error 0x80071398 when moving to different node


Hi,

We want to deploy a highly available file server with automatic failover between two sites, and we followed the instructions from here:

https://docs.microsoft.com/en-us/windows-server/storage/storage-replica/stretch-cluster-replication-using-shared-storage

The only difference is that we have only one server in each site.

There were no issues during the configuration, but when we try to fail back or move the node to the other site we get the following error:

"Error Code: 0x80071398 The operation failed because either the specified cluster node is not the owner of the group, or the node is not a possible owner of the group"

We have checked the possible owners and it all looks good. We even ran the following commands to make sure that the permissions are set correctly:

Get-ClusterResource | Set-ClusterOwnerNode Server1,Server2

Get-ClusterGroup | Set-ClusterOwnerNode Server1,Server2

We also tried evicting the second node and adding it back to the cluster, but still no luck.

Any help would be greatly appreciated.



On 2-Node Windows 2012 R2 Cluster w/ raid disks for shared storage, after adding a resource to a new File Server Role, Add File Share to Role results in an exception.


BLUF: Receiving the following exception when Add File Share is performed on an established 2-node cluster with shared storage.

Log Name: Microsoft-Windows-FileServices-ServerManager-EventProvider/Operational
EventID: 0; Version: 0; Level: 2;Task: 1; Opcode: 0, Keywords: 0x2000000000000001
Exception: Caught exception Microsoft.Management.Infrastructure.CimException: The xsi:type attribute (p1:MSCluster_Property_Group_PrivateProperties) does not identify an existing class.

   at Microsoft.Management.Infrastructure.Internal.Operations.CimSyncEnumeratorBase`1.MoveNext()
   at Microsoft.FileServer.Management.Plugin.Services.FSCimSession.EnumerateAssociatedInstances(String cimNamespace, ICimInstance sourceInstance, String associationClassName, String targetClassName)
   at Microsoft.FileServer.Management.Plugin.Services.ClusterAssociationService.GetResourceGroup(ICimSession session, ICimInstance resource)

First, I want to state that this configuration has been successfully fielded several times quite recently. Second, the domain in which the configuration is deployed is strictly controlled (latest patches, strict GPOs, high UAC enabled, etc.). With that said, I am trying to figure out why this particular installation throws an exception when adding file shares to the configured shared drives.
High level steps are:

On both nodes Install Dell PowerVault MD Storage Manager Software
Use Disk Manager to online, rename and organize three disk volumes
Create the Cluster with two servers as member nodes
Establish Raid Disks in the Cluster (select disks, set names and Quorum configuration)
Create the File Server Role (establish the Client Access Point and select RAID drives for storage)
At this point Add File Share works.
Add a resource (Oracle Fail Safe adds a standalone DB to the group), which is successful.
At this point Add File Share fails: after selecting SMB Share - Quick and Next, the Share Location page lists the server but no longer shows any volumes from the RAID, and "Type a custom path:" is selected but always fails.

Explanations of the exception and what might require fixing appreciated.

Cluster Manager


hello,

I would like to know what security level is required to run Failover Cluster Manager. I was asked this question, and I thought you have to be at least a Domain Admin to run it, but I want to make sure.

Thanks in advance.

John

HV replication to Azure - VM guest with shared vhds Set


What a mess it is. Microsoft's own technology (shared disks in a 2016/2019 VHD Set) is simply not supported.

That is the only bit that stops me from using Azure for DR, and I do not fancy breaking my clusters (file & SQL).

Unless anybody has a better idea (than just a single VM)?

Seb


Technical Q's on S2D, Storage Replica and Hyper-V


So after reading a ton of Docs and watching a few videos, I'm still very unsure whether we should move to HyperConverged platform.

Currently we have a Hyper-V cluster that spans 2 sites, and it all sits on the old HP LeftHand solutions P4500s. Our VMs can swim around from site to site, and if one site fails everything can live happily in the other until the problem is resolved.

Now it's time to renew the infrastructure, and we're taking the opportunity to re-look at how we do things, and of course management is looking to keep costs down.

So obviously Azure Stack HCI is something that has come up.

My concerns are as follows, and I'm hoping someone will be able to fill in the blanks and correct anything I've got wrong...

With S2D the best fault tolerance you can currently have is 2 nodes. So with a HA cluster stretching 2 sites, the maximum number of nodes you can have is 4 nodes, 2 in each site, because as soon as you move to 6+ nodes and one site fails you will be over the 2 node limit and the storage will fail. So in this situation 4 nodes is the limit, which would be very tight for us.
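
The arithmetic behind that limit can be sketched in Python. This takes the post's premise at face value (the storage survives the loss of at most 2 nodes, and nodes are split evenly across 2 sites); it is not a statement of what S2D actually supports. The function names are hypothetical.

```python
def survives_site_loss(total_nodes, tolerated_node_failures, sites=2):
    """True if losing one whole site stays within the tolerated node failures.

    Assumes nodes are split evenly across sites (the post's scenario).
    """
    if total_nodes % sites != 0:
        raise ValueError("nodes must divide evenly across sites")
    nodes_per_site = total_nodes // sites
    return nodes_per_site <= tolerated_node_failures

def max_stretch_nodes(tolerated_node_failures, sites=2):
    """Largest evenly split cluster that still survives the loss of one site."""
    return tolerated_node_failures * sites

# Premise from the post: at most 2 node failures can be tolerated.
for nodes in (2, 4, 6, 8):
    outcome = "survives" if survives_site_loss(nodes, 2) else "storage fails"
    print(nodes, "nodes:", outcome)
print("max:", max_stretch_nodes(2))
```

Under these assumptions a site failure removes half the nodes at once, so the per-site node count, not the total, is what has to stay within the tolerance.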

We can add in Storage Replica, and use cluster-to-cluster replication. In this case we can have an S2D cluster in Site A, and replicate to another S2D cluster in Site B. However, there's no automatic fail-over, you have to manually bring everything up, and we can't have that.

You can have Storage Replica with a stretch cluster that does support automatic fail-over, but only with Storage Spaces, losing all the goodness of S2D.

The best of both worlds would be to have Storage Replica, stretch cluster, and S2D volumes, but this is not supported.

Also, there appears to be a big question surrounding log placement for Storage Replica, as it needs to be faster than the data disks. So in effect you can end up limiting the benefit of the cache in S2D, to ensure it's not as fast as the volume for the log.

Plus, it starts as a low-cost solution, but as soon as you start building it up the costs go up a lot.

I really like the look of S2D, it looks great. I just can't see how it will work for our environment. We're also looking at the HP Nimbles.

Andrew


Andrew France


Disabling Powershell 2.0 on Win2012R2 failover cluster


Hello,

For security reasons we are disabling PowerShell 2.0 on several servers.

Does anyone know whether the failover cluster services depend on PowerShell 2.0 in order to work? Can it be disabled?

thanks


Cristian L Ruiz

Intermittent problem with the TEAMed NICs


I am having an intermittent problem with the TEAMed NICs losing their network connectivity. 

But first, here’s some details about my environment.  I have a four node Hyper-V Cluster running Windows Server 2012 R2.  The servers are Dell R720 with two Broadcom Quad 1Gig Ports (B5720 and B5719).  This gives each one of my servers a total of eight 1Gb ports.  With this, I’ve setup 2 network Teams.  Here’s how I’ve got my 8 NICs setup:

  • I’ve created a TEAM with Port1 on the B5720 and Port1 on the B5719.  These two TEAMed NICs are plugged into Cisco Switch ports that have been assigned to my “SERVER Network” (the VLAN where all my servers can communicate with each other).  This TEAM is used for regular server to server communications and access to AD.  We will call this team “SERVER_TEAM”
  • I’ve also created a second TEAM with Port2 on the B5720 and Port2 on the B5719.  These two TEAMed NICs are plugged into Cisco switch ports that are configured as trunks, with various VLANs tied to them.  I’ve then created a Virtual Switch in Hyper-V using this TEAM.  We will call this team “vSWITCH_TEAM”
  • The remaining 4 ports have been left as individual NICs.  One for LiveMigration, one for internal cluster communication, and two for SMB traffic.

Both teams have been configured as follows:

  • Teaming mode = Switch Independent
  • Load balancing mode = Dynamic
  • Standby adapter = None (all adapters Active)

Problem Description
Every once in a while, the VMs on one node, will all simultaneously lose their network connection.  And right away, the phone starts to ring off the hook, as our users can no longer access the services supplied by the affected VMs.  LiveMigrating the VMs to another host, will restore the VMs network connections.

I’ve tried moving a non-critical VM back to the problem host, and as expected, that VM lost its network connection. If I reboot the host, then everything is fine again, and I can move VMs back onto that host and they continue to talk to the network perfectly. I’ve also found that instead of rebooting the host, I can unplug the network cables used by the virtual switch team and plug them back in, and that also fixes the problem. This is a recurring issue that has occurred on more than one of my hosts; therefore, it’s not a hardware problem with one of the servers.
When the problem is occurring, our “Cisco Guy” says that he sees a whole lot of ‘dropped packets’ on one of the interfaces in the team.

Does anyone have any ideas or suggestions? These Hyper-V hosts are brand new, fully patched, with the latest Broadcom NIC drivers installed.

cluster


HI,

We have set up a cluster for SQL AG on two of our Windows Server 2012 R2 VMs, but today I see this error on Node2.

Can someone tell me what this error means?




Shahin


File Share Cluster for UPD


Hello everyone,

I've been stuck on a problem for days. I have an RDS farm and a file share for UPD (single node).

I want a file share cluster for high availability of the UPD profiles.

So I started creating the cluster in Azure.

Each node has two HDDs for cluster data. I have enabled ClusterS2D and created the disk in CSVFS_ReFS format, and everything was fine up to that point. Then I installed the Scale-Out File Server role so the UPD would always be available.

I configured a load balancer to point to the file share role IP. I can now connect to the file share from the RDCB, but when I try to add the shared path to the user profile disk I get this error.

I have set the static ports for RPC in the registry.

# Set RPC dynamic ports to a static range
New-Item "HKLM:\SOFTWARE\Microsoft\Rpc\Internet"
New-ItemProperty "HKLM:\SOFTWARE\Microsoft\Rpc\Internet" -Name "Ports" -Value '50001-51024' -PropertyType MultiString -Force
New-ItemProperty "HKLM:\SOFTWARE\Microsoft\Rpc\Internet" -Name "PortsInternetAvailable" -Value 'Y' -PropertyType String
New-ItemProperty "HKLM:\SOFTWARE\Microsoft\Rpc\Internet" -Name "UseInternetPorts" -Value 'Y' -PropertyType String
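
For sizing firewall or load balancer rules against that registry value, a small Python sketch (an illustration, not part of the original post) parses a `Ports` range string such as '50001-51024' and reports how many ports it covers. The helper name `parse_port_range` is hypothetical, and whether the load balancer actually needs rules for this range is exactly the open question here.

```python
def parse_port_range(value):
    """Parse a 'low-high' port range string into an inclusive (low, high) tuple."""
    low_text, high_text = value.split("-", 1)
    low, high = int(low_text), int(high_text)
    if not (0 < low <= high <= 65535):
        raise ValueError(f"invalid port range: {value!r}")
    return low, high

# The range string is the one used for the "Ports" registry value above.
low, high = parse_port_range("50001-51024")
port_count = high - low + 1
print(f"{low}-{high} covers {port_count} ports")
```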

Do I need to configure anything on the load balancer?

Sorry if I didn’t explain it very well; I'm new to these things.



HyperV Failover Cluster Virtual Disk Replica


Hello all,

I have a question about cluster virtual disk replication.

We have a two-node S2D cluster, and we are now building a second S2D cluster at a second site. On top of both clusters are the compute Hyper-V nodes (another cluster that accesses the storage of the S2D cluster via an SMB share, SOFS).

My question is: if we enable replication from s2d_cluster1 to s2d_cluster2, what can I expect if s2d_cluster1 fails or is down for maintenance? Should the replica take over instantly, or does this just give us a volume backup?

Thank You,

Pero

Failover Cluster Manager bug on Server 2019 after .NET 4.8 installed - unable to type more than two characters in to the IP fields


We ran into a nasty bug on Windows Server 2019 and I can't find any KB articles on it. It's really easy to replicate. 

1. Install Windows Server 2019 Standard with Desktop Experience from an ISO. 

2. Install Failover Cluster Services.

3. Create new cluster, on the 4th screen, add the current server name. This is what it shows:

cluster services working correctly before .NET 4.8 is installed

4. Install .NET 4.8 from an offline installer. (KB4486153) and reboot.

5. After the reboot, go back to the same screen of the same Create Cluster Wizard and now it looks different:

cluster services broken after .NET 4.8 is installed - unable to put in a 3-digit IP

Now we are unable to type in a 3 digit IP in any of the octet fields. It accepts a maximum of two characters. 

Has anyone else encountered this? It should be really easy to reproduce. 

Run Cluster Service with an AD Account


Hi all;

I'm trying to run the cluster service with an AD account that I have created, but I need to know what permissions and privileges this user should have in order to run the cluster service correctly.

I have a two-node cluster running on Windows Server 2008 R2. I found this Microsoft page: http://technet.microsoft.com/en-us/library/cc731002%28WS.10%29.aspx and it has a note that says: "In Windows Server 2008, there is no Cluster service account. Instead, the Cluster service automatically runs in a special context that provides the specific permissions and privileges necessary for the service (similar to the local system context, but with reduced privileges). Other accounts are needed, however, as described in this guide."

Some people tell me that you can do this; you just need the right permissions. Can someone who has done this before clarify?

Thanks;

 

Rolling Upgrade to Server 2016 - SAS IO Timeouts and CSV Entering Paused State

It seems like others are having similar issues after performing a rolling upgrade to Server 2016 while using clustered storage spaces. We are seeing these events frequently, always at exactly 30-minute or 1-hour intervals. I can't find any other tasks running at these times that could be causing the issue. Has anyone else been able to find a solution for this?
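
One way to confirm the "exactly 30 minutes or 1 hour" pattern before hunting for a scheduled task is to test the gaps between exported event timestamps. The Python sketch below is an illustration under assumptions: the function name `on_fixed_interval`, the 5-second tolerance, and the sample timestamps are all hypothetical and are not taken from the post.

```python
from datetime import datetime, timedelta

def on_fixed_interval(timestamps,
                      interval=timedelta(minutes=30),
                      tolerance=timedelta(seconds=5)):
    """True if every gap between consecutive events is a whole multiple
    of `interval`, allowing `tolerance` of jitter on each gap."""
    ordered = sorted(timestamps)
    for earlier, later in zip(ordered, ordered[1:]):
        gap = later - earlier
        multiples = round(gap / interval)  # nearest whole number of intervals
        if multiples < 1 or abs(gap - multiples * interval) > tolerance:
            return False
    return True

# Hypothetical timestamps for illustration only; they are not from the post.
events = [
    datetime(2020, 1, 1, 9, 0, 2),
    datetime(2020, 1, 1, 9, 30, 1),
    datetime(2020, 1, 1, 10, 30, 3),
]
print(on_fixed_interval(events))
```

A 1-hour gap counts as two 30-minute intervals, so both spacings mentioned above pass the same check. If the real timestamps pass, a timer-driven job is a more likely trigger than load.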