Channel: High Availability (Clustering) forum
Viewing all 5654 articles

Windows Admin Center: Missing sddcres.dll

Hello,

I have recently spun up a 3-node failover cluster with the S2D and Hyper-V roles installed, configured, and actively working. Windows Admin Center is pointing me to an article stating that it relies on a set of APIs that are not included in Server 2016. However, when I run the command posted in the article, it fails and reports that "C:\Windows\Cluster\sddcres.dll" doesn't exist. According to the article, these libraries are included in Server 2016 once the 05-2018 KB is installed. I've verified that all 3 nodes are on the 07-2019 update (I just ran CAU to ensure it was installed on all nodes, and it was successful). This still didn't fix the command. So, just in case, I downloaded the update directly from the Microsoft Update Catalog, and the installer returns the message "this update is not applicable".

During the deployment of these nodes I didn't see anything that specifically mentioned 'Hyper-Converged' or a setting I needed to toggle to indicate that. As far as I'm aware the term Hyper-Converged just describes the configuration of the architecture (S2D+Hyper-V on boxes in a Cluster).

Everything in the cluster validation is coming back valid, and I've verified that S2D is functional (the NVMe drives are "Journals" and my HDD/SSD pool is correctly displaying as Capacity & Performance).

Any recommendations?




Host showing as unmonitored and isolated in the failover cluster

Hi Expert,

We are encountering the same type of issue in our failover cluster environment: "your host XYZ is in an unmonitored state or isolated in the cluster".

Because of this error, all VMs on that host were restarted or shut down. I opened 2-3 support tickets with Microsoft, but we did not get any findings or a solution from them. I restarted the host and then it was OK.

Kindly advise me.

In our failover cluster we have 4 hosts, and we are using Server 2016.

Thanks in advance.


ejaz

Unable to make a storage pool

Hello all :)

I'm currently in the process of teaching myself about Server 2019 and some of the technologies I've not had the chance to play with before.

The one that I am trying at the moment is creating a file server using failover clustering.
I am able to create the cluster (LAB-CLUSTER01) using 3 servers (LAB-S03, LAB-S04 and LAB-S05) running Server 2019 Datacenter Core.

I created 3 storage pools before creating the cluster (S03-SP, S04-SP and S05-SP). These pools are made of 4 virtual SSDs, creating a single drive.

All of this is running on ESXi 6.5.

The storage pools are all running and happy without issue, but I am unable to access them from the cluster. The error is given below:

Failed to bring the resource 'S03-SP' online.

The device does not recognize the command.

Looking at the Physical Disks tab in Failover Cluster Manager, they are all marked as 'Becoming Ready'.

Once I have tried to add a pool to the cluster, I am no longer able to access it from within Windows Server Manager.

Could someone please advise what is causing this and what can be done (if anything) to fix it?

Many thanks
Tom


Cluster resource 'Virtual Machine VMNAME' of type 'Virtual Machine' in clustered role 'VMNAME' failed.

Hello!

I have a Hyper-V failover cluster with 3 nodes:

NODE1: Windows SRV2016

NODE2: Windows SRV2016

NODE3: Windows SRV2019

There are 30 VMs in the failover cluster. I can live-migrate every VM to every node, except for one. That VM can be live-migrated from NODE2 to NODE1 and from NODE1 to NODE2, but I can't move it from NODE1 or NODE2 to NODE3; I get the following errors:

Event id: 1069

Cluster resource 'Virtual Machine VMNAME' of type 'Virtual Machine' in clustered role 'VMNAME' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Event id: 1205

The Cluster service failed to bring clustered role 'VMNAME' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.
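Event 1069 says the cluster "may" restart the resource locally or move the role, depending on its failure policies. As a rough illustration of that decision only, here is a minimal Python sketch of a "restart threshold within a time window" policy. The class, its field names and the example numbers are hypothetical; the real policy settings, their defaults and any additional group-level limits should be checked on the resource itself (for example with Get-ClusterResource) rather than taken from this model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FailurePolicy:
    """Hypothetical, simplified model of a per-resource failure policy."""

    restart_threshold: int  # local restarts tolerated inside the window
    restart_period_s: float  # sliding window length, in seconds
    failures: List[float] = field(default_factory=list)

    def on_failure(self, now: float) -> str:
        # Forget failures that fell out of the sliding window, then record this one.
        self.failures = [t for t in self.failures if now - t < self.restart_period_s]
        self.failures.append(now)
        if len(self.failures) <= self.restart_threshold:
            return "restart on this node"
        # Too many failures inside the window: stop retrying locally, fail the role over.
        return "move group to another node"


if __name__ == "__main__":
    policy = FailurePolicy(restart_threshold=1, restart_period_s=900.0)
    for t in (0.0, 60.0, 2000.0):
        print(t, policy.on_failure(t))
```

With a threshold of 1, a second failure inside the window moves the role, while a failure long after the earlier ones starts the count again.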

What could be the problem?

Thank You.

How to cluster Windows Server 2016 on two different types of hardware: Dell vs. Lenovo servers

I have a question about clustering servers from two different hardware vendors.

I have a Lenovo x3650 M5 5462 server running Windows Server 2016.

Now I have another server, a Dell R740, which also runs Windows Server 2016.

My question is whether I can build a Windows Server 2016 failover cluster from the Lenovo and Dell servers.

Thanks for any technical advice.


Need help extending a Cluster Shared Volume

Hello everyone,

I am new to Cluster Shared Volumes in Server 2012 R2. I am trying to expand a volume or create a new one.

I have tried to use diskpart to extend V$, but I keep getting an error that there is no space available to extend into.

This volume is on a 12 TB SAN. I can see 1.2 TB available.

Does anyone know what I am missing? I don't know why I can see the space available on the machine but not within diskpart.


Windows Fileshare witness is not accessible | After patching

Hi Experts,

Our Windows team applied patches to both nodes of a cluster. Since patching, the file share witness is accessible from one server but not from the other.

After digging deeper, we could see that the following additional security patches were applied on the server from which the file share witness is not accessible:

KB3161949
KB3172729
KB3173424
KB3175024
KB4338824
KB4499165
KB4503290
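The comparison described above (finding the patches present on one node but not the other) can be done mechanically. A minimal Python sketch, assuming each node's installed KB IDs have already been exported to a plain list (for example from the HotFixID column of Get-HotFix output on each node); the node lists in the example are placeholders, not real data from these servers:

```python
from typing import Iterable, List, Set


def normalize(kb_ids: Iterable[str]) -> Set[str]:
    """Upper-case and trim KB IDs so 'kb4338824 ' and 'KB4338824' compare equal."""
    return {kb.strip().upper() for kb in kb_ids if kb.strip()}


def only_on(target: Iterable[str], reference: Iterable[str]) -> List[str]:
    """Return KB IDs installed on the target node but missing from the reference node."""
    return sorted(normalize(target) - normalize(reference))


if __name__ == "__main__":
    # Placeholder IDs for updates both nodes share (hypothetical).
    common = ["KB0000001", "KB0000002"]
    node_ok = common  # node that can reach the witness
    node_bad = common + ["KB3161949", "KB3172729", "KB3173424", "KB3175024",
                         "KB4338824", "KB4499165", "KB4503290"]
    print(only_on(node_bad, node_ok))
```

This only narrows the candidate list; it cannot say which of the remaining updates is responsible.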



We are not sure which patch is causing this problem, as we don't see any official Microsoft documentation on it. Please let us know if you have any information on this matter.

Also, please advise if there is any forum where we can check bug details quickly.

Many thanks in advance ! 

Regards,
Naren poosa

Problem running Update-ClusterFunctionalLevel on Server 2019

Hi

I have done an in-place upgrade of a 2-node SQL cluster (from Server 2016 Std. to Server 2019 Std.). The whole process worked as expected.

Now I want to run Update-ClusterFunctionalLevel, but it is returning the following error:

Update-ClusterFunctionalLevel : You do not have administrative privileges on the cluster. Contact your network administrator to request access.
    Access is denied
At line:1 char:1
+ Update-ClusterFunctionalLevel
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : AuthenticationError: (:) [Update-ClusterFunctionalLevel], ClusterCmdletException
    + FullyQualifiedErrorId : ClusterAccessDenied,Microsoft.FailoverClusters.PowerShell.UpdateClusterFunctionalLevelCommand


In the Microsoft-Windows-FailoverClustering/Diagnostic event log, it gives me the following error:

EventID: 2051

Description: [CORE] mscs::ClusterCore::VersionUpgradePhaseTwo: (5)' because of 'Gum handler completed as failed'

I think all permissions are correct, but I can't find the root cause. Can you please help me?




Failover Clustering Task Scheduler Survey

netft.sys causing a bugcheck blue screen on Windows Server 2008 R2 Datacenter

Hi

Our server is being rebooted by a bugcheck (blue screen) error in netft.sys. Please let me know if there is any fix for this issue; I am not sure what is causing it.

The server runs Windows Server 2008 R2 Datacenter and is part of a Hyper-V cluster.

Thanks in advance

Some cluster networks with unavailable status

Hello. When we set up ("mounted") the failover cluster on Windows Server 2012 R2, all networks were "Up". However, we realized that we were not able to perform Live Migration, so we checked the Cluster Networks section and saw several interfaces with a status of "Not available". Yet when we test access to these interfaces, they respond normally and are accessible. We have already checked the anti-virus and firewall on all cluster servers (nodes): there is no anti-virus restriction, and the firewall is disabled.

Screenshot attached.

NOTE: I already did what is described at http://blog.mpecsinc.ca/2010/03/nic-binding-order-on-server-core-error.html

NOTE 2: This only happens on some interfaces of "Cluster Network 3", "Cluster Network 2" and "Cluster Network 1"; all interfaces are "Up".

Guest file server cluster crashes constantly

Hi

I have set up a guest file server cluster with Windows Server 2019. The cluster crashes constantly: it becomes very slow and finally crashes all of my hypervisor servers.

Hypervisor infrastructure:

  • 3 hosts running Windows Server 2019 Datacenter (LTSC)
  • iSCSI storage, 10 Gb, with 11 LUNs
  • cluster passes all validation tests

Guest file server cluster, 2 VMs with the same configuration:

  • Generation 2 VM running Windows Server 2019 (LTSC)
  • 4 virtual CPUs
  • 8 GB of static (non-dynamic) RAM
  • 1 SCSI controller
  • primary hard drive: VHDX format, SCSI controller, ID 0
  • empty DVD drive on SCSI controller, ID 1
  • 10 VHD Set (.vhds) disks on SCSI controller, IDs 2 to 11, same ID on each node
  • 1 network adapter on a virtual switch connected to 4 physical teamed network adapters
  • the guest cluster passes all validation tests except the network test, which reports a single point of failure (no redundancy)


After some time, the cluster becomes very slow, crashes, and brings down all my hypervisors. The only errors Hyper-V reports are that some LUNs became unavailable due to a timeout, with this message:

Cluster Shared Volume "VSATA-04" ("VSATA-04") has entered a paused state because of "STATUS_IO_TIMEOUT(c00000b5)". All I/O will temporarily be queued until a path to the volume is reestablished.

I have checked every single parameter of the VM and Hyper-V configuration and followed every hint the logs gave me, but found nothing, and the crashes continue.

Sorry for my poor English; it is not my first language.

Zero Downtime File Server - Would this setup work?

Hello everybody,

I was given the task of planning a redundant file storage environment that can compensate for the failure of any component without service interruption. This is a field I have little experience with, so I want to confirm that the concept I am working on actually works. I also don't have the resources available to build a test system at the moment, which makes this a very theoretical construct.

I want to use a Windows Failover Cluster with the Scale-Out File Server role installed. Three physical servers with limited storage space (only enough for the operating system) are supposed to be the nodes of this cluster (three, so as to avoid needing a file share witness). A single SAN storage solution will provide the storage space for the file server, attached to the individual nodes via Fibre Channel. The SAN storage itself has all components built in redundantly, eliminating the need to provide a second storage unit and manage the synchronization of both.
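The reasoning behind choosing three nodes is majority-vote arithmetic: the cluster keeps running only while more than half of its votes are present. A minimal Python sketch of that arithmetic, assuming a static vote count of one per node plus one for a witness if configured; it deliberately ignores dynamic quorum and dynamic witness, which newer Windows Server versions use to adjust votes at run time.

```python
def votes_needed(total_votes: int) -> int:
    """Strict majority: more than half of all configured votes."""
    return total_votes // 2 + 1


def failures_tolerated(total_votes: int) -> int:
    """How many voters can be lost while a majority still remains."""
    return total_votes - votes_needed(total_votes)


if __name__ == "__main__":
    layouts = [
        ("2 nodes, no witness", 2),
        ("2 nodes + witness", 3),
        ("3 nodes, no witness", 3),
        ("4 nodes, no witness", 4),
    ]
    for label, votes in layouts:
        print(f"{label}: need {votes_needed(votes)}, tolerate {failures_tolerated(votes)}")
```

Under this static model, three nodes without a witness tolerate one node failure, the same as two nodes plus a witness, while a fourth node without a witness adds no extra tolerance.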

The clients are expected to then connect to the file service provided by the cluster which is then (transparently) handled by any of the nodes and, in case of failure of this node (e.g. loss of power), instantly taken over by another without interruption or considerable delay.

In case it is important: The file server is supposed to host files of different applications including resources and configurations. These applications are not run on the server, but on clients. They are executed FROM the server share though, so constant and uninterrupted file provision is required, otherwise the applications will eventually crash. Executing from the server share is mandatory.

Now, as I mentioned, my experience with this is rather limited, and while the concept is based on what I read in Microsoft documentation, I would like to ask you to confirm that it works or, in case it doesn't, advise on what to do differently.

Additionally, as far as I understand, running a domain controller role on the same server that is running a Scale-Out File Server role is not possible, or at least not recommended. Is this still valid for Server 2019, and if so, is there a way to achieve zero-downtime file provisioning on the same device that is running a DC, or do they have to be separate machines?

Thanks in advance!

Windows Server 2016 cluster system Failover Cluster Validation Report shows error on the CNO

Hi All,

I'm having an issue with my Windows Server 2016 cluster system.
It consists of 2 nodes; let's say Node1 (showing as down) and Node2 (up).

Node1 can ping Node2 and vice versa, but I'm not sure why it is showing as down.

The Failover Cluster Validation Report shows an error only on the CNO below:

  • The cluster network name resource 'PRDSQL-CLUS01' has issues in the Active Directory. The account could have been disabled or deleted. It could also be because of a bad password. This might result in a degradation of functionality dependent on the cluster network name. Offline the cluster network name resource and run the repair action on it. 
    An error occurred while executing the test.
    The operation has failed. An error occurred while checking the state of the Active Directory object associated with the network name resource 'Cluster Name'.

    Access is denied
This is the error logged from the Failover Cluster Manager.

Event ID 1069

Cluster resource 'Cluster Name' of type 'Network Name' in clustered role 'Cluster Group' failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Event ID 1688
Cluster network name resource detected that the associated computer object in Active Directory was disabled and failed in its attempt to enable it. This may impact functionality that is dependent on Cluster network name authentication.
Network Name: Cluster Name
Organizational Unit:
Guidance: Enable the computer object for the network name in Active Directory.

The virtual cluster front end, PRDSQL-CLUS01, is reported as disabled in Active Directory, as per the errors above.
 
I have tried:

Taking the virtual endpoint offline and running a repair, but this fails with “File not Found” and “Error Displaying Cluster Information”.
Creating a blank role. SQL and CAU are still working; it is only the front-end failover cluster virtual network name AD account (CNO) that has the issue.

Any help would be greatly appreciated.

Thanks,


/* Server Support Specialist */

Cluster IP keeps switching

Dear All,

I have a cluster with 2 IPs, one active and the other passive. When I run nslookup I get both IPs, but when I ping the cluster name it pings the passive IP instead of the active one (the passive IP does not respond: request timed out). I deleted both A records in DNS and then it worked fine, but after a while it went back to the passive IP. What I need is for a ping of the cluster name to always reach the active IP.
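Because nslookup returns both A records, a client that simply uses one answer (as ping does) can land on the offline address. One client-side workaround is to try every returned address until one answers. A minimal Python sketch, assuming the active node accepts TCP connections on some known port (the port is an assumption; ping itself uses ICMP, which this does not reproduce):

```python
import socket
from typing import Optional


def first_reachable(host: str, port: int, timeout: float = 1.0) -> Optional[str]:
    """Try every address DNS returns for host; return the first that accepts TCP."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return None  # the name did not resolve at all
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return sockaddr[0]
        except OSError:
            # Passive/offline address: refused or timed out, so try the next one.
            continue
    return None
```

Called with the cluster name and a port the active node listens on, it returns whichever registered address actually answered, or None if none did. It works around the symptom on the client; it does not change which records the cluster registers in DNS.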

Thank you 


Cluster shared storage issue

Hi

I have a Windows Server 2012 R2 cluster with 7 drives shared from SAN storage.

Now I am not able to open any of the 7 drives from either node; I get the error below.

C:\ClusterStorage\Volume1 is not accessible

The referenced account is currently locked out and may not be logged on to.

Not able to rebuild cluster: issue with disks?

Hi all,

I have two Windows Server 2012 R2 servers (DB1A and DB1B) where a failover cluster + SQL Server Availability Groups used to work. But something went wrong (I don't really know what, maybe an aggressive GPO) and the cluster was totally dead.

When I try to rebuild it, I get this kind of warning:

List Disks To Be Validated
Physical disk ab780ec8 is visible from only one node and will not be tested. Validation requires that the disk be visible from at least two nodes. The disk is reported as visible at node: DB1A
Physical disk ab780ec0 is visible from only one node and will not be tested. Validation requires that the disk be visible from at least two nodes. The disk is reported as visible at node: DB1A
No disks were found on which to perform cluster validation tests. To correct this, review the following possible causes:
* The disks are already clustered and currently Online in the cluster. When testing a working cluster, ensure that the disks that you want to test are Offline in the cluster.
* The disks are unsuitable for clustering. Boot volumes, system volumes, disks used for paging or dump files, etc., are examples of disks unsuitable for clustering.
* Review the "List Disks" test. Ensure that the disks you want to test are unmasked, that is, your masking or zoning does not prevent access to the disks. If the disks seem to be unmasked or zoned correctly but could not be tested, try restarting the servers before running the validation tests again.
* The cluster does not use shared storage. A cluster must use a hardware solution based either on shared storage or on replication between nodes. If your solution is based on replication between nodes, you do not need to rerun Storage tests. Instead, work with the provider of your replication solution to ensure that replicated copies of the cluster configuration database can be maintained across the nodes.
* The disks are Online in the cluster and are in maintenance mode.
No disks were found on which to perform cluster validation tests.

When I open the Failover Cluster Manager, I can see the two nodes but can't see anything in the Roles folder or under Disks.

Of course, SQL Server Availability Groups are not possible either:


The local node is not part of quorum and is therefore unable to process this operation. This may be due to one of the following reasons:
•   The local node is not able to communicate with the WSFC cluster.
•   No quorum set across the WSFC cluster.

I'm a bit lost. It would be great if someone could help.

Live Migration and Workgroup Cluster on Windows Server 2019

Hi,

I found the following document about Live Migration and workgroup clusters on Windows Server 2016.

https://techcommunity.microsoft.com/t5/Failover-Clustering/Workgroup-and-Multi-domain-clusters-in-Windows-Server-2016/ba-p/372059

I understand that Live Migration is not supported but Quick Migration is. Is it the same on Windows Server 2019, or are there any plans to support it?


Drive on all nodes in SQL Availability Group "Formatted" at the same time (Cluster on Windows 2016 standard)

We have a 2 node SQL Availability Group on a Windows 2016 Std Cluster.

SQL Server reported the databases suspect after the data drives on both servers appeared to have been formatted.

On one of the servers we found the following events:

Event ID 7036 on 7/26/2019 at 9:37:55AM

Event ID 98 on 7/26/2019 at 9:38:12AM

Event ID 98 on 7/26/2019 at 9:38:13AM

These appear to indicate that the drive was formatted.

We have tested and found that running the PowerShell "Format-Volume" command (locally or remotely) against one server causes the same drive on both nodes in the cluster/AG to be formatted.

One possible cause is a server build script has been run with incorrect server details and we are investigating this possibility.

My questions are:

Has anyone experienced drives being "Formatted" simultaneously across nodes in a Clustered SQL AG?

Is the formatting of drives on an Availability Group supposed to affect all nodes? I've not found documentation to explain this.

How to automate actions based on Cluster Validation Test results?

In windows clustering you can run a "Cluster Validation Report" either from the Cluster Administration Console or from PowerShell using Test-Cluster.

However, the output is an .htm file, which isn't really super helpful compared to getting a list of True/False values like you would expect from a "proper" PowerShell cmdlet 😉

So, my question is whether anyone knows of a way to pass the results from Test-Cluster on, so I can build something that can fix the settings that failed?
Or do I really only have a choice between reinventing the wheel by creating a bunch of tests myself, or manually reading a report?

I find it hard to believe that this is something that hasn't been automated yet.

I have been googling fairly hard, but haven't been able to find any tooling around this already.
(I did suggest fixing our build pipeline so we could have a success-rate higher than 15% on new clusters, but apparently that's not popular ¯\_(ツ)_/¯)

P.S. Currently I'm looking at whether I can parse the .htm file that is output, but meh -__-
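Since Test-Cluster writes its results to an .htm report, one option is to scrape it. A minimal Python sketch using only the standard-library html.parser, resting on one big assumption: that each test appears as a table row whose first cell is the test name and whose last cell is a result word such as "Success", "Warning" or "Failed". The real report layout may differ between Windows Server versions, so inspect your own .htm file and adjust the row handling before relying on it.

```python
from html.parser import HTMLParser
from typing import List, Optional, Sequence


class ReportRowParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows that only contain <th> cells
                self.rows.append(self._row)
            self._row = None


def problem_tests(report_html: str,
                  bad_results: Sequence[str] = ("Failed", "Warning")) -> List[str]:
    """Return test names whose last cell is one of bad_results (assumed layout)."""
    parser = ReportRowParser()
    parser.feed(report_html)
    parser.close()
    return [row[0] for row in parser.rows if len(row) >= 2 and row[-1] in bad_results]


if __name__ == "__main__":
    # Made-up fragment; the real report markup is likely richer than this.
    sample = (
        "<table>"
        "<tr><th>Test</th><th>Result</th></tr>"
        "<tr><td>List Disks</td><td>Success</td></tr>"
        "<tr><td>Validate Network Communication</td><td>Warning</td></tr>"
        "<tr><td>Validate Active Directory Configuration</td><td>Failed</td></tr>"
        "</table>"
    )
    print(problem_tests(sample))
```

The returned list is what a remediation script would loop over; mapping each test name to a fix is the part that still has to be defined per environment.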
