Windows Server Failover Cluster Fails

November 3, 2018, 3:49 am

Dear all,

We have a two-node Windows Server 2012 cluster (Node 01, Node 02).

The cluster is used for SAP with the Sybase database engine.

We are experiencing an issue that happens from time to time.

Whenever Node 01 hangs, the whole system hangs, and Node 02 never restores the service or becomes active.

Normally, if Node 01 is completely down, Node 02 takes over. But if Node 01 hangs, it keeps all the storage partitions reserved and never releases them until I force a restart.

What should I do?

Regards,

Khalid.

↧

Migrating HyperV 2012 R2 Cluster to a new domain

October 25, 2018, 7:13 am

Hello everyone,

I need to join a three-node Hyper-V 2012 R2 cluster to a new domain. I have searched the Microsoft knowledge base but didn't find anything. I would like to understand whether there is a way to join the cluster to a new (trusted) AD domain without destroying and recreating the cluster. Apparently, there is no official Microsoft article on this.

Regards,

Hal

↧

Event 1070 FailoverClustering error code 1629

October 30, 2018, 10:43 pm

Hello,

I built a cluster and am trying to add a second node. The two servers are on different subnets, and they can reach each other properly.

When I try to add the second server as a node, it times out, and in Windows logs, there is an event 1070 which shows "The node failed to join failover cluster 'DAG-Tech' due to error code '1629'."

There is also an event 7024 with this description:

"The Cluster Service service terminated with the following service-specific error:
Data supplied is of wrong type."

"Data supplied is of wrong type" is the meaning of error code 1629.

How can I resolve this? Which data is wrong?

Thanks, Dominic

↧

Changing the witness is disabled

October 27, 2018, 10:49 pm

I have a SQL Server Cluster Availability Group, and the cluster is configured to use a file witness.

Recently I expanded the cluster to my DR site, and I got an Azure subscription to host the witness site.

I want to change the cluster witness from the local file share witness to a cloud witness, but the option is disabled.

How can I change the witness, and why is the option disabled?

↧

Cannot move Role across sites - Element not found on Log disk

August 17, 2018, 9:36 am

Greetings.

I am having an issue moving a WSFC Role from Site1 to Site2 or vice versa.

My configuration:

Windows Server 2016
WSFC, Stretch Cluster
4 nodes Site1, 4 nodes Site2
File Services Roles
Storage Replica
Nimble CS7000 hybrid storage, iSCSI connected

I have two Roles currently created and running successfully. I CAN move the Roles between nodes at the SAME site without issue, however, when I try to move a Role to the opposite site, the disk configured as the LOG for that Storage Replica will briefly show "Element not Found." Error Code: 0x80070490

Has anyone run across this? I'm running a very similar setup in a Dev environment. The difference is that the iSCSI SAN volumes there are backed by a Nimble CS500 instead of the CS7000. I'm not sure if or how that would make a difference, but it seems a possible explanation.

Any thoughts or ideas are welcomed.

↧

Server Uptime

October 29, 2018, 12:30 pm

I need to know how long a server was shut down. I shut it down manually for a migration and brought it back online some time later at a different site. I use UPTIME.EXE with the /S switch to verify how long the server was unavailable, based on the shutdown and boot times.
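
The calculation described here is a subtraction of two timestamps. As a minimal sketch, assuming the shutdown and boot times have already been read from the UPTIME.EXE output or the event log, it could look like this in Python. The function name `downtime` and both timestamps are hypothetical and only illustrate the arithmetic.

```python
from datetime import datetime, timedelta


def downtime(shutdown: datetime, boot: datetime) -> timedelta:
    """Return how long the server was unavailable between shutdown and boot."""
    if boot < shutdown:
        raise ValueError("boot time must not be earlier than shutdown time")
    return boot - shutdown


# Hypothetical timestamps, for illustration only (not taken from the post).
shutdown_at = datetime(2018, 10, 27, 22, 0)
boot_at = datetime(2018, 10, 29, 10, 30)
print(downtime(shutdown_at, boot_at))
```

Both timestamps need to be in the same time zone, which matters if the server was brought back online at a different site.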

↧

Getting Event ID 2051 on SQL Cluster / FailoverCluster Logs

November 4, 2018, 3:47 am

Hello Everyone,

I am getting the events below on our SQL cluster nodes.

ERR 2051 : Microsoft-Windows-FailoverClustering

[RCM] [GIM] ResType Virtual Machine has no resources, not collecting local utilization info

ERR 2050 : Microsoft-Windows-FailoverClustering

[RCM] ResourceTypeChaseTheOwnerLoop::DoCall: ResType MSMQTriggers's DLL is not present on this node. Attempting to find a good node...

I referred to the link below, which says it is safe either to ignore this or to install MSMQ.

https://blogs.msdn.microsoft.com/clustering/2013/04/05/msmq-errors-in-the-cluster-log/

Thank you so much in advance.

Regards,

Mohammed

↧

Three Node Windows Cluster

November 8, 2018, 4:22 pm

We have a three-node Windows Server 2012 R2 cluster with a file share witness (assume the file share is not located on any of these three nodes).

What happens if two nodes go down?

Is it necessary to force quorum on the third node?
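
The vote arithmetic behind this question can be sketched in Python. The sketch assumes a plain static majority, where each node and the file share witness holds one vote and quorum needs more than half of all configured votes. It does not model the dynamic quorum and dynamic witness features of Windows Server 2012 R2, which can change the outcome when nodes fail one at a time. The function name `has_quorum` is hypothetical.

```python
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """Static majority: strictly more than half of all configured votes."""
    return 2 * votes_present > total_votes


# Assumption: three nodes plus one file share witness, one vote each.
TOTAL_VOTES = 3 + 1

for nodes_up in range(3, 0, -1):
    # Surviving nodes plus the witness, assuming the witness stays reachable.
    votes_present = nodes_up + 1
    print(f"{nodes_up} node(s) up: quorum = {has_quorum(votes_present, TOTAL_VOTES)}")
```

Under this static assumption, one surviving node plus the witness holds two of four votes, which is not a majority.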

↧

Creating a new Hyper-V Replica Broker on a 2 node cluster crashes the cluster resource manager and the role fails to start

November 9, 2018, 10:45 pm

Hello,

We have a domainless two-node Windows Server 2016 cluster. It was set up for SQL Server availability groups and works fine for that.

I have Hyper-V installed on both nodes and want to replicate the VM guests from one server to the other. I read that I need to add the Hyper-V Replica Broker to the cluster before configuring the VM guests for replication.

Every time I create a new role, RHS.exe (I think this is the cluster resource manager) crashes and the role fails to start.

The crashes also affect our SQL Server availability groups.

I have looked at the cluster logs, but there doesn't seem to be an error reason.

Can anyone help? (I don't even know what to post to help find out what the problem is.)

Both servers are up to date with Windows updates. It's Windows Server 2016. No domain.

Any and all help would be great.

Thanks

Greg

↧

windows 2016 cluster QuarantineThreshold

November 2, 2018, 10:24 am

https://blogs.msdn.microsoft.com/clustering/2015/06/03/virtual-machine-compute-resiliency-in-windows-server-2016/

QuarantineThreshold is the number of failures before a node is quarantined. I have some questions:

1. Which kinds of errors count as "failures"?

2. In the cluster log I see a message about quarantine:

"the node experienced '3' consecutive failures within a SHORT amount of time"

How is "short" defined here? 1 s? 2 s? 2 min? And do the consecutive failures have to be of the SAME KIND?

↧

Cannot create checkpoint when shared vhdset (.vhds) is used by VM - 'not part of a checkpoint collection' error

December 9, 2016, 1:27 am

We are trying to deploy a 'guest cluster' scenario on Hyper-V with shared disks located on SOFS. By design, the .vhds format should fully support the backup feature.

All machines (Hyper-V, guest, SOFS) run Windows Server 2016 Datacenter. Two Hyper-V virtual machines are configured to use a shared disk in .vhds format, located on a two-node SOFS cluster. The SOFS cluster has a share configured for applications, and Hyper-V uses the path \\sofs_server\share_name\disk.vhds to reach the SOFS remote storage. The guests have the 'File server' role and the 'Failover clustering' feature installed to form a guest cluster. Each guest cluster node has two disks: (1) a private system disk in .vhdx format (OS) and (2) the shared .vhds disk on SOFS.

While trying to create a checkpoint for a guest machine, I get the following error:

Cannot take checkpoint for 'guest-cluster-node0' because one or more sharable VHDX are attached and this is not part of a checkpoint collection.

Production checkpoints are enabled for the VM, and the 'Create standard checkpoint if it's not possible to create a production checkpoint' option is set. All integration services (including backup) are enabled for the VM.

When I remove the shared .vhds disk from the VM's SCSI controller, checkpoints are created normally (for the private OS disk).

It is not clear what a 'checkpoint collection' is or how to add the shared .vhds disk to this collection. Please advise.

Thanks.

↧

SMBWitnessClient EventID 8 - Failed to register from Trusted Domain

November 12, 2018, 1:20 am

Hi there!

I am getting errors every 30 seconds on machines in a trusted domain that try to connect to SMB shares on a failover cluster.

Event ID 8

Error details: Witness Client failed to register with Witness Server TestSRV02 for notification on NetName \\TestSrv with error (The parameter is incorrect.)

I know that to connect to the trusted domain I need to use the full FQDN. However, when the server requests the list of witness servers from the failover cluster, the list seems to come back without FQDNs, so my server cannot connect.

MCSE: Server Infrastructure

↧

Storage Spaces Direct - No disks with supported bus types found to be used for S2D

November 12, 2018, 4:41 pm

Hello,

I am trying to set up a three-node Windows cluster to take advantage of the SQL Always On failover feature.

I have 3 VMs running on VMware inside my company’s datacenter (not Azure), each with Windows Server 2016 Datacenter installed. I can create the cluster with those 3 nodes. They are not joined to any Active Directory domain (DNS only). I want to use Storage Spaces Direct as shared storage, and this is where I am stuck.

Each of the 3 nodes has 4 disks. In the “Get-PhysicalDisk” PowerShell output, MediaType is SSD and BusType is SAS for all disks. I have one disk as the boot volume, one disk to store various files, and 2 disks with an unused partition. These last 2 disks are the ones I want to use with S2D, and they are marked as CanPool=True.
When I run “Get-PhysicalDisk” from the first node, the list shows the boot disk and the file disk from node 1, plus 6 poolable disks (2 from each of the 3 nodes).

In the S2D validation report (launched from Failover Cluster Manager), the 6 poolable disks are marked as "eligible for validation=True" with these characteristics:
Disk partition style is MBR. Disk has an Unused Partition. Disk type is BASIC.

The other disks (boot volume and file disk) report a warning (I am not sure whether this is what prevents S2D from being enabled):
Failed to get SCSI page 83h VPD descriptors for physical disk 0.

They have these characteristics:

Disk 1 : Disk is a boot volume. Disk is a system volume. Disk is used for paging files. Disk partition style is MBR. Disk has an Unused Partition. Disk has an IFS Partition. Cannot cluster a disk with an IFS Partition. Disk type is BASIC. The required inquiry data (SCSI page 83h VPD descriptor) was reported as not being supported.

Disk 2 : Disk partition style is MBR. Disk has an Unused Partition. Disk has an IFS Partition. Cannot cluster a disk with an IFS Partition. Disk type is BASIC. The required inquiry data (SCSI page 83h VPD descriptor) was reported as not being supported.

When I try to enable S2D from the PowerShell prompt, I receive an error saying "No disks with supported bus types found to be used for S2D", even though the bus type is SAS.

I am not an expert on managing servers, and I may have overlooked something during the setup. If more information on my setup is needed, I can provide it to the best of my knowledge. I wanted to include some screenshots, but since my account is not verified yet that was impossible, so I have put in as many details as I could.

Thank you for any advice provided.

↧

Record Hyper-V guest parent

November 14, 2018, 4:18 am

In order to satisfy server licensing on our four-node Windows Server 2012 R2 cluster, I need to keep 90 days' worth of logs that show which guest VM is hosted on which host.

Is there any way to record this accurately? I tried Get-ClusterLog, but it doesn't seem to show VM affinity, only CSV affinity.

Thanks in advance,

Matt

↧

add node to 2016 cluster no longer has validation option

November 14, 2018, 5:54 am

I am using RSAT on a 2016 GUI server to remotely administer a 2016 cluster (still running v8), in order to add the last 2016 node to it before upgrading to v9.

The Add Node wizard in Failover Cluster Manager no longer seems to offer the option of running cluster validation, unlike Failover Cluster Manager in 2012 R2. Is this by design? I know I can run validation after the node is added, but I would prefer to run it beforehand as well.

↧

Building a Two-Node Failover Cluster

November 14, 2018, 11:21 pm

I have an issue when I try to create a two-node failover cluster.

I get these messages:

Node SJEDITB41606.corp.sva.com successfully issued call to Persistent Reservation RESERVE for Test Disk 0 which is currently reserved by node SJEDITB41607.corp.sva.com. This call is expected to fail.

Test Disk 0 does not provide Persistent Reservations support for the mechanisms used by failover clusters. Some storage devices require specific firmware versions or settings to function properly with failover clusters. Please contact your storage administrator or storage vendor to check the configuration of the storage to allow it to function properly with failover clusters.

↧

Cluster fails when access to file share witness is lost

May 12, 2015, 3:21 am

I have a four-node Windows Server 2008 R2 cluster. The cluster is configured as follows:

2 x nodes in Primary DC (1 is the active node)

2 x nodes in Secondary DC

1 x file share witness in third site

We had an issue last night whereby the 2 nodes in the secondary DC lost network communication due to a network event. The logs stated:

File share witness resource 'File Share Witness' failed a periodic health check on file share '\\fsw-01\Clus01'. Please ensure that file share '\\fsw-01\Clus01' exists and is accessible by the cluster.

The net effect was that the entire cluster stopped:

Cluster service was halted due to incomplete connectivity with other cluster nodes.

And:

Cluster node 'DC2-SQL1' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

Why did the entire cluster fail because of a networking issue that affected only the 2 nodes in the secondary site? The primary site nodes could still see the FSW.
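
The expectation behind this question can be written out as vote arithmetic. The following Python sketch assumes a Node and File Share Majority quorum model in which each of the four nodes and the file share witness holds one vote. The post does not state the quorum model, so this is an assumption, and the function name `side_has_quorum` is hypothetical.

```python
def side_has_quorum(votes_on_side: int, total_votes: int) -> bool:
    """A partition keeps quorum only with a strict majority of all votes."""
    return 2 * votes_on_side > total_votes


# Assumption: 4 nodes plus 1 file share witness, one vote each.
TOTAL_VOTES = 4 + 1

# Partition described in the post: the secondary site is cut off,
# while the primary site nodes can still reach the witness.
primary_side_votes = 2 + 1   # two primary nodes plus the witness
secondary_side_votes = 2     # two isolated secondary nodes

print("primary side keeps quorum:", side_has_quorum(primary_side_votes, TOTAL_VOTES))
print("secondary side keeps quorum:", side_has_quorum(secondary_side_votes, TOTAL_VOTES))
```

Under these assumptions the primary side holds three of five votes, so the arithmetic alone would not explain a full cluster outage. The logged witness health-check failure suggests checking whether the witness was in fact reachable from the primary site at that moment.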

Any insight would be great!

Thanks!

↧

How to create Client Access Point using powershell

December 7, 2016, 8:29 am

Hello Everyone,

Please point me to a document or script for creating a client access point with an IP address in WSFC using PowerShell.

Regards

Sufian

Mohd Sufian www.sqlship.wordpress.com Please mark the post as Answered if it helped.

↧

Storage Spaces Direct / Cluster Virtual Disk goes offline when rebooting a node

March 1, 2018, 4:36 am

Hello

We have several hyper-converged environments based on HP ProLiant DL360/DL380.
We have 3-node and 2-node clusters running Windows Server 2016 with current patches, firmware updates done, and a witness configured.

The following issue occurs on at least one 3-node and one 2-node cluster:
When we put one node into maintenance mode (correctly, as described in the Microsoft docs, after checking that everything is fine) and reboot that node, one of the Cluster Virtual Disks can go offline. In each environment it is always the disk "Performance", the one with SSD-only storage. The issue occurs only intermittently: sometimes I can reboot the nodes one after the other several times in a row and everything is fine, but sometimes the disk "Performance" goes offline. I cannot bring this disk back online until the rebooted node comes back online. Once the node that was down for maintenance is back online, the virtual disk can be brought online without any issues.

We have created 3 Cluster Virtual Disks & CSV Volumes on these clusters:
1x Volume with only SSD Storage, called Performance
1x Volume with Mixed Storage (SSD, HDD), called Mixed
1x Volume with Capacity Storage (HDD only), called Capacity

Disk Setup for Storage Spaces Direct (per Host):
- P440ar Raid Controller
- 2 x HP 800 GB NVME (803200-B21)
- 2 x HP 1.6 TB 6G SATA SSD (804631-B21)
- 4 x HP 2 TB 12G SAS HDD (765466-B21)
- No spare Disks
- Network Adapter for Storage: HP 10 GBit/s 546FLR-SFP+ (2 storage networks for redundancy)
- 3 Node Cluster Storage Network Switch: HPE FlexFabric 5700 40XG 2QSFP+ (JG896A), 2 Node Cluster directly connected with each other

Cluster Events Log is showing the following errors when the issue occurs:

Error 1069 FailoverClustering
Cluster resource 'Cluster Virtual Disk (Performance)' of type 'Physical Disk' in clustered role '6ca63b55-1a16-4bb2-ac53-2b23619e258a' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Warning 5120 FailoverClustering
Cluster Shared Volume 'Performance' ('Cluster Virtual Disk (Performance)') has entered a paused state because of 'STATUS_NO_SUCH_DEVICE(c000000e)'. All I/O will temporarily be queued until a path to the volume is reestablished.

Error 5150 FailoverClustering
Cluster physical disk resource 'Cluster Virtual Disk (Performance)' failed. The Cluster Shared Volume was put in failed state with the following error: 'Failed to get the volume number for \\?\GLOBALROOT\Device\Harddisk10\ClusterPartition2\ (error 2)'

Error 1205 FailoverClustering
The Cluster service failed to bring clustered role '6ca63b55-1a16-4bb2-ac53-2b23619e258a' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

Error 1254 FailoverClustering
Clustered role '6ca63b55-1a16-4bb2-ac53-2b23619e258a' has exceeded its failover threshold. It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state. No additional attempts will be made to bring the role online or fail it over to another node in the cluster. Please check the events associated with the failure. After the issues causing the failure are resolved the role can be brought online manually or the cluster may attempt to bring it online again after the restart delay period.

Error 5142 FailoverClustering
Cluster Shared Volume 'Performance' ('Cluster Virtual Disk (Performance)') is no longer accessible from this cluster node because of error '(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.

Any hints or input would be appreciated. Has anyone seen something similar?

Thanks in advance

Philippe

↧

creates a replication but this error occurs. Storage Replica - Windows Server 2019 Standard.

November 16, 2018, 10:18 pm

I am creating a replication partnership, but this error occurs:

New-SRPartnership : Unable to synchronize replication group rgteste2, detailed reason: Cannot update state for replication group rgteste2 in the Storage Replica driver.

At line:1 char:1
+ New-SRPartnership -SourceComputerName SR1 -SourceRGName rgteste1 -Sou ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (MSFT_WvrAdminTasks:root/Microsoft/...T_WvrAdminTasks) [New-SRPartnership], CimException
+ FullyQualifiedErrorId : Windows System Error 1395,New-SRPartnership

Can anyone tell me why this error occurs?

Regards, Gabriel Luiz

↧