Channel: High Availability (Clustering) forum

Can't delete file


Hello

I have a two-node Hyper-V cluster. One of the virtual disks in the Cluster Shared Volume became corrupted somehow, and the file server it was attached to wouldn't boot because of it. I detached the vhdx file from the virtual machine so it would boot, and I have a good copy of the vhdx restored from backup, but I can't drop it into the CSV because the old corrupt vhdx is still in there. I can't delete, move or rename it. Any attempt generates this error:

Error 0x80070570: The file or directory is corrupted and unreadable.

There are several other working vhdx files in that directory, so it must be the file and not the directory that is corrupt.  Can someone help me to delete this file?
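The closest thing I've found so far is putting the CSV into maintenance mode and running chkdsk in spot-fix mode against it; I haven't tried it yet, so this is only a sketch, and the resource and volume names below are placeholders for whatever yours are called (spot-fix needs Server 2012 or later):

Suspend-ClusterResource -Name "Cluster Disk 1"   # put the CSV into maintenance mode
chkdsk C:\ClusterStorage\Volume1 /spotfix        # online spot repair of NTFS corruption
Resume-ClusterResource -Name "Cluster Disk 1"    # bring the CSV back online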


Hutch


HangRecoveryAction '6' instead of '3'


When running the validation report for a new Server 2019 Hyper-V cluster I got the following error:

The setting for HangRecoveryAction on this cluster is not the default and recommended setting. This setting controls the action taken if it is detected that the service is not responding. This is configured to have a HangRecoveryAction value of 0x6, the recommended value is 0x3. The following is a list of values and the action that they indicate.

'6' seems to be the new default value in Server 2019, but I can't find any documentation about what option 6 does. The default used to be 3, and I'm a bit anxious about changing it back if 6 is now the intended setting.
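For reference, the current value can at least be inspected from PowerShell before deciding anything:

(Get-Cluster).HangRecoveryAction   # returns 6 here; validation recommends 3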

Does anyone know why this changed and what option 6 means?

Failover Cluster Validation Report with warnings


Hi,

I have a 2-node cluster on Windows 2008 R2 with SQL Server 2008 R2 clustered. I have run the cluster validation report and found the following warnings. Please advise on the health of the cluster and on how to fix and remove these warnings. Thanks.

The "Cluster Group" does not contain a File Share Witness or a Witness disk.
This is a required resource for the group. It may be difficult to manage the cluster with this resource missing.

Adapters Local Area Connection 6 and Local Area Connection 5 on node 1 have IP addresses on the same subnet.
Adapters Local Area Connection and Local Area Connection 4 on node 1 have IP addresses on the same subnet.
Multiple adapters on node 1 have addresses on the same subnet.

Node 1 has an IPv4 address XXX.XXX.XXX.XXX configured as Automatic Private IP Address (APIPA) for adapter Local Area Connection 6.
This adapter will not be added to the Windows Failover Cluster. If the adapter is to be used by Windows Failover Cluster,
the IPv4 properties of the adapter should be changed to allow assignment of a valid IP address that is not in the APIPA range.
APIPA uses the range of 169.254.0.1 through 169.254.255.254 with subnet mask of 255.255.0.0.

The HostRecordTTL property for network name 'Name: ClusterNAME' is set to 1200 (20 minutes).
For multi-site clusters the suggested value is 300 (5 minutes).

The HostRecordTTL property for network name 'Name: ClusterDTC' is set to 1200 (20 minutes).
For multi-site clusters the suggested value is 300 (5 minutes).

The HostRecordTTL property for network name 'Name: ClusterSSQL' is set to 1200 (20 minutes). For multi-site clusters the suggested value is 300 (5 minutes).

Node 1 is reachable from Node 2 by only one pair of interfaces. It is possible that this network path is a single point of failure for communication within the cluster. Please verify that this single path is highly available or consider adding additional networks to the cluster.


Node 2 is reachable from Node 1 by only one pair of interfaces. It is possible that this network path is a single point of failure for communication within the cluster. Please verify that this single path is highly available or consider adding additional networks to the cluster.

Analysis Services

This resource is configured to run in a separate monitor. By default, resources are configured to run in a shared monitor.
This setting can be changed manually to keep it from affecting or being affected by other resources.  It can also be set automatically by the failover cluster. If a resource fails it will be restarted in a  separate monitor to try to reduce the impact on other resources if it fails again.
This value can be changed by opening the resource properties and selecting the 'Advanced Policies' tab.  There is a check-box 'run this resource in a separate Resource Monitor'.


The servers do not all have the same software updates. Hotfix Id KB2525694 on Node 1
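(For the last warning, the flagged hotfix can be compared across the nodes like this; node names are placeholders and PowerShell remoting must be enabled:)

Invoke-Command -ComputerName Node1, Node2 -ScriptBlock {
    Get-HotFix -Id KB2525694 -ErrorAction SilentlyContinue
}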


iffi

"No disks were found on which to perform cluster validation tests" - why not?


I am new to clustering, and I'm trying to create my first cluster. I have two HP DL585 servers, each with a RAID configuration of RAID1 for the C: (system drive) and RAID6 for the Data drive. Both logical drives on each server are NTFS, and the two Data drives (the drives I am trying to cluster) are empty.

Whenever I try to run the validation test in Failover Cluster Manager between the two servers, I get an error saying "No disks were found on which to perform cluster validation tests".

Why not? What do I need to do to the logical drives in order to allow them to be seen/accessed/used by Failover Cluster Manager?
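From what I've read so far, validation only considers disks on a shared bus (SAS, iSCSI, or Fibre Channel) that all nodes can reach; internal RAID volumes typically show up with BusType RAID and are skipped. If the nodes are Server 2012 or later, the bus type of each disk can be checked with:

Get-Disk | Format-Table Number, FriendlyName, BusType, Size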

Any help gratefully received.

How to set HangRecoveryAction in powershell on server 2019


I have a Server 2019 cluster. When I run validation I get the warning "The setting for HangRecoveryAction on this cluster is not the default and recommended setting." Can someone tell me how to set it?
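From what I can tell, HangRecoveryAction is a cluster common property, so reading and setting it should be as simple as the lines below (setting it to 3 is only what the validation report recommends):

(Get-Cluster).HangRecoveryAction       # show the current value (6 on a fresh 2019 cluster)
(Get-Cluster).HangRecoveryAction = 3   # set the value the validation report recommends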

Thanks

No disks were found on which to perform cluster validation tests


Should storage be configured in the cluster if a file share witness is already set for the cluster? I ask because the warning "No disks were found on which to perform cluster validation tests" is highlighted in yellow under "Validate Storage Persistent Reservation" in the Failover Cluster Validation Report.

My configuration:

2 HP PCs with Server 2012 R2 installed

A Windows failover cluster created with the 2 nodes (the HP PCs)

A file share witness created on a separate server (a third PC, also running Server 2012 R2). The witness is connected to the cluster.

Please clarify whether cluster storage (disks or pools) needs to be set up or configured, per the warnings in the cluster validation report.
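If the cluster is intentionally diskless (file share witness only), my understanding is that the storage tests can simply be skipped during validation, for example (node names are placeholders):

Test-Cluster -Node Node1, Node2 -Ignore Storage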

thanks

John

Configure Quorum and pools, roles

$
0
0

Hello Friends,

I have configured a cluster with two nodes on Server 2012 R2. Can anyone help me configure the items below? We are going to run Hyper-V in this cluster.

Configure quorum, pools, and roles. Please refer to the screenshots below.

I have done a Google search; however, I was unable to find suitable information for my requirement.

Please help me with the steps.
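For the quorum part, I gather it can be done in one line once a witness share exists (the share path below is only an example):

Set-ClusterQuorum -NodeAndFileShareMajority \\witness-server\ClusterWitness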


ITandIT

DNS CACHE CANNOT BE FLUSHED

$
0
0

How do I flush the DNS cache on Server 2012 R2? The content still exists even though ipconfig /flushdns and a restart have been done on the server. What would be wrong with it?
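For reference, these are the PowerShell equivalents I know of; the last line applies only if the records are being served from a server running the DNS Server role rather than from the local client cache:

Clear-DnsClientCache          # same effect as ipconfig /flushdns
Get-DnsClientCache            # confirm what is still cached locally
Clear-DnsServerCache -Force   # flush the server-side cache on the DNS server itself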

thanks

John


MultiSubnetFailover and cluster parameters

$
0
0

Please clarify whether the HostRecordTTL value would still need to be changed if MultiSubnetFailover=True is set in the Additional Connection Parameters tab when connecting to the server in SSMS 2017.

SQL 2016 Standard installed on Server 2012 R2

A cluster with 2 nodes created across 2 subnets

A warning shows "The HostRecordTTL property for network name 'Name: ClusterNAME' is set to 1200 (20 minutes). For multi-site clusters the suggested value is 300 (5 minutes)." in the failover cluster validation report.

I wonder whether I should change the HostRecordTTL value, set MultiSubnetFailover=True, or both. Please advise.
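If the TTL does need changing, I believe it would be along these lines; the resource name is taken from the warning and may differ on your cluster, and the network name resource has to be cycled for the new TTL to register, which briefly interrupts connections to it:

Get-ClusterResource "ClusterNAME" | Set-ClusterParameter -Name HostRecordTTL -Value 300
Stop-ClusterResource "ClusterNAME"    # take the network name offline...
Start-ClusterResource "ClusterNAME"   # ...and back online to re-register in DNS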

thanks

John

CSV access issues from non-owner node


We are having an issue on a brand new 2019 datacenter build.

We have a 3-node cluster connected to a 3PAR SAN via FC. All LUNs are showing and are present as Cluster Shared Volumes.

When we try to set the default Replica or Hyper-V file location on a non-owner node, we get the following error:

Failed to add authorization entry. Unable to open specified location to store Replica files. 

Error: 0x80070057 (One or more arguments are invalid).

Has anybody seen this problem before? It is the second time that we have seen this in a 2019 environment.
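A workaround we have seen suggested, though untested on our side (the disk name below is a placeholder), is to move ownership of the CSV to the node you are configuring before setting the path:

Move-ClusterSharedVolume -Name "Cluster Disk 1" -Node $env:COMPUTERNAME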

Cluster network name resource failed to find the associated computer object in Active Directory.


We have set up a cluster on Windows Server 2016. Initial validation succeeded; however, I moved the computer object generated by the cluster in Active Directory from its default location to the Computers OU, and I am now seeing this error:

"Cluster network name resource failed to find the associated computer object in Active Directory. This may impact functionality that is dependent on Cluster network name authentication.

Network Name: Cluster Name
Organizational Unit: OU=Windows DSC,DC=XXXXXXX,DC=Local"

Guidance:

Restore the computer object for the network name from the Active Directory recycle bin.

(domain blanked for security reasons) 

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          7/06/2019 12:51:57 PM
Event ID:      1685
Task Category: Network Name Resource
Level:         Error
Keywords:      
User:          SYSTEM
Computer:      XXXXXXXXX.XXXXXXX.Local
Description:
Cluster network name resource failed to find the associated computer object in Active Directory. This may impact functionality that is dependent on Cluster network name authentication.

Network Name: Cluster Name
Organizational Unit: OU=Windows DSC,DC=XXXXXXX,DC=Local

Guidance:

Restore the computer object for the network name from the Active Directory recycle bin.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-FailoverClustering" Guid="{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}" />
    <EventID>1685</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>19</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime="2019-06-07T02:51:57.836490300Z" />
    <EventRecordID>10392</EventRecordID>
    <Correlation ActivityID="{C0BF5C0C-E484-4BDC-A006-D7B5895DE02C}" />
    <Execution ProcessID="4572" ThreadID="7176" />
    <Channel>System</Channel>
    <Computer>XXXXXXX.XXXXXX.Local</Computer>
    <Security UserID="S-1-5-18" />
  </System>
  <EventData>
    <Data Name="ResourceName">Cluster Name</Data>
    <Data Name="OrganizationalUnit">OU=Windows DSC,DC=XXXXXX,DC=Local</Data>
  </EventData>
</Event>

It was fine until I moved it to the Computers OU. My question is, does it need to be in its default location to work?
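In case it helps, the usual guidance I've seen for event 1685 is the "Repair Active Directory Object" action on the cluster name resource in Failover Cluster Manager. Where the CNO currently sits can be checked like this (the identity below is a placeholder, since the real name is blanked above):

Get-ADComputer -Identity "ClusterCNO" | Select-Object Name, DistinguishedName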

WS2016 Multi-Subnet Cluster Communication Issues - Port 3343


Hi,

We’ve recently been in the process of extending a number of single site SQL failover clusters (on WS2016) into multi-site geoclusters. Our environment is a mix of physical and virtual nodes, however for the purpose of simplifying my question I will discuss a single site multi-subnet cluster which is experiencing the exact same issues as our multi-site geoclusters.

The single site, multi-subnet cluster is setup as below:

2 x nodes in network “A”

2 x nodes in network “B”

Each node is on identical infrastructure and has a “Data” network (for client and cluster communication) and a dedicated “Heartbeat” network (for cluster communication only). The heartbeat network is routable between the two subnets. Static routes have been added to each host.

When we run a validation test on a multi-site (or multi-subnet cluster) we get an error on the network validation test stating the below:

Node site1-node1 is reachable from Node site2-node1 by multiple communication paths, but each path includes network interface site2-node1 - Heartbeat. This network interface may be a single point of failure for communication within the cluster. Please verify that this network interface is highly available or consider adding additional networks or network interfaces to the cluster.

When delving deeper into the validation report it shows a failure communicating on UDP port 3343 between the two data networks at each site. We’ve run the report numerous times and never get a network failure between the local nodes, only between different subnet nodes. We also never see a communication issue on the heartbeat network (dedicated to cluster communication).

The data network errors intermittently on the validation report. Sometimes the report will pass without any errors. Other times it will show certain cross subnet nodes can’t communicate and occasionally all nodes can’t communicate across subnets. We seem to be able to recreate the issue by simply restarting all of the cluster nodes or the cluster services on each node. Even more strange is if we reboot the nodes or restart the cluster service once we get validation errors (as above), they’ll clear for a period of time.

We’ve tested disabling the heartbeat network which we’ve created specifically for cluster communication on all nodes and then run the validation test. The tests pass successfully showing that our data > data networks between nodes and subnets/sites can communicate successfully. As soon as we reenable the heartbeat NICs and rerun the validation test it begins erroring again.

We’ve tested with a UDP port emulator, disabled the cluster service (after the validation test has reported that the cluster isn’t communicating over the data network) and then sent UDP packets over port 3343 to confirm that they can successfully reach the node that failed the validation test. We’ve also run packet traces and can confirm that both tests show that the respective ports between the hosts are open. Windows Firewall is turned off on all nodes and the two subnets used don’t pass through a firewall. The hosts are also fully patched.

It doesn’t appear to be a networking issue as the packets are reaching the nodes, but for some reason the validation report is intermittently failing.
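For anyone wanting to reproduce our basic reachability check: the heartbeat itself is UDP 3343 (which is why we used a UDP emulator), but checking the TCP listener on the same port at least rules out routing problems between subnets:

Test-NetConnection -ComputerName site2-node1 -Port 3343   # TCP check only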

Any help with this would be greatly appreciated.

R



ReFS for CSV Hyper-V

I just built a brand new Server 2016 (1607) failover cluster for use as Hyper-V nodes. I am trying to figure out if I should use ReFS or NTFS. Most of the information seems old and points to this webpage as proof that you should not use ReFS:

https://docs.microsoft.com/en-us/windows-server/storage/refs/refs-overview

However this page states you can use ReFS for CSV

The following features are available on ReFS and NTFS:

Functionality: Cluster Shared Volume (CSV) support 
ReFS: Yes  
NTFS: Yes

Any guidance here?


*EDIT*

OK, I am coming to the same conclusion as everyone else: use NTFS. The reason is that ReFS CSVs run in FileSystemRedirected mode. I tested this with my current cluster, which matches the information I am reading everywhere.

http://www.itprotoday.com/windows-8/ntfs-or-refs-cluster-shared-volumes-windows-server-2016

I am eager to see this limitation lifted. 
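For anyone wanting to verify the redirection behavior on their own cluster, this shows the I/O mode per CSV and why it is redirected:

Get-ClusterSharedVolumeState |
    Select-Object Name, Node, StateInfo, FileSystemRedirectedIOReason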




Performance issue on Storage Spaces Direct Server 2019 - getting high read and write latency


Hello All,

On S2D I am getting a performance issue: high read and write latency. For some days the problem has been worse; IOPS are not constant, reaching the thousands one second and dropping to the hundreds the next, and the same thing is happening with read and write throughput. There was a performance issue earlier as well, but IOPS were at least constant then. In Admin Center the IOPS and throughput graphs show spikes, and because of this the hosted VPSes hang and run slowly.

I have configured S2D with 4 storage nodes, with NVMe for caching and SSD for capacity, as below:

Node 1: 1x 250 GB NVMe, 3x 1 TB SSD, no Hyper-V role

Node 2: 1x 500 GB NVMe, 3x 1 TB SSD, no Hyper-V role

Node 3: 2x 250 GB NVMe, 4x 1 TB SSD, Hyper-V role

Node 4: 2x 250 GB NVMe, 4x 1 TB SSD, no Hyper-V role

Nodes 5, 6, 7: no SSD or NVMe for storage, Hyper-V role only

All servers are connected with 10 Gb Ethernet, and CSVs are used to store the VM files.
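So far these are the checks I know to run; the latency figures are cumulative maxima per physical disk, so they surface the worst drives rather than current load:

Get-PhysicalDisk | Get-StorageReliabilityCounter |
    Sort-Object ReadLatencyMax -Descending |
    Select-Object -First 5 DeviceId, ReadLatencyMax, WriteLatencyMax

Get-StorageSubSystem Cluster* | Get-StorageHealthReport   # overall S2D health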

Please suggest how to resolve the issue.

Storage Spaces Direct, server specs for SSDs


Hi All,

Looking to build an R&D VDi platform between two nodes using local disks.

I'm planning on buying two servers, each with 4 x 1.92 TB 6 Gbps SATA SSDs. My research tells me this:

2 servers, meaning a 2-way mirror

all SSDs, so no caching required

auto-calculated reserve space

Usable capacity = 6.9 TB

file share witness hosted away from the cluster

This is the first time I've looked into Storage Spaces Direct, as I've always gone with the traditional route of Compellent SANs. My servers have an HBA330 card, which is needed for this technology (i.e., no RAID at the hardware level). I'm confused right from the off regarding installing Windows on each server. Usually I go with 2x SSD RAID1 for the OS and then map my iSCSI targets for the storage. How do I go about setting up the disks so I can get Windows installed before then installing the roles to support storage? Is it simply a case of speccing the server with, say, 2x 250 GB NVMe (RAID1) on its own controller card?

I'm going with two network cards. The first one will give me dual 25 Gbps for the storage (dedicated fibre switch for storage only), and I'm going with a second card, which is dual 40 Gbps, to the LAN. We have plenty of ports available on our fibre core switch, so we might as well make use of it all. Does this sound like a good idea, or should I look into swapping the disks for SAS 12 Gbps ones and upgrading the storage network card from 25 Gbps to 40 Gbps?

The two nodes will also be running Hyper-V failover clustering so we can live migrate critical desktop VMs (although not all will need to fail over).

Also, when I add a third (and maybe fourth) server, can I change to a 3-way mirror on the fly?
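And for completeness, my understanding is that the volumes themselves are carved out like this once S2D is enabled (the pool pattern, volume name, and size below are only examples):

New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "VDI01" `
    -FileSystem CSVFS_ReFS -ResiliencySettingName Mirror -Size 2TB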

Thanks!!
netft.sys is the cause of the bugcheck blue screen on Windows Server 2008 R2 Datacenter


Hi

We have a server getting rebooted by a bugcheck error in netft.sys. Please let me know if there is any fix for this issue; I am not sure what is causing it.

The server is Windows 2008 R2 Datacenter and it is part of a Hyper-V cluster.
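To identify the stop code behind the reboots, the recorded bugcheck events (event 1001) can be pulled from the System log:

Get-WinEvent -FilterHashtable @{ LogName='System'; ProviderName='Microsoft-Windows-WER-SystemErrorReporting'; Id=1001 } |
    Select-Object -First 3 TimeCreated, Message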

Thanks in advance

Failover Cluster Manager - Ghost machine can't delete

Running a Windows Server 2016 Hyper-V clustered environment. Was running VMM 2016 clustered with SQL 2016 clustered as well (currently all shut down). We ran into an issue with a VM where we had to delete it. The machine did not cleanly delete and left a ghost VM role in Failover Cluster Manager.

Failover Cluster Manager is unable to remove the VM role for this particular machine even though nothing is still allocated to it. The folder containing the VM was deleted, and the SID folder for the machine was deleted. Still, Failover Cluster Manager shows this VM role with no way to delete it. It only exists in Failover Cluster Manager; as far as Hyper-V is concerned, the machine is gone.

The error when trying to remove the VM from Failover Cluster Manager is: "Error Code: 0x8007012f The file cannot be opened because it is in the process of being deleted." The issue is that there is nothing for it to delete, nor have any of the forums I've come across so far adequately addressed how to resolve this. Has anyone run into this issue and found a way to resolve it?
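One approach that may be worth trying (untested against this exact error; the group name below is a placeholder) is removing the orphaned role's cluster group directly from PowerShell:

Get-ClusterGroup | Where-Object { $_.GroupType -eq 'VirtualMachine' }   # find the ghost role
Remove-ClusterGroup -Name "GhostVM" -RemoveResources -Force            # then remove it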

No storage disk in the cluster in Failover Cluster Manager


Hello team,

I have a two-node Windows cluster running on Windows Server 2008 R2 in a vSphere environment. The logical disks are virtual disks (SQL DB, SQL logs) only, not LUNs. I want to know whether this is normal behavior, because I don't see the disks added on the storage disk page, as you may see in the screenshot below.

I have checked the cluster report, and it says that no disks were found.

Please let me know the solution. The DBA team says the issue is because the disk is missing in Failover Cluster Manager under Storage, but the cluster report and cluster events show no critical or error messages.
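For reference, this lists what the cluster itself knows about its resources, which should show whether any Physical Disk resources exist at all:

Get-ClusterResource | Format-Table Name, State, OwnerGroup, ResourceType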



Many thanks in advance

Ansar

Downgrade Cluster level


Hi,

I have to downgrade a Windows Server 2019 cluster back to 2016.

I reinstalled one node with Windows Server 2016, but I can't add it back to the cluster because of the cluster functional level.

Is it possible to create a new cluster (cluster2) and add the same storage from the existing cluster1?

Will the cluster storage work in both clusters?
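For what it's worth, the functional level can be checked as below; as far as I know it cannot be lowered once raised, which is why the 2016 node is rejected:

(Get-Cluster).ClusterFunctionalLevel   # 9 = Server 2016, 10 = Server 2019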

Partition information lost on cluster shared disk


Hi everyone,


we've got a cluster virtual disk where the partition table and volume name broke. Has anyone experienced a similar problem and got some hints on how to recover?


The problem occurred last Friday. I restarted node3 for Windows updates. During the restart, node1 had a bluescreen and also restarted. The failover cluster manager tried to bring the cluster resources online but failed several times. Finally the resource-swapping came to a rest on node1, which came up early after the crash. Many virtual disks were in an unhealthy state, but the repair process managed to repair all disks, so they are now in a healthy state. We aren't able to explain why node1 crashed. Since the storage pool is in dual parity mode, the disks should be able to work even if only 2 nodes are running.

One virtual disk, however, lost its partition information.


Network config:

Hardware: 2x Emulex OneConnect OCe14102-NT, 2x Intel(R) Ethernet Connection X722 for 10GBASE-T

Backbone-Network: On the "right" Emulex network card (only members in this subnet are the 4 nodes)

Client-access teaming network: emulex "left" and intel "left" cards in team; 1 untagged network and 2 tagged networks


Software Specs:

    • Windows Server 2016
    • Cluster with 4 cluster nodes
    • Failover Cluster Manager + File Server roles running on the cluster
    • 1 storage pool with 36 HDDs / 12 SSDs (9 HDDs / 3 SSDs on each node)
    • Virtual disks are configured to use dual parity:

Get-VirtualDisk Archiv | Get-StorageTier | Format-List

       FriendlyName           : Archiv_capacity
       MediaType              : HDD
       NumberOfColumns        : 4
       NumberOfDataCopies     : 1
       NumberOfGroups         : 1
       ParityLayout           : Non-rotated Parity
       PhysicalDiskRedundancy : 2
       ProvisioningType       : Fixed
       ResiliencySettingName  : Parity

Hardware Specs per Node:

  • 2x Intel Xeon Silver 4110
  • 9 HDDs at 4 TB each and 3 SSDs at 1 TB each
  • 32 GB RAM

Additional information:

The virtual disk is currently in a Healthy state:

Get-VirtualDisk -FriendlyName Archiv

FriendlyName ResiliencySettingName OperationalStatus HealthStatus IsManualAttach   Size

------------ --------------------- ----------------- ------------ --------------   ----
Archiv                             OK                Healthy      True           500 GB


The storage pool is also healthy:

PS C:\Windows\system32> Get-StoragePool
FriendlyName   OperationalStatus HealthStatus IsPrimordial IsReadOnly

------------   ----------------- ------------ ------------ ----------
Primordial     OK                Healthy      True         False
Primordial     OK                Healthy      True         False
tn-sof-cluster OK                Healthy      False        False


Since the incident the event log (of current master: Node2) has various errors for this disk like:

[RES] Physical Disk <Cluster Virtual Disk (Archiv)>: VolumeIsNtfs: Failed to get volume information for \\?\GLOBALROOT\Device\Harddisk13\ClusterPartition2\. Error: 1005.


Before the incident we also had errors that might indicate a problem:

[API] ApipGetLocalCallerInfo: Error 3221356570 calling RpcBindingInqLocalClientPID.


Our suspicions so far:

We made registry changes to SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\0001 (through 0009), setting the value PnPCapabilities to 280 to disable the checkbox "Allow the computer to turn off this device to save power". Not all network adapters support this checkbox, so this may have had some side effects.
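The PowerShell form of that change, for clarity (the adapter subkey 0001 is shown; we repeated it for 0001 through 0009):

Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}\0001' `
    -Name PnPCapabilities -Value 280 -Type DWord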



One curiosity: after the error we noticed that one of the 2 tagged networks had the wrong subnet on two nodes. This may have caused some of the failover role switches that occurred on Friday, but we're unsure about the reason, since they were configured correctly some time before.

We've had a similar problem in our test environment after activating jumbo frames on the network interfaces. In that case we lost more and more filesystems after moving the file server role to another server. In the end all filesystems were lost and we reinstalled the whole cluster without enabling jumbo frames.

We now suspect that maybe two different network cards in the same network team may cause this problem.

What are your ideas? What may have caused the problem and how can we prevent this from happening again?

We could endure the loss of this virtual disk since it was only archive data and we have a backup, but we'd like to be able to fix this problem.

Best regards

Tobias Kolkmann

