AlwaysOn Cluster reboot due to file share witness unavailability

April 24, 2015, 6:38 am

Hi Team,

Has anyone come across this scenario in a two-node AlwaysOn Availability Group? The file share witness times out, RHS terminates, and the cluster node reboots. The file share witness is there for continuous failover, so if the resource is unavailable I expected it to go offline without impacting the server or SQL Server. Instead, the cluster reboots the node to rectify the issue.

Configuration

Windows Server 2012 R2 (VMs) - two node, file share witness (nfs)

SQL Server 2012 SP2

Errors

A component on the server did not respond in a timely fashion. This caused the cluster resource 'File Share Witness' (resource type 'File Share Witness', DLL 'clusres2.dll') to exceed its time-out threshold. As part of cluster health detection, recovery actions will be taken. The cluster will try to automatically recover by terminating and restarting the Resource Hosting Subsystem (RHS) process that is running this resource. Verify that the underlying infrastructure (such as storage, networking, or services) that are associated with the resource are functioning correctly.

The cluster Resource Hosting Subsystem (RHS) process was terminated and will be restarted. This is typically associated with cluster health detection and recovery of a resource. Refer to the System event log to determine which resource and resource DLL is causing the issue.

Thanks,

-SreejitG

↧

Failed Cluster Validation Wizard - Active Directory

April 29, 2015, 10:05 am

Hi All,

I have a problem creating a cluster that has just got me stumped.

I've created two other clusters at two other sites with the exact same hardware and configuration. Both worked fine at those locations.

For this third location, the cluster build is not going so well.

I have 2 x Dell servers for a 2-node cluster, and a Dell SAN. There are iSCSI HBAs and switches for iSCSI, and a flat network for the LAN and Management on another switch. The heartbeat runs over a directly connected cable.

The nodes are running 2012 R2 Core, with latest patches (April 2015).

Firewall is disabled by Group Policy, both on servers and Domain Controller.

The Domain Controller at this site is on the local LAN.

On each future cluster node, the LAN cards can ping the Domain Controller by name and address (using -S to verify the source address). The DC can also ping each LAN interface, again both by name and IP. The heartbeat NICs can similarly ping each other.

Using a Management Server on a remote subnet (Domain Account, local admin of all concerned servers, except the DC, and having read/write on the target OU), the Cluster Validation Wizard fails, on one node only.

I'm getting an error message on "System Configuration\Validate Active Directory Configuration" of:

Connectivity to a writable domain controller from node DEN1NTHV02.mycorp.net could not be determined because of this error: Could not get domain controller name from machine DEN1NTHV02.
Node(s) DEN1NTHV01.mycorp.net can reach a writable domain controller.
Node(s) DEN1NTHV02.mycorp.net cannot reach a writable domain controller. Please check connectivity of these nodes to the domain controllers.

The computer accounts are in the same OU, both are enabled, and neither has any other apparent problem.

So, (1) does anyone have any idea, or (2) where can I find the verbose output text of the failed test?

↧

2012 NLB Best Practice (Single vs Multiple NICs)?

April 27, 2015, 7:07 am

Our environment has used an NLB configuration with two NICs for years. One NIC for the host itself and one for the NLB. We have also been running the NLB in multicast mode. Starting with 2008, we began adding the cluster's MAC address as an ARP entry on our layer three switch. Each server participating in the NLB is on VMware.

Can someone advise what the best procedure is for handling NLB these days? Although initial tests with one NIC seem to be working, I notice a popup warning on the participant servers when launching NLB Manager, "Running NLB Manager on a system with all networks bound to NLB might not work as expected", if they are set to run in unicast mode.

With that said, should we not be running multicast? Will that present problems down the road?

↧

how to enable the CSV vss shadow copy

April 26, 2015, 8:45 am

On a 2012 R2 cluster, I found that I can enable shadow copies in the witness disk's properties, but when I switch to the CSV disk's properties I cannot see this option. Does a cluster CSV support shadow copies?

↧

2012r2 cluster create err

April 26, 2015, 7:55 am

I selected an existing VHD on a CSV and attached it to a VM, but the VM cannot be created and this error is shown:

There was a failure configuring the virtual machine role for 'test'.
There was an error retrieving the unique identifier for the Cluster Shared Volume that contains the path 'S:\ vm disk\08r2sp1\test'.

The parameter is incorrect

↧

Failed to put node in node maintenance mode. Details: Microsoft.ClusterAwareUpdating.ClusterUpdateException: Could not suspend cluster node

March 12, 2015, 5:43 am

CAU was working for some time (I was really surprised that it actually worked without a hitch, it being an MS product...)

But it is back to its old tricks:

Failed to put node VHOST01 in node maintenance mode. Details: Microsoft.ClusterAwareUpdating.ClusterUpdateException: Could not suspend cluster node "VHOST01".
at MS.Internal.ClusterAwareUpdating.Util.CheckPshError(PowerShell shell, MulticulturalString exceptionMessage)
at MS.Internal.ClusterAwareUpdating.FailoverClusterImpl.PutIntoMaintenanceMode(String nodeName, ICauPluginCallbackBase callback, CancellationToken cancelToken, Boolean force)

After that, it seems it simply rebooted the host, with all VMs crashing and restarting on the other host in the cluster, and the update finished on both hosts with Success (I would call it otherwise).

Nice, really nice...

Does anybody have any idea what that "warning" means?

↧

CAU Cluster Role problem

April 30, 2015, 4:22 am

I've got a problem running CAU on my Hyper-V cluster. It looks like one cluster resource is missing.

get-CauClusterRole -ClusterName thvc1

get-CauClusterRole : Could not get the state of resource "CAUTHVC18z2Resource": (Win32Exception) The cluster resource could not be found
At line:1 char:1
+ get-CauClusterRole -ClusterName thvc1

When I checked:

Get-ClusterResource -Cluster thvc1

Name                                                                       State
----                                                                       -----
Adres IP klastra                                                           Online
Monitor udostępniania plików                                               Online
Nazwa klastra                                                              Online
Scale-Out File Server (\\THVC1CUA)                                         Online
THVC1CUA                                                                   Online
Virtual Machine Configuration W7-Pc1                                       Online
Virtual Machine Configuration W7-plwropc301                                Online
Virtual Machine Configuration W8-plwropc302                                Online
Virtual Machine Configuration WXP-pc1                                      Online
Virtual Machine Configuration WXP-plwropc300                               Online
Virtual Machine W7-Pc1                                                     Online
Virtual Machine W7-plwropc301                                              Offline
Virtual Machine W8-plwropc302                                              Offline
Virtual Machine WXP-pc1                                                    Offline
Virtual Machine WXP-plwropc300                                             Offline

So CAUTHVC18z2Resource is missing. So I can remove this role using Remove-CauClusterRole.

How can I clean up this role and add it again?

Thank you very much for help.

Kind Regards Tomasz

↧

quorum failure scenario question

April 30, 2015, 9:49 am

Hello TechNet friends,

I have a scenario that happened yesterday that leaves me stumped and I am not sure in which direction to go.

2-node active/passive 2008R2 file cluster (Node 1 & Node 2)
Nodes are vmguests on vsphere 5.5
path selection is round-robin
quorum node/disk majority (quorum disk is SAN...all drives are SAN in fact)
Node 1 owns cluster resources

Our VM environment re-balanced itself in the wee hours of the morning. Upon initiation of the migration of Node 1 to a different host, the VM system reported that there was no heartbeat coming from Node 1. This appears to be because the virtual switch used in VMware listed a different "Observable IP range", outside that of the heartbeat IP. We have noticed that the observable IP range changes; apparently that is expected behavior due to broadcast packets being received and should not cause alarm. The guest migration then occurred.

Seconds later, the MS cluster reported the cluster service failed to update the cluster configuration on the witness disk. The witness disk then failed and dropped from the cluster. The cluster remained up with no errors being reported.

The newly migrated Node 1 showed all green in terms of cluster and cluster resources and only showed this witness disk error in the logs. It wasn't until I was notified that the application could not reach its cluster resources that I drilled down into the cluster and noticed that the attached SAN drives only showed a unique identifier number and no longer had a drive letter. The drives also showed 0 bytes. I had to reboot Node 1 in order to restore connectivity.

So, I think I have a couple of questions:

A) Did the intermittent loss of a heartbeat during the migration cause the cluster service to fail to update the cluster config on the witness disk?

B) Why does A matter if the original cluster config is kept in c:\windows\cluster?

C) You can lose the witness disk and be OK, so why did Node 1 all of a sudden think it had the cluster resources but could not provide a drive letter?

thank you.
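
As background for question C, the vote arithmetic behind "Node and Disk Majority" can be sketched in a few lines of Python. This is only a simplified model, assuming one vote per node and one vote for the witness disk; the function names are hypothetical, and the sketch says nothing about disk arbitration or why the data disks lost their drive letters.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def has_quorum(total_votes: int, online_votes: int) -> bool:
    """True if the votes still online are a strict majority of all votes."""
    return online_votes >= votes_needed(total_votes)


# Assumption: two nodes plus one witness disk, one vote each.
TOTAL_VOTES = 3

print(votes_needed(TOTAL_VOTES))   # votes required to keep quorum
print(has_quorum(TOTAL_VOTES, 2))  # both nodes up, witness disk lost
print(has_quorum(TOTAL_VOTES, 1))  # a single node with no witness
```

Under these assumptions, losing only the witness disk still leaves two of three votes, which is consistent with the cluster staying up after the witness failed.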

↧

2012 R2 CAU keeps installing the same update.

April 23, 2015, 9:03 am

I am trying to run CAU on a 12-node Windows Server 2012 R2 cluster.

CAU keeps installing the same update, KB2461484, on the same node over and over.

I tried several times to update that node through CAU and Windows Update, along with rebooting the node, but nothing changes.

When I run CAU it always picks the same node and tries to install the same update.

Any ideas?

↧

File copy speeds to CSV vs non-CSV

April 24, 2015, 11:43 am

I'm working on bringing up a 2012 R2 cluster and doing a basic test. In this cluster, I have two adapters for iSCSI traffic, one for network traffic, and one for the heartbeat. The cluster node has all the current updates on it. Everything is set up correctly as far as I can see. I'm taking a folder with 1GB of random files in it and copying it from the C: drive of a node to an iSCSI LUN. If I have the LUN set up as a non-CSV disk, the copy happens about three times faster than if I have it set up as a CSV disk. All I'm doing is using FCM to change the disk from CSV to non-CSV (right-click, Remove from CSV, right-click, Add to CSV). I can swap it back and forth, and each time the copy process is about three times slower when it's a CSV. Am I missing something here?

I've been through all the usual stuff with regard to the iSCSI adapters, MPIO, drivers, etc. But I don't think that would have anything to do with this anyway. The disk is accessed the same way with regard to all that whether it's a CSV or not, unless I'm missing something. Right now, I only have a single node configured in the cluster, so it's definitely not anything to do with the CSV being in redirected mode.

I'm not trying to establish any particular transfer speed, I know file transfers are different than actual workloads and performance tools like iometer when it comes to actual numbers. But it seems to me like the transfers should be close to the same whether the disk is a CSV or not, since I'm not changing anything else.

↧

what you guys use for production vm storage?

April 30, 2015, 6:25 am

i mean explicitly not labs / homes but something you roll in the office you don't own yourself :)

would you put into production some free solution with allowed commercial use (no eula violation! ) but with community only / limited vendor support?

what would be a game changer for you? say going freenas (free) -> truenas (paid) upgrade?

tnx!! :)

↧

NLB Multi-Site

July 6, 2010, 7:06 am

Is it possible to use NLB through a router across two different datacenters?

↧

Witness Client failed to Register

March 26, 2015, 6:58 am

I have a recently built pair of W2012 R2 servers with all of the applicable updates applied via WSUS. They are clustered servers but are not using shared storage. Instead they are using Vision Solutions' "High Availability for Windows" to replicate several drives. Also installed is SQL 2012 (clustered) and all available updates. I have used this solution several times previously on other clusters. The servers are performing as expected but are not yet in production. A few days ago I began to see the following notifications approximately every 20 seconds on the "owning" node (PCSCALEA) of the cluster:

Log Name:      WitnessClientAdmin
Source:        Microsoft-Windows-SMBWitnessClient
Date:          3/26/2015 9:54:23 AM
Event ID:      8
Task Category: None
Level:         Error
Keywords:
User:          NETWORK SERVICE
Computer:      PCScaleA.rms.org
Description:
Witness Client failed to register with Witness Server PCSCALEB for notification on NetName\\Pcscale with error (The parameter is incorrect.)
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
    <Provider Name="Microsoft-Windows-SMBWitnessClient" Guid="{32254F6C-AA33-46F0-A5E3-1CBCC74BF683}" />
    <EventID>8</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime="2015-03-26T13:54:23.951194200Z" />
    <EventRecordID>114860</EventRecordID>
    <Correlation />
    <Execution ProcessID="1624" ThreadID="9612" />
    <Channel>WitnessClientAdmin</Channel>
    <Computer>PCScaleA.rms.org</Computer>
    <Security UserID="S-1-5-20" />
</System>
<EventData>
    <Data Name="WitnessServerIP">PCSCALEB</Data>
    <Data Name="NetName">Pcscale</Data>
    <Data Name="Error">87</Data>
</EventData>
</Event>

The following errors appear approximately every 20 seconds on the "non-owning node" (PCSCALEB):

Log Name:      WitnessServiceAdmin
Source:        Microsoft-Windows-SMBWitnessService
Date:          3/26/2015 9:51:43 AM
Event ID:      5
Task Category: None
Level:         Error
Keywords:
User:          SYSTEM
Computer:      PCScaleB.rms.org
Description:
Witness Service registration request from Witness Client (PCSCALEA.RMS.ORG) for NetName\\PCSCALE failed with error (The parameter is incorrect.)
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
    <Provider Name="Microsoft-Windows-SMBWitnessService" Guid="{CE704B50-B105-4BC8-A24F-1792C0401C2A}" />
    <EventID>5</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime="2015-03-26T13:51:43.906614200Z" />
    <EventRecordID>153093</EventRecordID>
    <Correlation />
    <Execution ProcessID="5252" ThreadID="5368" />
    <Channel>WitnessServiceAdmin</Channel>
    <Computer>PCScaleB.rms.org</Computer>
    <Security UserID="S-1-5-18" />
</System>
<EventData>
    <Data Name="ClientName">PCSCALEA.RMS.ORG</Data>
    <Data Name="NetName">PCSCALE</Data>
    <Data Name="ErrorCode">87</Data>
</EventData>
</Event>

Any ideas? Any suggestions?

Thanks,

Ryan

↧

Some clients don't connect to clustered file server when moving role.

May 1, 2015, 1:11 pm

Hi guys, I have a two (virtual) server cluster which has a File Server role installed on it. The cluster passes the validation in all areas. For a long time I believed I had set it up incorrectly because my test machine would not be able to see the file shares after the role was moved from one node to another.

However, I did more testing and realized that it's a client issue. Some computers can see the file shares and ping the file server just fine, but others don't work when the role is moved. Perhaps this screenshot will explain it better:

http://imgur.com/QFdsVh8

The client machines have the same DNS servers, and the cluster has a static IP. Any thoughts?

Thanks.

↧

Bitlocker protected cluster disks fail to mount

May 3, 2015, 1:51 am

Hello

I am trying to Bitlocker protect some CSVs.

I am following the instructions here:

https://technet.microsoft.com/en-gb/library/dn383585.aspx

I can format and prepare the disk, and enable Bitlocker successfully. The disk is accessible on the host and I can lock and unlock the drive (with the recovery key) without an issue. I have correctly added the CNO to the list of protectors for the volume.

As soon as I add the disk to the cluster, however, the disk fails to come online in the cluster with the following error:

The system cannot open the device or file specified. Error Code 0x8007006e.
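
For readers who want to map a code like this back to a plain Win32 error number, here is a minimal Python sketch of splitting an HRESULT into its fields. The helper name `decode_hresult` is hypothetical, and the sketch assumes the standard HRESULT layout (failure bit, 11-bit facility, 16-bit code); it does not look up message text.

```python
def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into failure flag, facility and error code."""
    hr &= 0xFFFFFFFF  # treat the value as unsigned 32-bit
    return {
        "failure": bool(hr >> 31),        # top bit set means failure
        "facility": (hr >> 16) & 0x7FF,   # 11-bit facility field
        "code": hr & 0xFFFF,              # low 16 bits: the error code
    }


FACILITY_WIN32 = 7  # facility used when an HRESULT wraps a Win32 error

parts = decode_hresult(0x8007006E)
print(parts)
print(parts["facility"] == FACILITY_WIN32)
```

With the value from the error above, the facility is 7 (Win32) and the code is 110, which is Win32 error ERROR_OPEN_FAILED; its standard text is the "cannot open the device or file specified" message shown above.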

I am using Windows Server 2012 R2 (Core running Hyper-V). The hosts are connected to a Dell Equallogic PS4100 iSCSI array. I have tried with both thin and thick provisioned SAN volumes.

Many thanks

Ben

↧

ms-cluster-net 3343 package showing on LAN network

April 29, 2015, 3:03 am

Hi All,

In my Exchange 2013 environment, on a DAG failover cluster, I have a small problem with packets showing up on the LAN network.

Exchange is a VM on a Hyper-V host.

The Hyper-V hosts have network cards directly connected to each other, point-to-point (without a switch).

On this network connection I created the DAG network with private IP addresses in 192.168.x.x/24.

All communication, sent and received, should go over this network only, but my network team says that Wireshark on the LAN segment shows packets from this private network. Wireshark shows that both DEST and SOURCE are 192.168.x.x.

In the cluster's DAG network configuration I have selected "Cluster Only".

Windows Server 2012 R2.

How can I resolve this issue?

BR/Lukas

↧

Switching Windows Server 2012 R2 Datacenter Full to Core Edition after HyperV Cluster setup

May 3, 2015, 1:17 am

Hello There,

I am building a two-node Hyper-V cluster on Windows Server 2012 R2 Datacenter Full edition, but once the cluster is ready and functional I want to switch from the Full edition to the Core edition.

Please advise whether there is any issue with doing this and whether it works.

Thanks,

Maqsood

Maqsood Mohammed Senior Systems Engineer MCITP-Enterprise Admin & ITILv3 Foundation Certified

↧

NLB cluster in HyperV

May 4, 2015, 6:48 am

Hi guys, I have a client with an interesting problem.

They currently have two 2008 R2 machines running in Hyper-V, performing load balancing for Exchange 2010 CAS. Each machine has its own physical NIC, but that NIC is shared, so essentially the LAN and the NLB network share the physical NIC.

Moving to the present situation: they moved one of the machines to a new Hyper-V cluster, where it no longer has its own physical NIC. It now has two virtual NICs, the LAN and the NLB. When they try to add the machine to the cluster, it gets stuck on converged. Also, when adding it to the cluster, only one NIC (the LAN) is shown. After some troubleshooting I discovered that you have to uncheck the NLB checkbox in the NLB NIC's properties for it to show up, but it still won't join the cluster.

I hope that wasn't too confusing; please let me know if you need clarification. So the question is: is this normal behaviour for Hyper-V and NLB?

↧

Event ID 21502 - VM restore failed after failing over

May 10, 2011, 11:32 pm

Hi,

I have a 2-node Windows Server 2008 R2 Datacenter SP1 failover cluster; the VHDs are located on a CSV.

I rebooted one node of the cluster, so all VMs failed over to node2... except one.

It gave me this event:

Event ID 21502

'Virtual Machine Apps Server' failed to start.

'Apps Server' failed to restore. (Virtual machine ID FF5FB4FC-B73F-47E2-AA68-153283CA5CB8)

'Apps Server' Microsoft Synthetic SCSI Controller (Instance ID {A161C8AF-B7D6-44B1-8CA9-E32CCA8613A3}): Failed to restore with Error 'General access denied error' (0x80070005). (Virtual machine ID FF5FB4FC-B73F-47E2-AA68-153283CA5CB8)

'Apps Server': Hyper-V Virtual Machine Management service Account does not have permission to open attachment '\\?\mpio#disk&ven_hp&prod_hsv200&rev_5000#1&7f6ac24&0&3630303530384234303030363841433230303030463030303032394630303030#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'. Error: 'General access denied error' (0x80070005). (Virtual machine ID FF5FB4FC-B73F-47E2-AA68-153283CA5CB8)

Any ideas on what this is and how to resolve?

Phil

↧

Techies, Ways to find LUN IDs Windows 2012

May 5, 2015, 9:51 pm

Hi Guys,

I'm able to get the LUN ID for 2008 servers via FCINFO & registry.

Can anyone tell me how to find the LUN ID on Windows 2012 servers?

↧