Channel: High Availability (Clustering) forum

Up But Isolated Cluster Node


I'm running Server 2016, fully patched, in a 5-node cluster: Hyper-V and S2D as a hyper-converged solution running a few hundred VMs. Two days ago one of my nodes decided it wanted to be cranky. This caused the roles to rearrange across the systems and ended up putting one of my healthy nodes into the "Isolated" state. Root cause for the node that went out to lunch is still unknown and is being researched separately. However, the healthy node has been stuck in an up-but-isolated state ever since (see screenshot). I've seen plenty of examples where a node is down and isolated, typically due to a network problem, but the network looks fine; I have three separate NICs with separate switches/VLANs/IP space. I can live migrate VMs, and my S2D storage is fully healthy on the cluster. There are no issues using this node, but I don't like the "Isolated" state.

I ran the cluster validation test for networking and it comes back healthy, with no warnings or errors. Event logs show the node going isolated, but in the same second there is a follow-up event saying it is no longer isolated. These events exist on all nodes in the cluster, so there is no reason it should still be isolated. I'm sure that if I rebooted this node (or even restarted the cluster service) it would come back online as healthy, but another node in the cluster is having hardware issues, so that's not an option at the moment. Any thoughts on how to clear the isolated state would be appreciated. The end of the PowerShell output says it all: State is Up, StatusInformation is Isolated...
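A minimal sketch of how to confirm what the cluster currently reports and, once the other node's hardware issue is resolved, how to bounce only this node's cluster service after draining it ("Node4" is a placeholder name):

# Show each node's state and the isolation/quarantine-related cluster settings
Get-ClusterNode | Format-Table Name, State, StatusInformation
Get-Cluster | Format-List ResiliencyLevel, ResiliencyDefaultPeriod, QuarantineThreshold

# Later, with the rest of the cluster healthy: drain roles, restart the cluster service, resume
Suspend-ClusterNode -Name "Node4" -Drain
Stop-ClusterNode -Name "Node4"
Start-ClusterNode -Name "Node4"
Resume-ClusterNode -Name "Node4" -Failback Immediate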



Upgrading to Windows 2019


Hi,

One of our Windows 2012 Hyper-V cluster nodes went down and we reinstalled it with Windows Server 2019. Is it possible to recreate the cluster with a mix of Windows 2019 and Windows 2012 nodes, or must they all run the same OS?

What is the proper way to recreate it?

Thanks.
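For reference, mixed-OS mode during a cluster rolling upgrade is only supported between adjacent releases (for example 2016 and 2019), and the current level can be read from the cluster itself; a quick sketch:

# ClusterFunctionalLevel is exposed on 2016 and later (commonly documented values: 8 = 2012 R2, 9 = 2016, 10 = 2019)
Get-Cluster | Select-Object Name, ClusterFunctionalLevel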

Update-ClusterFunctionalLevel : You do not have administrative privileges on the cluster.


I have just finished updating our cluster nodes from Windows Server 2016 to 2019.

On 2 of our clusters there were no issues, but on a 3rd I am having the following issue.

When I go to update the functional level, I get this:

Update-ClusterFunctionalLevel : You do not have administrative privileges on the cluster.

I am a domain admin, in the local admins group on each node, and have full cluster access.

I can run any other administrative PowerShell cluster command without issue, and I can fully administer via the GUI, but running Update-ClusterFunctionalLevel gives the no-privileges error.

If I ask my other admin to run it, he gets the same.

If I create a net-new AD account, assign it local admin on each node, and grant it full cluster access with Grant-ClusterAccess, that account also gets the error.

I opened an MS support ticket, but after 10 days I have still yet to get a call back. I even called back in (Sev B = 4 hours...) and they said yes, I am still in the queue. Wth...

Anyway, I am assuming it is likely a bad registry entry, or perhaps something messed up on an AD object, but I'm not sure where to look.
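Two quick checks that narrow down whether this is a cluster-permissions problem or something else (a sketch; "Cluster3" is a placeholder for the affected cluster's name):

# Dry-run the upgrade without committing anything
Update-ClusterFunctionalLevel -Cluster "Cluster3" -WhatIf

# List the accounts the cluster itself thinks have access
Get-ClusterAccess -Cluster "Cluster3"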

Event ID 1069 when deleting a VM


Hi,

I have a Hyper-V cluster and use SCVMM to manage the VMs. Whenever I delete a VM from SCVMM, this error is logged:

-------------------------------------------------------------------------------------------

Event ID: 1069

Source: FailoverClustering

Task Category: Resource Control Manager

Cluster resource 'SCVMM <vm name> Configuration' of type 'Virtual Machine Configuration' in clustered role 'SCVMM <vm name> Resources' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

-------------------------------------------------------------------------------------------

It does not seem to cause any issues, but I may be wrong.

Does anybody know why these errors are generated when deleting a VM? Is this normal? I understand this event is fairly generic.

Kind regards
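One way to check whether the delete leaves anything behind is to look for orphaned VM groups or resources after the fact (a sketch; the name filter matches the role naming in the event above):

# List any remaining SCVMM-created VM groups and the state of their resources
Get-ClusterGroup | Where-Object { $_.Name -like "SCVMM*" } |
    Get-ClusterResource | Format-Table Name, ResourceType, State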

File Share in Cluster Help

Good Evening, 

I am doing my MCSA at the moment, and I am practising on VMs in Hyper-V. After clustering 2 servers, I came to the part where I create a file share, and an error popped up. I have looked through many forums and articles and tried all the help I found, but the error still appears. Can you please help? The error is below. Thanks in advance.

WinRM cannot process the request. The following request with error code 0x8009030e occurred while using Negotiate authentication. A specified logon session does not exist. It may already have been terminated.
This can occur if the provided credentials are not valid on the target server, or if the server identity could not be verified. If you trust the server identity, add the server name to the TrustedHosts list, and then retry the request. Use winrm.cmd to view or edit the TrustedHosts list. Note that computers in the TrustedHosts list might not be authenticated. For more information about how to edit the TrustedHosts list, run the following command: winrm help config.
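A minimal sketch of the TrustedHosts steps the error message describes, run from the machine issuing the request ("FS2" is a placeholder for the target server):

# View the current TrustedHosts list
Get-Item WSMan:\localhost\Client\TrustedHosts

# Append the target server if you trust its identity
Set-Item WSMan:\localhost\Client\TrustedHosts -Value "FS2" -Concatenate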

Cluster join failure - [Schannel] Server: auth failed on server with error: 80090326


Same issue on 2 separate clusters (both have 2x Server 2016 nodes) after Cluster-Aware Updating, which only patched the first node! Because CAU failed, the second node was never updated, and that is why the cluster still works!

Both clusters log the same failures:

Node '' failed to form a cluster. This was because the witness was not accessible. Please ensure that the witness resource is online and available.

Cluster node '' failed to join the cluster because it could not communicate over the network with any other node in the cluster. Verify network connectivity and configuration of any network firewalls.


There is nothing wrong with the witness disk; it is accessible and shows in Disk Management as Reserved (exactly as it should).

There are known issues with this update, but none of them says it would kill the cluster!

While testing on clusterA I evicted the down node and tried to add it back; it cannot be done!

ClusterB is now a one-node working cluster whose second node cannot join. The cluster will work from either node (if I shut down both nodes and power them on in a different order).

I am able to ping the hostnames and FQDNs of both nodes and the DNS server from all servers.
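To see the full 80090326 exchange rather than just the summary events, a fresh cluster log from each node usually helps (a sketch; the destination folder must already exist):

# Collect the last 15 minutes of the cluster log from every node into C:\Temp
Get-ClusterLog -TimeSpan 15 -Destination C:\Temp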



WSFC broken, please help diagnose


I have a 2016 WSFC with the file server role: 2 nodes in the cluster with shared storage. We lost power to Node2, which died; when bringing it back up, it won't join the cluster (shows 'Down' in Failover Cluster Manager). If I shut down the entire cluster completely and start it on Node2 first, Node2 runs the cluster fine, but Node1 now won't join the cluster (shows 'Down').

As far as I can tell, all connectivity seems fine. I've turned off Windows Firewall, the network between the two servers is working fine, and there are no firewalls between the two nodes. Other clusters are running on the same infrastructure.

The only hint in Failover Cluster Manager is that the network connection for Node2 shows as offline (the network is up and working, has allow-traffic and management ticked, and I can ping, RDP, etc.).

When I shut down and then restart the entire cluster, Node2 first, the roles become reversed: Node1 now shows its network as offline. The information details and critical events for the network have no entries.

The critical events for Node2 itself, when it is in the down state, show: Error 1653 Cluster node 'Node2' failed to join the cluster because it could not communicate over the network with any other node in the cluster. Verify network connectivity and configuration of any network firewalls. However, I'm not convinced this is actually the issue, because of the error messages below.

The failover clustering log is as follows:

00000774.00001c4c::2018/05/15-16:48:50.659 INFO  [Schannel] Server: Negotiation is done, protocol: 10, security level: Sign
00000774.00001c4c::2018/05/15-16:48:50.663 DBG   [Schannel] Server: Receive, type: MSG_AUTH_PACKAGE::Schannel, buf: 161
00000774.00001c4c::2018/05/15-16:48:50.712 DBG   [Schannel] Server: ASC, sec: 90312, buf: 2059
00000774.00001c4c::2018/05/15-16:48:50.728 DBG   [Schannel] Server: Receive, type: MSG_AUTH_PACKAGE::Schannel, buf: 1992
00000774.00001c4c::2018/05/15-16:48:50.730 DBG   [Schannel] Server: ASC, sec: 0, buf: 51
00000774.00001c4c::2018/05/15-16:48:50.730 DBG   [Schannel] Server: Receive, type: MSG_AUTH_PACKAGE::Synchronize, buf: 0
00000774.00001c4c::2018/05/15-16:48:50.730 INFO  [Schannel] Server: Security context exchanged for cluster
00000774.00001c4c::2018/05/15-16:48:50.735 DBG   [Schannel] Client: ISC, sec: 90312, buf: 178
00000774.00001c4c::2018/05/15-16:48:50.736 DBG   [Schannel] Client: Receive, type: MSG_AUTH_PACKAGE::Schannel, buf: 60
00000774.00001c4c::2018/05/15-16:48:50.736 DBG   [Schannel] Client: ISC, sec: 90312, buf: 210
00000774.00001c4c::2018/05/15-16:48:50.749 DBG   [Schannel] Client: Receive, type: MSG_AUTH_PACKAGE::Schannel, buf: 2133
00000774.00001c4c::2018/05/15-16:48:50.752 DBG   [Schannel] Client: ISC, sec: 90364, buf: 58
00000774.00001c4c::2018/05/15-16:48:50.753 DBG   [Schannel] Client: ISC, sec: 90364, buf: 14
00000774.00001c4c::2018/05/15-16:48:50.753 DBG   [Schannel] Client: ISC, sec: 90312, buf: 61
00000774.00001c4c::2018/05/15-16:48:50.754 DBG   [Schannel] Client: Receive, type: MSG_AUTH_PACKAGE::Schannel, buf: 75
00000774.00001c4c::2018/05/15-16:48:50.754 DBG   [Schannel] Client: ISC, sec: 0, buf: 0
00000774.00001c4c::2018/05/15-16:48:50.754 INFO  [Schannel] Client: Security context exchanged for netft
00000774.00001c4c::2018/05/15-16:48:50.756 WARN  [ClRtl] Cannot open crypto container (error 2148073494). Giving up.
00000774.00001c4c::2018/05/15-16:48:50.756 ERR   mscs_security::SchannelSecurityContext::AuthenticateAndAuthorize: (-2146893802)' because of 'ClRtlRetrieveServiceSecret(&secretBLOB)'
00000774.00001c4c::2018/05/15-16:48:50.756 WARN  mscs::ListenerWorker::operator (): HrError(0x80090016)' because of '[SV] Schannel Authentication or Authorization Failed'
00000774.00001c4c::2018/05/15-16:48:50.756 DBG   [CHANNEL 172.23.1.15:~56287~] Close().

specifically:

Server: Negotiation is done (aka they talked to each other?)

[ClRtl] Cannot open crypto container (error 2148073494). Giving up.
mscs_security::SchannelSecurityContext::AuthenticateAndAuthorize: (-2146893802)' because of 'ClRtlRetrieveServiceSecret(&secretBLOB)'
mscs::ListenerWorker::operator (): HrError(0x80090016)' because of '[SV] Schannel Authentication or Authorization Failed'

I can't find many (if any) articles dealing with these messages; the only ones I can find say to make sure permissions are correct on %SystemRoot%\Users\All Users\Microsoft\Crypto\RSA\MachineKeys.

I did have to change some of the permissions on those files, but I still couldn't join the cluster. Other than that, I'm struggling to find any actual issues (SMB access from Node1 to Node2 appears to be fine, SMB access from Node2 to Node1 appears to be fine, DNS appears to be working fine, and the file share witness seems to be fine).
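A quick way to audit the machine key store those articles point at (a sketch; the canonical location is usually under %ProgramData% rather than the path quoted above):

# Dump the ACLs on the machine key store; the cluster service needs access here
icacls "$env:ProgramData\Microsoft\Crypto\RSA\MachineKeys"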

Finally, the cluster validation report shows the following as the only errors with the cluster:

Validate disk Arbitration: Failed to release SCSI reservation on Test Disk 0 from node Node2.domain: Element not found.

Validate CSV Settings: Failed to validate Server Message Block (SMB) share access through the IP address of the fault tolerant network driver for failover clustering (NetFT). The connection was attempted with the Cluster Shared Volumes test user account, from node Node1.domain to the share on node Node2.domain. The network path was not found.

Validate CSV Settings: Failed to validate Server Message Block (SMB) share access through the IP address of the fault tolerant network driver for failover clustering (NetFT). The connection was attempted with the Cluster Shared Volumes test user account, from node Node2.domain to the share on node Node1.domain. The network path was not found.

Other errors from the event logs:

ID 5398: Cluster failed to start. The latest copy of cluster configuration data was not available within the set of nodes attempting to start the cluster. Changes to the cluster occurred while the set of nodes were not in membership and as a result were not able to receive configuration data updates.
Votes required to start cluster: 2
Votes available: 1
Nodes with votes: Node1 Node2
Guidance: Attempt to start the cluster service on all nodes in the cluster so that nodes with the latest copy of the cluster configuration data can first form the cluster. The cluster will be able to start and the nodes will automatically obtain the updated cluster configuration data. If there are no nodes available with the latest copy of the cluster configuration data, run the 'Start-ClusterNode -FQ' Windows PowerShell cmdlet. Using the ForceQuorum (FQ) parameter will start the cluster service and mark this node's copy of the cluster configuration data to be authoritative. Forcing quorum on a node with an outdated copy of the cluster database may result in cluster configuration changes that occurred while the node was not participating in the cluster being lost.
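The force-quorum path that this event's guidance describes looks like the following (a sketch; run the first command only on the node believed to hold the good copy of the cluster database):

# Mark this node's copy of the cluster database as authoritative and start the cluster
Start-ClusterNode -Name "Node1" -ForceQuorum

# Then start the other node normally so it pulls the authoritative configuration
Start-ClusterNode -Name "Node2"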

ID 4350: Cluster API call failed with error code: 0x80070046. Cluster API function: ClusterResourceTypeOpenEnum. Arguments: hCluster: 4a398760, lpszResourceTypeName: Distributed Transaction Coordinator, lpcchNodeName: 2

Lastly, I built another server, Node3, to see if I could join it to the cluster, but this fails too:

* The server 'Node3.domain' could not be added to the cluster. An error occurred while adding node 'Node3.domain' to cluster 'CLUS1'. Keyset does not exist

I've done the steps here with no joy: http://chrishayward.co.uk/2015/07/02/windows-server-2012-r2-add-cluster-node-cluster-service-keyset-does-not-exist/



NAS File Share as File Share Witness for Windows 2012 R2 Stretched Cluster


I've been unable to find official documentation stating whether using a NAS file share (in my case, NetApp) as the File Share Witness is supported for a Windows 2012 R2 stretched cluster. The cluster will support SQL 2012 AlwaysOn AGs. I've found unofficial blog posts from third parties who have successfully configured a non-Windows file share as the FSW, but I need to know whether Microsoft supports this.
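For what it's worth, the configuration itself is the same regardless of what serves the share; a sketch of pointing a 2012 R2 cluster at it ("\\netapp01\ClusterWitness" is a placeholder path):

# Switch the quorum model to node majority plus a file share witness on the NAS
Set-ClusterQuorum -NodeAndFileShareMajority "\\netapp01\ClusterWitness"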

Thanks,

Denis McDowell

 

S2D 4-node cluster


Dear all,

I'm facing a serious issue and I'm stuck.

My deployment consists of 4 nodes running 2019, deployed as an S2D cluster. Validation is successful.

All 4 nodes are up, with the same vSwitch created on all nodes via Switch Embedded Teaming with RDMA enabled.

The problem is with one node: every VM on this specific node is not reachable via RDP from client PCs, and the application fails.

If I move a VM to any of the other nodes, it works and can be accessed via RDP. Moreover, the host with the issue is itself accessible via RDP; it's just that any guest VM on it is not. Ping works, there is no firewall, and everything else is working fine.

All users and servers are in the same subnet, and tracert works. I just don't know what the issue is.
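A sketch of what could be compared between the bad node and a good one, since the symptom follows the host rather than the VMs (all read-only queries):

# Confirm the SET switch and its member NICs match across nodes
Get-VMSwitch | Format-List Name, EmbeddedTeamingEnabled, NetAdapterInterfaceDescriptions

# Check per-VM VLAN tagging on the vSwitch ports
Get-VMNetworkAdapterVlan -VMName *

# Verify RDMA state on the physical adapters
Get-NetAdapterRdma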

Any help would be appreciated.

Best regards

Basic questions on failover clustering


We are running a 4-node file cluster on Windows Server 2012.

It hosts 6 production file server roles.

Question:

Say for role FS6 I have 6 storage volumes, and one volume goes offline.

With the logical AND operator defined across all the cluster storage volumes of file server FS6, the file server would go offline and the role FS6 would go into the stopped state, as shown in the figure below:

Questions:

1) If one volume goes offline and stops the role as in the figure above, what is expected of the remaining 5 volumes?
   Will they be accessible with the role in the stopped state while the volumes show online?

2) A failover will be attempted as per the policies on the role and the resources.
   With the policies for the role and resources set as per the following screenshots:

   What is expected?
   - Will a failover be attempted immediately, or will the cluster spend up to 15 minutes trying to restart the resources on the same node?
   - Will the resources move from the current owner node to a new node regardless of whether the volume comes back online?
   - Will it attempt to fail over and start the resources, and stop trying after 3 attempts if that fails?
   - Will there be any disruption in accessing the available volumes during the failover?

I would appreciate precise answers for the scenario described above.
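The policies in question can also be read directly from the cluster, which may make the scenario easier to pin down (a sketch; "FS6" matches the role name above):

# Failover/failback policy on the role
Get-ClusterGroup -Name "FS6" | Format-List FailoverThreshold, FailoverPeriod, AutoFailbackType

# Restart policy on each resource in the role
Get-ClusterGroup -Name "FS6" | Get-ClusterResource |
    Format-List Name, RestartAction, RestartThreshold, RestartPeriod, RetryPeriodOnFailure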

Thanks,
Shailesh

2012 R2 RPC Errors When attempting to add to cluster


Hi,

Fresh build of 2012 R2 on a host, all updates applied, and Windows Firewall is disabled completely. When attempting to add it to the cluster, or even to manage another Hyper-V host, I get RPC-unavailable errors. If I run tnc IP -Port 135 it's listening, but any WMI query results in the RPC unavailable error.
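Since port 135 answers, the failing piece is likely the follow-up WMI/DCOM call rather than the endpoint mapper; a sketch that reproduces both legs from the management host ("HV01" is a placeholder):

# The endpoint mapper probe (equivalent to the tnc test above)
Test-NetConnection -ComputerName HV01 -Port 135

# A remote WMI query over the same DCOM/RPC machinery the cluster wizard uses
Get-WmiObject -Class Win32_OperatingSystem -ComputerName HV01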

Can anyone help me :)

NIC speeds changed after reformatting


I have a server that was running Windows Server 2012 R2 with 4 NICs:

NIC0 = 10G speed
NIC1 = 10G speed
NIC2 = 1G speed
NIC3 = 1G speed

When I reformatted the server with Hyper-V Server 2019, all the NICs came up at 1G speed. Why?

If that is normal with Hyper-V Server, then what about Windows Server Core 2019?

NOTE: I did not touch any of the underlying hardware or cabling.
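A first check after a rebuild like this is whether the 10G ports bound a generic inbox driver (a minimal sketch, runnable in Hyper-V Server's PowerShell):

# Compare negotiated link speed and driver per adapter
Get-NetAdapter | Format-Table Name, InterfaceDescription, Status, LinkSpeed, DriverVersionString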

Windows 2019 server error


I have a brand new server, just installed, but it reboots with Kernel-Power event ID 41 and bugcheck 1001. Please help.
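To get beyond the generic event, the bugcheck parameters logged with the 41/1001 entries are the first thing to pull (a sketch):

# Show the most recent Kernel-Power 41 entries, including the BugcheckCode parameter
Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 41 } -MaxEvents 5 |
    Format-List TimeCreated, Message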

No communication on new "cluster only" network - how to troubleshoot?


Hi,

We've had a couple of Server 2016 VMs working as file servers 1 and 2. To improve on best practices, I decided to add new vNICs to both VMs on a separate VLAN from the "cluster and client" traffic. The VLAN is wide open, with no gateway or DNS (it's only for cluster traffic), so I picked 192.168.11.1 for FS1 and 192.168.11.2 for FS2, both with a 255.255.255.0 mask.

The Windows firewall rule for inbound and outbound UDP 1812 traffic is in place, but it is allow-all, on any interface, within the network.

When I run cluster validation, I get:

Network interfaces FS1.ads.ssc.wisc.edu - Ethernet1 and FS2.ads.ssc.wisc.edu - Ethernet1 are on the same cluster network, yet address 192.168.11.2 is not reachable from 192.168.11.1 using UDP on port 3343.

I also notice that the nodes are unable to ping each other on this new network.

What sort of troubleshooting can I do to determine why communication isn't happening on the network?

I already ran netstat -rn and see (from FS1 in this case):

Destination       Netmask          Gateway    Interface       Metric
192.168.11.0      255.255.255.0    On-link    192.168.11.1    271

in the routing table, so I think that is right.

I've never done this before, so I'm not sure how to proceed with further testing. Our VM admin has set up two VMs quickly in vSphere that only use this new VLAN, and he confirmed they can communicate with each other. So the problem seems to be with FS1 and FS2.
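Two low-level checks that can separate a guest-side problem from a VLAN/vSwitch one (a sketch, run from FS1; the addresses are the ones above):

# Force ICMP out of the new interface so the reply path is unambiguous
ping.exe 192.168.11.2 -S 192.168.11.1

# TCP probe of the cluster port; heartbeats use UDP 3343, so treat the result as a hint only
Test-NetConnection -ComputerName 192.168.11.2 -Port 3343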


Disks from other cluster node visible when using get-disk on Windows Server 2016


Hello

For quite some time now, I have been trying to figure out why disks on other cluster nodes are listed when running Get-Disk (PowerShell) on a Windows Server 2016 based cluster (15-node Exchange 2016 DAG).

There are no shared disks in the cluster (no Fibre Channel or iSCSI). The server "hardware" is VMware ESX, and each server has its own disks.

The "remote" disks doesn't have a disk number as the local does.

We have a similar Windows Server 2012 (R1) / Exchange 2013 based cluster where this doesn't happen. And even though those servers have twice the disk count, Get-Disk and the other disk-related cmdlets perform much faster there.

I suspect a new Win 2016 feature is hurting performance badly, so I would like to disable it if possible.

Does anyone know if that is doable?

Could the UseClientAccessNetworksForSharedVolumes cluster setting have something to do with it? It's 0 on our Windows 2012 cluster and 2 on our Windows 2016 cluster.
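Building on the observation above that only local disks get a number, a quick way to separate the two populations while investigating (a sketch):

# Show which storage subsystems the node can see (a clustered subsystem here would explain the remote disks)
Get-StorageSubSystem | Format-Table FriendlyName, HealthStatus

# Limit output to disks that are actually local (the remote entries have no disk number)
Get-Disk | Where-Object { $_.Number -ne $null } | Format-Table Number, FriendlyName, Size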

Thanks :)


Change File Share Witness Location


Hi, 

Currently we have the FSW configured on a Windows Server 2008 server. As that server is going EOL, we are planning to migrate the witness from 2008 to 2016.

My question is: what is the procedure to change the FSW path?

1. Is downtime required?

2. Does the path need to be changed on all three nodes?

3. Are there any prechecks?

Appreciate your assistance on this. 
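For what it's worth, the witness is a cluster-wide setting changed in one place rather than per node, and repointing it is an online operation; a sketch (the share path is a placeholder):

# Point the cluster at the new witness share; the cluster creates its own witness data there, nothing is copied
Set-ClusterQuorum -FileShareWitness "\\NewServer\ClusterWitness"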

File Share Witness Path Changes


Hi 

We are planning to change the FSW path on an existing three-node cluster. Is downtime required? If not, what is the impact if I change the path live? Also, do we need to copy any old config files to the new share?

Appreciate your assistance

NLB Manager does not show the other host


Hello

A few months back I installed and configured Windows NLB on a couple of Exchange CAS servers, and everything seemed to be working fine (it still works fine). However, the other day I opened NLB Manager on each server and found that one of the hosts is missing from the cluster: if I open NLB on server1, server2 is not listed, and vice versa. I tried refreshing the screen to no avail. When I try to re-add the missing host to the cluster I get "The specified host is already part of this cluster". NLB is running in multicast mode and seems to be working fine; I just can't see both servers. Is there any way I can rectify this?

I am running Windows 2008 R2 SP1. (NLB Manager could see both hosts fine in the past, just not now.)
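Since the cluster itself still works, this looks like a Manager view problem; the NLB PowerShell module can confirm what each host actually believes (a sketch; "server1" is a placeholder):

# Ships with the NLB feature / RSAT tools on 2008 R2 and later
Import-Module NetworkLoadBalancingClusters
Get-NlbCluster -HostName server1 | Get-NlbClusterNode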

hakim

How can I add a 3rd host to my existing Hyper-V cluster?


The existing environment is a 2-node Hyper-V cluster.

I want to add a 3rd host to the cluster.

Please help with the step-by-step procedure.
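At the PowerShell level the procedure is short: validate with the new node included, then add it (a sketch; host and cluster names are placeholders):

# Re-run validation including the incoming node
Test-Cluster -Node HV1, HV2, HV3

# Join the new node to the existing cluster
Add-ClusterNode -Cluster HVCluster1 -Name HV3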

