Channel: High Availability (Clustering) forum

Cluster disks showing reserved on both nodes.


Hello Folks,

We have configured file services on a Windows Storage Server 2012 R2 cluster. The team recently restarted both nodes at the same time, and after that the cluster went down. I am not able to open the cluster because the Cluster service is not stable: it restarts continuously due to loss of the quorum disk. When I checked Disk Management on both nodes, I found that all cluster disks show as reserved. I tried to remove the reservation with the following:

1. Clear-ClusterDiskReservation: the command executed successfully, but the disk still shows as reserved.

2. Cluster node Server name /clear

3. Tried to remove the attacheddisk registry settings, but after a server reboot the entry comes back automatically.

 

Please help me remove the cluster disk reservation.



Thanks, Chinmay.


Can't configure Cluster Aware Updating


I'm trying to install the Cluster-Aware Updating service, but I'm not able to fix this error:

“Unable to create the CAU clustered role because a Network Name resource could not be created. This can occur if a computer account (virtual computer object) for the role could not be created in the domain. Check the event log for more information. If the cluster name account does not have permissions to create the object, you can pre-stage a computer account in Active Directory. Then, use the Add-CauClusterRole Windows PowerShell cmdlet with the VirtualComputerObjectName parameter to create the CAU clustered role. For more information about pre-staging computer accounts, see http://go.microsoft.com/fwlink/p/?LinkId=237624.”

I have pre-staged a computer account: CAU-ATC
The computer account of the cluster is: ATC-CLUSTER

I gave the ATC-CLUSTER account permission to create computer accounts in the OU of the cluster.

But still, I get this error.

The error.

The OU with the accounts.

Enter the computer object...

The permissions.

Random Cluster Failures


Hey guys, 

Really need a hand here. I have a production cluster with 2 R630s (256 GB RAM) and 3 R610s (192 GB RAM), one of them a hot spare, on 2012 R2 Datacenter. Recently I updated the NICs with Microsoft drivers (Intel Ethernet Server Adapter X520-2 driver, 2012 R2 Datacenter) and shortly after started having a lot of VMs randomly failing on random hosts, a few at a time.

160 VMs, averaging 30-40 VMs per host.

After updates, reinstalls of the actual Intel drivers, and pushing out VM hardware reconfigurations, I finally noticed a big issue: the driver update had cut the VMQ ports back to the default 32.

I reconfigured all of them back to 64, and for a few days I had no issues and was sure I had found the cause.

Came in this morning to find that over the weekend there were another 15 reboots.

So far the only commonality I've found is that this has only happened to our Gen 1 systems (we have 90; so far 51 have had reboots).

Here's a snippet from the cluster log around a VM failure:

0000117c.00002cfc::2016/12/19-07:34:57.705 INFO  [RHS] Resource Virtual Machine Configuration <VM NAME> called SetResourceLockedMode. LockedModeEnabled0, LockedModeReason0.
00000d8c.000029d0::2016/12/19-07:34:57.705 INFO  [RCM] HandleMonitorReply: LOCKEDMODE for 'Virtual Machine Configuration <VM NAME>', gen(0) result 0/0.
00000d8c.000029d0::2016/12/19-07:34:57.705 INFO  [RCM] Virtual Machine Configuration epcr-harvardil: Flags 1 removed from StatusInformation. New StatusInformation 0
0000117c.00002cfc::2016/12/19-07:34:57.705 INFO  [RHS] Resource Virtual Machine <VM NAME> called SetResourceLockedMode. LockedModeEnabled0, LockedModeReason0.
00000d8c.000029d0::2016/12/19-07:34:57.705 INFO  [RCM] <VM NAME>: Removed Flags 1 from StatusInformation. New StatusInformation 0
0000117c.00002cfc::2016/12/19-07:34:57.705 INFO  [RES] Virtual Machine <Virtual Machine <VM NAME>>: Current state 'Terminated', event 'VmStopped'
00000d8c.000029d0::2016/12/19-07:34:57.705 INFO  [RCM] HandleMonitorReply: LOCKEDMODE for 'Virtual Machine <VM NAME>', gen(3) result 0/0.
00000d8c.00000944::2016/12/19-07:34:57.705 INFO  [GUM] Node 3: executing request locally, gumId:71035, my action: /dm/update, # of updates: 1
00000d8c.00000f90::2016/12/19-07:34:57.705 INFO  [DM] Starting replica transaction, paxos: 460:460:576650, smartPtr: HDL( 2c83f5f2b0 ), internalPtr: HDL( 2c85294340 )
00000d8c.00000f90::2016/12/19-07:34:57.720 INFO  [DM] Finished replica transaction, paxos: 460:460:576650, smartPtr: HDL( 2c83f5f2b0 ), internalPtr: HDL( 2c85294340 ), status: 0
00000d8c.00000944::2016/12/19-07:34:57.720 INFO  [RCM] HandleMonitorReply: INMEMORY_NODELOCAL_PROPERTIES for 'Virtual Machine <VM NAME>', gen(3) result 0/0.

The logs are also littered with these SQL errors, which is what eventually led me to update the hardware configurations of the VMs:

00000af0.0000239c::2016/12/19-03:22:41.113 ERR   [RHS] s_RhsRpcCreateResType: (126)' because of 'Error loading resource DLL fssres.dll.'
00000cec.000006f8::2016/12/19-03:22:41.113 INFO  [RCM] result of first load attempt for type SQL Server FILESTREAM Share: 126
000014e0.000026c0::2016/12/19-03:22:41.129 INFO  [RES] Physical Disk: HarddiskpIsPartitionHidden: device \Device\Harddisk2\ClusterPartition2 0
00000af0.0000239c::2016/12/19-03:22:41.238 ERR   [RHS] s_RhsRpcCreateResType: (126)' because of 'Error loading resource DLL hadrres.dll.'
00000cec.00001e88::2016/12/19-03:22:41.238 INFO  [RCM] result of first load attempt for type SQL Server Availability Group: 126
00000af0.0000239c::2016/12/19-03:22:41.254 ERR   [RHS] s_RhsRpcCreateResType: (126)' because of 'Error loading resource DLL fssres.dll.'
00000cec.00001e88::2016/12/19-03:22:41.254 INFO  [RCM] result of first load attempt for type SQL Server FILESTREAM Share: 126

Any Ideas???
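
When a log is "littered" with entries like the ones above, it can help to tally them before deciding whether they matter. The following is a minimal Python sketch, assuming the line layout seen in the snippets above (`<pid>.<tid>::<timestamp> <LEVEL> [<COMPONENT>] <message>`); the function names and the sample list are hypothetical and only for illustration. Code 126 is the Win32 "module not found" error, so a tally per DLL shows which resource DLLs a node failed to load.

```python
import re
from collections import Counter

# Assumed layout, inferred from the snippets in the post:
# <pid>.<tid>::<yyyy/mm/dd-hh:mm:ss.fff> <LEVEL> [<COMPONENT>] <message>
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-f]{8})\.(?P<tid>[0-9a-f]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<component>[A-Z]+)\]\s+(?P<message>.*)$"
)

# Matches the DLL name in "Error loading resource DLL fssres.dll.'"
DLL_RE = re.compile(r"Error loading resource DLL (?P<dll>\S+?)\.'")


def parse_line(line):
    """Return the fields of one cluster log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def count_dll_load_failures(lines):
    """Count ERR lines that report a resource DLL load failure, keyed by DLL name."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is None or record["level"] != "ERR":
            continue
        match = DLL_RE.search(record["message"])
        if match:
            counts[match.group("dll")] += 1
    return counts


# Sample lines copied from the post above.
SAMPLE = [
    "00000af0.0000239c::2016/12/19-03:22:41.113 ERR   [RHS] s_RhsRpcCreateResType: (126)' because of 'Error loading resource DLL fssres.dll.'",
    "00000cec.000006f8::2016/12/19-03:22:41.113 INFO  [RCM] result of first load attempt for type SQL Server FILESTREAM Share: 126",
    "00000af0.0000239c::2016/12/19-03:22:41.238 ERR   [RHS] s_RhsRpcCreateResType: (126)' because of 'Error loading resource DLL hadrres.dll.'",
    "00000af0.0000239c::2016/12/19-03:22:41.254 ERR   [RHS] s_RhsRpcCreateResType: (126)' because of 'Error loading resource DLL fssres.dll.'",
]

if __name__ == "__main__":
    for dll, count in count_dll_load_failures(SAMPLE).most_common():
        print(dll, count)
```

Grouping the same parsed records by timestamp or component instead would be a small change if the goal is to line these errors up against the VM failures.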

Error: The computer is joined to cluster when creating the Cluster


Hello Guys,

I created a cluster to configure Hyper-V on 2 nodes. Everything was great and worked perfectly. The next day the storage hung and the cluster stopped working. I destroyed the cluster, then removed the cluster feature from both nodes, deleted the cluster computer object from AD, and deleted the storage. After we fixed the storage, I reconnected it, installed the cluster feature on both nodes, and validated the configuration; everything was 100% green.

While creating the cluster, I hit an "Unable to successfully cleanup" error. I kept trying, removed the antivirus, and restarted the servers many times, and then ended up with another error: as soon as I add the server name in the Create Cluster wizard, it tells me that the computer I'm adding is joined to a cluster.

I think I need to clean up the previous cluster. Can I get some help here?

Regards..

Nour



Sharing entire cluster volume


Hi, 

I am trying to share the entire cluster volume in one shot instead of individually creating a share for each subfolder under the volume. Is this possible?



Hyper-V Server 2016 with Intel Core2 Duo E7500 CPU


I just performed a rolling cluster upgrade and everything seemed to go well. All the nodes were successfully upgraded after passing "SLAT" testing per the documentation. Everything is working except that 2 of my nodes will not start any VMs. Storage and VMs can be moved onto the nodes, but when you try to start a VM you receive this message:

'Virtual Machine MYMACHINE' failed to start.
'MYMACHINE' failed to start. (Virtual machine ID 5FBD1590-7B64-4972-ADAD-E3D578B35349)
Virtual machine 'MYMACHINE' could not be started because the hypervisor is not running (Virtual machine ID 5FBD1590-7B64-4972-ADAD-E3D578B35349). The following actions may help you resolve the problem: 1) Verify that the processor of the physical computer has a supported version of hardware-assisted virtualization. 2) Verify that hardware-assisted virtualization and hardware-assisted data execution protection are enabled in the BIOS of the physical computer.  (If you edit the BIOS to enable either setting, you must turn off the power to the physical computer and then turn it back on.  Resetting the physical computer is not sufficient.) 3) If you have made changes to the Boot Configuration Data store, review these changes to ensure that the hypervisor is configured to launch automatically.

I have verified that all of these options are enabled in the BIOS.

Has anyone else had any success getting a VM to fire up on a Core 2 Duo PC with Hyper-V Server 2016 installed on it?

These machines worked fine with Hyper-V Server 2012 R2

Node failure in S2D Hyperconverged cluster

The data within an S2D cluster is resilient to a node failure, but what happens to the VMs that were running on the failed node?
Are they relaunched automatically on the remaining nodes?

Windows Failover cluster between Physical and Virtual nodes


Dear Team,

One of our customers wants to build a SQL cluster where the 1st node is a physical server and the 2nd node is a VM running on VMware ESX 6.0.

As per my understanding MS supports this type of configuration in Windows 2012.

During Failover Cluster validation, it failed at validating “MS MPIO based disks”.

We were able to continue with the cluster implementation by skipping this test, since we know these 2 nodes have different versions/types of MPIO (physical and virtual).

We would like to know whether this will be a major issue or whether we can continue using this setup, as this is going to be one of the critical database servers in production.

Please point us to the best supported-configuration document from Microsoft.

Thanks,

ABUL


Setup New Windows 2012 R2/2016 Server As Domain Controller and Clustering


Current Setup

1. Windows 2008 R2 servers in a workgroup, running as a Remote Desktop Services (RDS) server (formerly Terminal Services), giving remote offices access to a medical billing application.

2. SQL Server 2008 R2 Database

3. Application (1 Main Medical Billing Application)

4. About 100 users with 70 workstations

5. 3 remote offices (they remote in to the RDS server using Remote Desktop Services (RDS) to access the main medical billing application)

6. The 2 servers are located in the main office and the rest of the users are located in 3 different remote offices

7. Workstations in the remote office and main office running Windows 7 Pro and Windows 10 Pro

Issues: users have been complaining about system slowness, and we need to retire the Win2k8 R2 servers.

Proposed New Scenario with New Servers

1. Dell Servers ( 2 PowerEdge T330 Servers and 2 PowerEdge T630 Servers)

2. OS: Windows Server 2016

3. Database: SQL Server 2014/2016

New Planning For the Project

1. Use 1 of T330 server as a primary domain controller

2. Use the other T330 as a backup domain controller

3. Set up the 2 T630s as cluster hosts for Hyper-V VMs to host the medical billing app and the SQL Server database

4. Use 1 or 2 VMs as RDS/Terminal Servers so the remote offices can access the medical billing app

I need some help with the proposed setup above. Money is tight, and I need to do this in the most efficient and economical way possible.

a.) The servers come with onboard SATA RAID. Should I use the onboard RAID, or should I purchase an external hardware RAID controller?

b.) What is the most efficient way to set up these servers to provide flawless remote connections for the remote office users? NOTE: the medical billing software is very expensive, so it must be installed the same way as in the current environment described above: on 1 server, shared via Terminal Services (Remote Desktop Services, RDS).

c.) How about DirectAccess vs. VPN for the remote offices? Is DirectAccess a feasible alternative to VPN?

d.) What is the best possible way to set up this new system as described above?

I look forward to reading your input soon!

Thanks 

Cluster Storage Disks vs. Pools

I'm setting up a Hyper-V failover cluster for the first time and am unsure when and why to put my disks into a pool rather than just creating disks. I have 2 LUNs on my DAS: one RAID 10 with 15K drives for SQL and another RAID 10 with 7.2K drives for general storage. I think creating two disks and not using a pool makes the most sense. However, I'm unsure under what circumstances using a pool would be better.

Delete and recreate bitlocker-encrypted clustered file shares?


Hi all - I have a Server 2012 cluster connecting to a SAN with Basic Disks shared out to 2 different Cluster Shared Volumes.

However, I would like to extend one of the 4TB drives to 6TB (only supported with Dynamic Disks, AFAIK, which in Server 2012 are only supported with an add-on from Symantec which I won't be able to purchase, AFAIK), and I would like to install iSCSI on the cluster. The CSVs are BitLocker-protected.

What's the best way to go about this?  Is it as simple as removing the cluster roles, installing the iSCSI services (which can only be set for Server 2012 Clusters at creation - it just won't work trying to install and configure after creating the Cluster)  and reinstalling the Cluster roles with the same names with the same nodes, and all the sub-shares/permissions will be intact?  I think that I can add the storage to the CSV Disk pool if it doesn't pick up the extra 2TB that have been provisioned on the SAN side, but even that seems like it could go sideways with bitlocker.

Thanks!


-Ken

CSV access stopped working on one cluster node

We have a 2008 R2 two-node Hyper-V cluster (Server Core installation) with an HP LeftHand P4500 SAN (attached over iSCSI with MPIO).

The cluster has worked for years, but for the past two days one node has not been working. All VMs and all CSVs are running on / attached to the working node. After rebooting the non-working node, everything seems to work. But when we try to migrate a VM to the non-working node, the migration fails, and a lot of cluster events with ID 5120, STATUS_CONNECTION_DISCONNECTED(c000020c), "All I/O will temporarily be queued until a path to the volume is reestablished.", are logged on the non-working node.

Have done a lot of troubleshooting, without success:
- verified that there is no HW failure.
- verified networking, SMB access (https://support.microsoft.com/en-us/kb/2008795) between the two nodes.
- verified and compared the MPIO / iSCSI configuration on both nodes (mpclaim -v conf.txt)
- Have upgraded all HP drivers on the non working node, and installed all available Windows updates, including the most recent Nov. 2016 cumulative update rollup
- Have verified that all hotfixes listed at https://buildwindows.wordpress.com/2012/12/04/windows-hangs-when-accessing-a-cluster-shared-volume/ are installed on the non-working node.
- Have run all cluster validation tests. There are no errors or warnings. However, the storage is not tested, because we cannot take the CSVs offline.

What we haven't done yet is the upgrade of the HP P4500 SAN SW (Version 11.5 is installed, 12.6 is available).

Have the following questions:

1. Does anyone know a way to find the cause of the CSV access problems? There is a blog post that describes CSV diagnostics, https://blogs.msdn.microsoft.com/clustering/2014/03/13/cluster-shared-volume-diagnostics/, but none of these PowerShell cmdlets are available for 2008 R2. And https://blogs.technet.microsoft.com/askcore/2010/12/16/troubleshooting-redirected-access-on-a-cluster-shared-volume-csv/ didn't help us.

2. We are considering reinstalling the non-working cluster node from scratch. Is the assumption correct that we can evict the non-working cluster node, because the quorum is up and running on the working cluster node, and that the VMs will continue to run even when the "cluster" has only one running node?

Thank you in advance for any help
Franz
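
The second question comes down to vote arithmetic: a cluster keeps quorum while more than half of its configured votes are online. The post does not say which quorum configuration is in use, so the vote counts in the minimal Python sketch below are assumptions for illustration only, and `has_quorum` is a hypothetical helper, not part of any cluster API.

```python
def has_quorum(votes_online: int, total_votes: int) -> bool:
    """Strict majority rule: more than half of all configured votes must be online."""
    if total_votes <= 0 or not 0 <= votes_online <= total_votes:
        raise ValueError("votes_online must be between 0 and total_votes, and total_votes must be > 0")
    return 2 * votes_online > total_votes


if __name__ == "__main__":
    # Assumption: two nodes plus one witness vote (3 votes in total).
    # One node and the witness online gives 2 of 3 votes.
    print("2 nodes + witness, one node down:", has_quorum(2, 3))

    # Assumption: two nodes and no witness (2 votes in total).
    # One node online gives 1 of 2 votes, which is not a majority.
    print("2 nodes, no witness, one node down:", has_quorum(1, 2))
```

Under these assumptions, whether the surviving node keeps the cluster running depends on whether a witness vote exists and is reachable, so it is worth confirming the configured quorum model before evicting a node.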


Storage spaces direct - STATUS_IO_TIMEOUT


Hi Everyone,

I configured a cluster of 4 Supermicro X10DRI servers to test Storage Spaces Direct. Supermicro server configuration:

- 2 x Intel Xeon E5-2640v3 CPU
- 10 x 32GB 2133MHz DDR4 RAM
- 2 x SSD 400GB SATA Intel S3610
- 2 x HDD 2TB SATA Seagate ST2000NX0253
- 6 x SSD 800GB SATA Intel S3610
- Intel X520-DA2 E10G42BTDA Network adapter

All was fine; the cluster and S2D were working without any problems. But after a month, I replaced the Intel X520-DA2 E10G42BTDA network adapter with a Mellanox ConnectX-3 Pro Ethernet adapter. After that, I started receiving warnings in the Cluster Events log:

"Cluster Shared Volume 'VD1' ('Cluster Virtual Disk (VD1)') has entered a paused state because of 'STATUS_IO_TIMEOUT(c00000b5)'. All I/O will temporarily be queued until a path to the volume is reestablished."

I checked the health of the physical disks and they are fine. I also checked the network and found about 500K Received Discarded Packets per server. It looks pretty bad, but I'm not sure. Is it a problem? Or should I continue my investigation?

P2V of Windows 2008 R2 SQL clustered servers


Hi

We are planning to do a P2V of existing Windows 2008 R2 / SQL 2008 cluster hosts (we did it successfully in a test lab). We are concerned about a few points:

a) In case of failure or application issues, we will revert to the existing production servers. If the domain trust fails for the physical servers, we can re-add those hosts to the domain, but if the trust fails for the cluster name, how can we correct it?

b) Are there any known issues, or precautions to be taken, before converting clustered physical hosts to VMs?

Thanks in advance

  


LMS

VRTX External Network Connectivity Loss Causes Windows 2012 R2 Cluster Failure


We have a VRTX server with 2 server modules, the internal Gb switch module, and a single PERC8 card. Both server modules are running Windows 2012 R2 as a host OS (running off the modules HDs). They are configured as 2 nodes (1A and 1B) of a Failover Cluster (with cluster services typically residing on 1A). The Failover Cluster consists of multiple virtual OS nodes all using shared storage built into the VRTX (two virtual drives: a 7.2TB data drive and a 10GB quorum drive) as Cluster Disks.

The problem that I am noticing is that if the VRTX loses connectivity to the network outside the VRTX, then that seems to be triggering a cluster failure event, which is bringing the virtual nodes down in a dirty fashion. The sequence of events seems to be:

1. External Network Connection Goes Down

2. There is a Critical Event 1135 on node 1B: "Cluster node '1A' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster."

3. There is an Informational Event 1650 on node 1A: Cluster has lost the UDP connection from local endpoint [IP address of 1A]:~3343~ connected to remote endpoint [IP address of 1B]:~3343~.

4. Roles move from 1A to 1B and go from offline to online.

There are also some errors that occur with respect to the cluster disk. Eventually, everything is up and operating on 1B (with some exceptions having to do with them coming back online in an order that isn't well supported).

The thing that makes this all odd is that the server modules/nodes never lose connectivity to each other (because of the internal VRTX switch). They really only lose connectivity to the outside world and I don't think anything in the cluster is dependent on the outside world.

Does anyone have any idea of why the cluster is failing because of an external network connection? And how to prevent it in future?

Thanks,

indyvql


Installing SQL service packs on clustered SQL Server 2014


Hello everyone.

We have a two-node clustered SQL Server 2014. Now I want to install SQL service packs 1 and 2 on both nodes. What should I do?

Does anyone have experience with this?


SRP rules for Cluster console opening hyper-v tools

We have SRP rules, but they are preventing Cluster Manager from opening any Hyper-V related settings such as 'Settings...'. It says that the Hyper-V management tools aren't installed. Obviously, the tools are there, but it just fails to detect them. Any idea what SRP whitelist rule we need? Strangely, it doesn't generate an 865 event.

Issue in one VM Mailbox migrate to another node


Dear Partner,

We have a cluster (nodes 1, 2, 3) with a VM (Exchange Mailbox) on node 1. Migrating this VM from node 1 to node 2 works fine, but migrating it from node 1 to node 3 does not work with either live migration or quick migration.

We checked another VM: migrating it from node 1 to node 3 works.

So the problem is with this one VM: migrating it from any node to node 3 gives these errors:

Error 1069 Microsoft-Windows-FailoverClustering :Cluster resource 'Virtual Machine CLUB-MBX2' of type 'Virtual Machine' in clustered role 'CLUB-MBX02' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Error 1205 The Cluster service failed to bring clustered role 'CLUB-MBX02' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

Regards,

Magdy

Windows 2016 Hyper-V cluster


Hi,

My plan is to install a new Hyper-V 2016 cluster and run Exchange Server 2016, SQL 2014/2016 (Navision C5), a 2016 domain controller and a 2016 terminal server on it. I will be using two identical HP ProLiant ML350 Generation 9 tower servers, each with two physical drives: a mirrored 300GB SSD (fast disks) C drive for the host system and Hyper-V, and a 1.5TB SSD RAID 10 E drive for the VMs.

What confuses me is this: if I create a two-node Hyper-V cluster, do I also need to create a DAG for the Exchange server or a SQL cluster? Does the Hyper-V cluster provide high availability for all the servers on it?

Thanks

  


Erro

SMB Access denied for Cluster Role Resource


Dear All,

I have a Windows 2008 R2 file server failover cluster in production. As part of a DR failover test, I created another standalone Windows 2008 R2 server with the File Server role enabled.

Currently the file server disk is replicated to DR with a 3rd-party product. During failover, the production cluster role is taken offline and the production disk is attached to the DR standalone machine.

Once the disk is attached to the DR host, we change the cluster role's DNS "A" record to point to the DR server's IP address.

When users try to access their home folder or a shared folder, they get an access denied error. We tried both the \\DNS name and the FQDN (same access denied error).

When I log in to any workstation or server as a local administrator and try to access the same SMB share using the \\DNS or FQDN name, it works fine.

Any idea? 


