Channel: High Availability (Clustering) forum
Viewing all 5654 articles

add delay in startup service

Hello,

We have a Windows Server 2012 R2 failover cluster for our mail server (non-Microsoft) environment.
We have a generic service role installed.

But now we have the problem that it starts the service too quickly.
All dependencies are loaded and it starts right after that, but we would like to add a delay of about 5 seconds before starting the service.

This is because the service has some problems finding resources when it's started immediately (don't know why, though).
But when I reset the service right after it started, everything works just great.

So the question is: can I add a delay in the startup process of a generic service?


Current host server for a cluster

Hi folks-

In Failover Cluster Manager, when a node is the "Current Host Server", what exactly does that imply?  Normally this doesn't affect me, as I'm more concerned with which node owns a specific cluster resource.  However, for multi-subnet clusters, the cluster IP will change depending on which node is the Current Host Server. (i.e. the cluster IP will be in the same subnet as the Current Host Server)

In my mind, this doesn't really seem to be the "active" node, as other nodes might be owning/running cluster resources.  If it is the active node at a WSFC-level, does this change anything in terms of how that node behaves?

Thanks in advance!


Cheers! Brandon Tucker, Database Developer / DBA, OppenheimerFunds

When will my next CAU kick off?

Howdy,

We have Self-Updating turned on for some of our clusters, and I'm wondering if there's somewhere to look, or a PowerShell command to run, that will tell me the date and time of the next run. That way we can do a quick audit now and then and make sure everything is going to kick off when we think it is.

We sometimes have to change the schedule due to holidays or other things and they sometimes don't get put back to where we want them.

I'm hoping for something I can just run or check real easily that will tell me when it will run next.

Thanks,

NIC Teaming and Converged Network for Hyper-v cluster

Hi,

We are building Hyper-V with converged networks. We will have heartbeat, live migration and management VLANs. VLAN IDs must be assigned to the vNICs, like (Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "Management" -Access -VlanId 210).

But should I also assign VLAN IDs on the physical NIC team (by adding a new interface to the NIC team and specifying a specific VLAN ID)?

Thanks

Intermittent Live Migration failure generating Event ID 21502, 22038, 21111, 21024

We have a multi-node Hyper-V cluster that has recently developed an issue with intermittent failure of live migrations.

We noticed this when one of our CAU runs failed because it could not place the Hosts into maintenance mode or successfully drain all the roles from them.

Scenario:

Place any node into Maintenance mode/drain roles.

Most VMs will drain and live migrate to other nodes. Randomly, one or a few will refuse to move (the VM and the source and destination nodes vary each time). The live migration ends with a failure generating event IDs 21502, 22038, 21111 and 21024. If you run the process again (drain roles) it will migrate the VMs, and if you manually live migrate them they move just fine. Manually live migrating a VM can hit the same intermittent error, but rerunning the process succeeds after one or two attempts, or after waiting a couple of minutes.

This occurs on all Nodes in the cluster and can occur with seemingly any VM in the private cloud.

The pertinent content of the events is:

Event 21502
Live migration of 'VM' failed.

Virtual machine migration operation for 'VM' failed at migration source 'NodeName'. (Virtual machine ID xxx)

Failed to send data for a Virtual Machine migration: The process cannot access the file because it is being used by another process. (0x80070020).

Event 22038
Failed to send data for a Virtual Machine migration: The process cannot access the file because it is being used by another process. (0x80070020).

According to this it would appear that something is locking the files or they are not transferring permissions properly, however all access to the back end SOFS is uniform across all the Nodes and the failure is intermittent rather than consistently happening on one Node. 

Thanks in advance!
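
A note on reading the hex code in these events: 0x80070020 follows the standard HRESULT layout, and decoding it gives facility 7 (FACILITY_WIN32) wrapping Win32 error 32, ERROR_SHARING_VIOLATION, which is where the "being used by another process" text comes from. The C++ sketch below shows that decoding. The struct and function names are made up for illustration, and the sketch only interprets the number; it does not identify what is holding the file open.

```cpp
#include <cassert>
#include <cstdint>

// Fields of a 32-bit HRESULT in the standard Windows layout:
// bit 31 = severity (1 means failure), bits 16-26 = facility, bits 0-15 = code.
struct HresultParts {
    bool failed;
    std::uint16_t facility;
    std::uint16_t code;
};

constexpr std::uint16_t kFacilityWin32 = 7;  // FACILITY_WIN32

// Split an HRESULT into its severity, facility and code fields.
inline HresultParts DecodeHresult(std::uint32_t hr) {
    HresultParts parts;
    parts.failed = (hr >> 31) != 0;
    parts.facility = static_cast<std::uint16_t>((hr >> 16) & 0x7FF);
    parts.code = static_cast<std::uint16_t>(hr & 0xFFFF);
    return parts;
}

// Returns the wrapped Win32 error code, or -1 when the value is not a
// failure HRESULT from FACILITY_WIN32.
inline int Win32CodeOf(std::uint32_t hr) {
    const HresultParts parts = DecodeHresult(hr);
    if (!parts.failed || parts.facility != kFacilityWin32) {
        return -1;
    }
    return parts.code;
}

// Self-check against the code quoted in the events above.
inline void CheckQuotedCode() {
    assert(Win32CodeOf(0x80070020u) == 32);  // 32 is ERROR_SHARING_VIOLATION
}
```

Calling Win32CodeOf(0x80070020u) yields 32, so the event is reporting a plain sharing violation on some file touched during the migration, not a permissions (access denied) error.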

hyperv cluster moving servers

Hello.

I have a Hyper-V cluster of 4 servers hosting about 20 virtual servers. Throughout the week, the virtual guests are balanced between the 4 servers. These servers run Server 2008 R2. In recent weeks, we have come in on Monday morning to find that the virtual servers have been migrated to the last Hyper-V host in the array and some of the guests did not start back up (most likely due to resource allocation). I have applied all critical Microsoft updates and updated the agent from SCVMM (which is running Server 2012 R2). This last weekend, at least the guests were split between the last 2 Hyper-V servers, and only 3 of the guests were not up and running.

In the event logs there are a number of iSCSI errors (event IDs 129, 39 and 9, in that order) that seem to happen all day, not just on Saturday morning, which is when we get email alerts that some of the servers are down. The errors say the details are in the dump data. How do I get the details of what is going on? And would these errors cause the machines to migrate to other servers and not migrate back?

Access denied when validating configuration for a failover cluster

Hi,

I've spent days now trying to install a cluster on two virtual Server 2012 R2 nodes running on ESX 6. No matter what I try, it always comes back to the following error in the validation report:

An error occurred while executing the test.
An error occurred while getting information about the software updates installed on the nodes.

One or more errors occurred.

Creating an instance of the COM component with CLSID {4142DD5D-3472-4370-8641-DE7856431FB0} from the IClassFactory failed due to the following error: 80070005 Access is denied. (Exception from HRESULT: 0x80070005 (E_ACCESSDENIED)).

I've checked all the things mentioned in https://social.technet.microsoft.com/Forums/windowsserver/en-US/39e6e957-95fd-4de5-89c2-0ea60e63b9d6/access-is-denied-messages-in-win2012-r2-failover-cluster-validation-report-and-csv-entering-a-paused?forum=winserverClustering  and several other things. No change.

My latest finding related to this problem is that every time this access denied error happens, two entries are logged in the security event log of one of our domain controllers:

Note: the blacked-out service name shows my username.

According to RFC 4120, error 0x1b (27) means

 KDC_ERR_MUST_USE_USER2USER            27  Server principal valid for
                                               user2user only

I'm logged on as a domain admin with local admin rights on the cluster nodes and I have no idea what might be the reason for this problem. Can anybody shed some light on this, please?

Thanks,

Klaus


Hyper-V 2012 R2 Cluster, move vm when network failure

Hello all,

I have configured a Hyper-V 2012 R2 failover cluster with two nodes. First, I should say that manual live migration works fine and all VMs can be migrated to the other node without any interruption. Restarting either node moves the VMs to the other one. No problem.

During my failover tests, I found a difficult problem.
When I create a new HA VM (it doesn't matter on which node) and disconnect the cables from the node, this VM is moved to the second one. Good! :) That is normal behavior with "Protected Network". When I restore the network connection I can manually move this VM without any problem. The problem appears when I unplug the cables again and wait for the same machine to fail over a second time. I get the message: The operation did not complete on resource ... and the detailed information only says "Cluster resource 'VM' in clustered role 'VMName' has received a critical state notification" (Event ID 1255). There is no automatic failover. The VM can still be moved manually with live migration.

When I remove the VM from the cluster role and add it again as HA, it can fail over again, but only once.

Any advice ?



CAU: One or more errors occurred while checking the status of Windows Firewall on the cluster nodes

Cluster with 2 hosts 2012 R2

Scheduled CAU fails with:

CAU run {4EFE116C-AB49-456D-8EED-F7EDC764DA49} on cluster Cluster1 failed. Error Message:One or more errors occurred while checking the status of Windows Firewall on the cluster nodes. Review the errors for more information on how to resolve the problems. Error Code:-2146233088 Stack:   at MS.Internal.ClusterAwareUpdating.Util.<CheckFirewallsAsync>d__3a.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.ClusterAwareUpdating.Commands.InvokeCauRunCommand.<_ProcessCluster>d__78.MoveNext()

If I run CAU "Analyze Readiness", ALL comes back as PASS

If I run CAU by hand on the same hosts with NO change to the system (not even a reboot), it finishes OK

Anybody any ideas?

Thanks

Seb
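
A side note on the "Error Code:-2146233088" in that message: it is a 32-bit code printed as a signed decimal, and it is usually easier to search for in its unsigned hex form. The C++ sketch below does the conversion; the function name is made up for illustration. The result is 0x80131500, which to my knowledge is the generic .NET exception HRESULT (COR_E_EXCEPTION), so the number by itself probably says less than the firewall-check text and the stack trace do.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Reinterpret a signed 32-bit error code as the unsigned hex form that
// documentation and search results usually quote.
inline std::string ErrorCodeToHex(std::int32_t code) {
    char buf[11];  // "0x" + 8 hex digits + terminating NUL
    const std::uint32_t bits = static_cast<std::uint32_t>(code);
    const int written = std::snprintf(buf, sizeof buf, "0x%08X",
                                      static_cast<unsigned>(bits));
    assert(written == 10);  // always exactly ten characters for 32 bits
    (void)written;          // keeps release builds free of unused warnings
    return std::string(buf);
}
```

For example, ErrorCodeToHex(-2146233088) returns "0x80131500".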

Dependencies missing IP address on one node

I have installed a new cluster on a pair of Windows Server 2012 R2 servers to run a SQL Server availability group.  When I go to the properties of the cluster and click on the Dependencies tab, I see:

   Cluster IP Address
OR Cluster IP Address 192.168.42.219

Why is the first entry missing an IP address?  Does anyone know how to add the IP address back in?

Any help appreciated.

Ken

Why do the role resources not follow cluster core resources?

We have a 2-node Windows 2012 R2 cluster consisting of nodes SERVER1A and SERVER1B, which share a cluster core resource named SERVER1 which is currently owned by SERVER1A. There is also a role named "SQL Server (HA01)" under which are five resources: a DataKeeper disk volume, a file server share (\\sqlha01), an IP address, a SQL server service, and a SQL server agent. The role and resources are all currently running on SERVER1B. 

Why would the role and resources be running on SERVER1B when the cluster core resource is on SERVER1A? Forgive me, but I am accustomed to Linux corosync+pacemaker clustering, where everything follows the same primary server. 

Power issues & failover disks not loading

We have a Windows Server 2008 R2 2-node cluster running a few shared folders and print server resources on MSA (DAS) storage. It has been working OK, and no changes have been made. The servers and storage have redundant power supplies and sit in a 42U rack with PDUs. The equipment connects to the PDUs, the PDUs connect to two R5500 and two R3000 UPSs with enough redundancy, and the bldg. is on a generator. Despite all this power redundancy, for the last few months we have been experiencing power surges which cause the cluster to fail over. The cluster quorum node loses its connection and the quorum fails over to node 2; however, neither node is able to load the disk for the shared resources that fail over too (the errors are "disk cannot be found", etc.).

Properties for resources are as follows:

"If resource fails, restart on current node, restarts value is 15:00"

"max restarts is 1"

"If restart is unsuccessful, fail over all resources in the service or app" (is checked)

"If all the restart attempts fail, begin restarting again after the specified period, value is 01:00"

"Pending time value is 03:00"

I ran a cluster validation process, and the configuration was all good. However, there was a warning that the "print spooler" resource was not configured to the standard "pending timeout" value. All resources, as well as the "print server" itself, have the default pending timeout value of "03:00", but the "print spooler" is set to "05:00". I don't remember if we changed the value at setup, or if this is the default for a "print spooler". I would appreciate it if someone could shed some light on this. Should it be set to the default of 03:00 like the other resources? Should anything in the cluster policies be changed? Any advice on what else to look for? Could a UPS be bad even though they all light up green? Do PDUs go bad?

Appreciate your advice.  Thank you.

-CocoFlor


Clustering on Windows Server 2016 TP4 - Storage Spaces Direct, Quorum, etc.

Hi Everyone,

We are trying to set up a cluster on Windows Server 2016 TP4 using 2 physical nodes, each with its own local storage, and use Storage Spaces Direct to create shared storage for the cluster. There is no network storage, only the nodes' local storage.

Our plan is to use the storage pool built from the servers' local disks with Storage Spaces Direct to host a highly available SMB share. The share will be used for VMs running on Flexiant Cloud Orchestrator (FCO), which can connect to and host VMs on an SMB share (and in the future will be replaced with Hyper-V VMs).

I found a few articles that describe how to configure it, but none of them talks about the quorum requirement.

As there is no network storage with a LUN that will host the quorum, is there another option to create a quorum besides a File Share Witness?

I read here - http://www.aidanfinn.com/?p=15340 - about the second option (Create a Storage Spaces Virtual Disk As A Witness Disk) but didn't fully understand how to configure it. Also, where should I create the virtual disk, on one of the cluster nodes? On both? Will it be highly available?

Just to remind again, the storage is only local, which resides on the physical nodes of the cluster.

Thanks!

Error 13 from ResourceControl for resource Disk Drive while adding cluster disk

Hi,

I have a drive mounted at C:\mountpoint\Kdrive. C:\ is not a cluster disk. I am trying to use Cluster API to add this disk to the cluster but it fails with the following errors:-

00000928.00000ce8::2016/03/10-04:59:51.637 INFO  [RCM] rcm::RcmApi::CreateResource: (SQL Server (MSSQLSERVER), Disk Drive C:\mountpoint\KDrive\, 8836dfef-fa51-419d-960f-75965fed6cfd, Physical Disk)
00000928.00000ce8::2016/03/10-04:59:51.637 INFO  [RCM] rcm::RcmGum::CreateResource(Disk Drive C:\mountpoint\KDrive\,8836dfef-fa51-419d-960f-75965fed6cfd,SQL Server (MSSQLSERVER))
00000304.00000554::2016/03/10-04:59:51.678 ERR   [RES] Physical Disk <Disk Drive C:\mountpoint\KDrive\>: Open: Unable to get disk identifier. Error: 5023.
00000928.00000dc8::2016/03/10-04:59:51.678 INFO  [RCM] HandleMonitorReply: OPENRESOURCE for 'Disk Drive C:\mountpoint\KDrive\', gen(0) result 0.
00000304.00000554::2016/03/10-05:00:12.208 ERR   [RHS] Error 13 from ResourceControl for resource Disk Drive C:\mountpoint\KDrive\.
00000928.00000ce8::2016/03/10-05:00:12.208 WARN  [RCM] ResourceControl(SET_PRIVATE_PROPERTIES) to Disk Drive C:\mountpoint\KDrive\ returned 13

I tried with various syntax for the Disk Drive path (with single \ and double \\) but nothing works. If I execute the same code with path like K:\ it works fine.

Code snippet:

try
 {
  // Create the resource.  The resource name is "Disk Drive @:"
  // where @ is the drive letter of a disk partition.
  bstr_t bstr;
  UTIL_Utf8ToWideChar (szDiskPath.data(), bstr);
  int length = bstr.length ();
  lpstrDiskPathW = new WCHAR[length + 1];
  wcsncpy (lpstrDiskPathW, (const wchar_t*)bstr, length);
  lpstrDiskPathW[length] = L'\0';

  String strResName = "Disk Drive " + szDiskPath;
  UTIL_Utf8ToWideChar(strResName.data(), bstr);
  length = bstr.length ();
  lpstrResourceNameW = new WCHAR[length + 1];
  wcsncpy (lpstrResourceNameW, (const wchar_t*)bstr, length);
  lpstrResourceNameW[length] = L'\0';

  hResource = m_funcCreateClusterResource(hClusterGroup,
   (LPCWSTR)lpstrResourceNameW,
   L"Physical Disk",
   0);

  if( hResource == NULL )
  {
   m_log.error("CreateDiskResource: failed to create disk resource %s", strResName);
   throw -1;
  }
  else
  {
   m_log.info("CreateDiskResource: created disk resource %s", strResName);
  }

  // Set the diskpath private property
  // Begin property list used to set the DiskPath private property.
  WCHAR szPropName[] = CLUSREG_NAME_PHYSDISK_DISKPATH;

  typedef struct _DiskPathControl
  {
   DWORD dwPropCount;
   CLUSPROP_PROPERTY_NAME_DECLARE(PropName,sizeof(szPropName)/sizeof(WCHAR));
   CLUSPROP_SZ_DECLARE(DiskPathValue, sizeof(lpstrDiskPathW)/sizeof(WCHAR));
   CLUSPROP_SYNTAX Endmark;
  } DiskPathControl;

  DiskPathControl DPC;

  //  Property Count
  DPC.dwPropCount = 1;

  //  Property Name
  DPC.PropName.Syntax.dw  = CLUSPROP_SYNTAX_NAME;
  DPC.PropName.cbLength   = sizeof( szPropName );
  wcsncpy (DPC.PropName.sz, (const wchar_t*)szPropName, DPC.PropName.cbLength);

  //  Property Value
  DPC.DiskPathValue.Syntax.dw = CLUSPROP_SYNTAX_LIST_VALUE_SZ;
  DPC.DiskPathValue.cbLength  = sizeof( lpstrDiskPathW );
  wcsncpy (DPC.DiskPathValue.sz, (const wchar_t*)lpstrDiskPathW, DPC.DiskPathValue.cbLength);

  //  Endmark
  DPC.Endmark.dw = CLUSPROP_SYNTAX_ENDMARK;

  DWORD cbSize = sizeof( DiskPathControl );

  //  End property list creation

  // Set the diskpath private property
  dwRC = m_funcClusterResourceControl( hResource,
   NULL,
   CLUSCTL_RESOURCE_SET_PRIVATE_PROPERTIES,
   ( void* ) &DPC,
   cbSize,
   NULL,
   0,
   NULL );

  if( dwRC != ERROR_SUCCESS )
  {
   String err(dwRC);
   m_log.error("AA_ClusterBase:: CreateDiskResource: failed to set the DiskPath property, error %s", err);
   m_funcDeleteClusterResource( hResource );
   m_funcCloseClusterResource( hResource );
   hResource = NULL;
   throw -1;
  }
 }
 catch (...)
 {
 }

Is there a known limitation in the Cluster API that prevents it from supporting disks mounted on mount points?

BTW this works fine:

C:\>cluster res "Disk W:\Mount" /priv DiskPath="W:\Mount"

Thanks,

Aditya
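
One thing in the snippet that may explain the difference between K:\ and the mount point path, offered as a hypothesis and not a confirmed diagnosis: lpstrDiskPathW is a pointer, so sizeof(lpstrDiskPathW) is the size of the pointer (8 bytes in a 64-bit build), not the length of the path. Eight bytes is exactly "K:\" plus its terminating null in 2-byte characters, so the drive-letter case fits by coincidence, while C:\mountpoint\KDrive\ would be cut short in both the struct declaration and cbLength. Error 13 is ERROR_INVALID_DATA, which would be consistent with a malformed property list. The C++ sketch below shows the two size computations side by side; WideChar and the function names are stand-ins invented for illustration so the example compiles without the Windows headers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

using WideChar = char16_t;  // stand-in for the 2-byte Windows WCHAR

// What the posted snippet computes: sizeof applied to a pointer variable
// gives the size of the pointer itself, whatever string it points to.
inline std::size_t BytesPerSizeofPointer(const WideChar* path) {
    return sizeof(path);
}

// What a string property value needs: the byte length of the string,
// including the terminating null character.
inline std::size_t BytesForString(const WideChar* path) {
    assert(path != nullptr);
    const std::size_t chars = std::char_traits<WideChar>::length(path);
    return (chars + 1) * sizeof(WideChar);
}
```

With these helpers, BytesForString(u"K:\\") is 8 while BytesForString(u"C:\\mountpoint\\KDrive\\") is 44, yet BytesPerSizeofPointer returns the same pointer size for both. If this is the cause, the value member would have to be sized from the actual string length (for example by building the property list in a dynamically allocated buffer), and the wcsncpy count would need to be in characters, not bytes.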

Guest VM simultaneous failover

Hi,

It is a requirement within our environment for certain guest VMs to always be located on the same node of a cluster as each other, so if one is migrated off, the other moves with it. Essentially they need to be "paired".

Can someone please advise on how I can do this?

Regards

Leon


windows 2012 r2 CSV report enter pause c0130021

Hi,

I use a fully patched 2-node Windows Server 2012 R2 cluster with IBM V3700 storage connected over FC, and for backup I use Veeam Backup & Replication. I get this error a few times a day. I must say that the VMs do not enter a paused state, but the CSV changes host when I get the error. I've checked and updated the BIOS, drivers and firmware with IBM, and the servers and storage are fully patched.

How can I fix this problem?

THX 

Cluster Shared Volume 'Volume2' ('CSV2') has entered a paused state because of '(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.

Software snapshot creation on Cluster Shared Volume(s) ('\\?\Volume{74937dcb-1bd3-4af1-865e-94b24a509a86}\') with snapshot set id 'b428bb9b-a027-41cd-802a-552cc047ddc8' failed with error 'HrError(0x80042306)(2147754758)'. Please check the state of the CSV resources and the system events of the resource owner nodes.

Randomly restarting VM in Cluster

Hello,

In our 2012 R2 failover cluster we have Windows 2008 R2 virtual machines that restart without a message. In our cluster we receive the following error:

Cluster resource 'VM Name' of type 'Virtual Machine' in clustered role 'VM  Resources' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Source: Microsoft-Windows-FailoverClustering, event ID: 1069

When I look in the cluster log I see the following messages:

INFO  [RCM [RES] VMName embedded failure notification, code=0 _isEmbeddedFailure=false _embeddedFailureAction=2

Anybody got an idea what the problem could be?

Thanks!!

SQL database instance on passive node goes to hung state when FSW become unavailable.

Hello Team,

I have configured failover clustering on two virtual nodes running the Windows Server 2012 R2 operating system. On the Windows cluster I have configured SQL Server Always On Availability Groups. SQL Server has 3 instances, which are distributed across both servers. The cluster owner holds two instances and the passive node holds a single instance. The file share witness (FSW) is configured on a domain controller.

Recently we faced an issue with the database instance hosted on the passive node: it went into a hung state. We checked the cluster events and found that the FSW was unavailable during the issue, which led to the problem, but at the same time we found no issue on the owner node, which holds two database instances. The same issue recurred when we rebooted the domain controller for patching.

We have distributed these database instances across both nodes because we want to utilize both nodes, as each virtual machine has little memory (8 GB RAM per node).

If I keep all the database instances on the owner node and the FSW is unavailable for any reason, everything works fine.

So I am a little confused about whether to keep the database instances on a single node or distribute them across both nodes, given the RAM constraint.

What is the best practice with respect to Windows failover clustering? What settings should be made on the cluster side to keep my database instances alive even when my FSW is unavailable?

Please suggest.

Regards,

Vinod


How do I find out the Cluster Name Account?

Hi!

I have a 2-node cluster on Windows Server 2012 and I'm trying to find out the cluster name account.

Any idea how to do this?

Thanks,

Zoe

Failover cluster server - File Server role is clustered - Shadow copies do not seem to travel to other node when failing over

Hi,

New to 2012 and implementing a clustered environment for our File Services role.  Have got to a point where I have successfully configured the Shadow copy settings.

Have a large (15tb) disk.  S:

Have a VSS drive (volume shadow copy drive) V:

Have successfully configured through Windows Explorer the Shadow copy settings.

Created dependencies in the Failover Cluster console whereby S: depends on V:

However, when I failover the resource and browse the Client Access Point share there are no entries under the "Previous Versions" tab. 

When I visit the S: drive in Windows Explorer and open the Shadow Copies dialog box, there are entries showing the times and dates of the shadow copies that were run on the original node. So the disk knows about the shadow copies that were run on the original node, but the "Previous Versions" tab has no entries to display.

This is in a 2012 server (NOT R2 version).

Can anyone explain what might be the reason?  Do I have an "issue" or is this by design?

All help appreciated!

Kathy


Kathleen Hayhurst Senior IT Support Analyst


