Channel: High Availability (Clustering) forum

Disk Accessing Errors


This morning we faced issues with cluster disk failures.

Below is the error we faced.

We ran the validation report and it found some warnings. Can an expert help me understand them?

https://crescentpk-my.sharepoint.com/:u:/g/personal/osama_mansoor_crescent_com_pk/EUN8kP29fxBCs722GXxuLyYBlSX1LoufmDkIxOqBKN9c4Q?e=szzB9w


Generic service resource using cluster name

Here's our issue:
We have a Windows 2012 MSCS failover cluster consisting of 2 nodes. The cluster has several Windows generic service resources set up, all of which are dependent on the cluster name/IP. In each of the clustered services we set it not to use the cluster name (clustered service properties -> "Use network name for computer name"). For some reason, one of the services is still registering its connection (a proprietary connection record in our application) in the database under the cluster name. I am looking for help/ideas on why this could be happening. It causes clients to connect using the record in our database (the proprietary connection record), and due to application limitations there is some data corruption. We cannot recompile the application to fix it in code.

In our test environment we can reproduce the issue and mitigate it by simply checking/unchecking the box to use the cluster name (the setting noted above). For some reason, in production that checkbox doesn't seem to make a difference.
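For reference, a minimal sketch of inspecting and toggling that setting from PowerShell - UseNetworkName is the private property behind the checkbox, and the resource name here is hypothetical:

# Inspect the private properties of the clustered service
Get-ClusterResource "MyAppService" | Get-ClusterParameter

# 0 = register under the node's computer name; 1 = use the network (cluster) name
Get-ClusterResource "MyAppService" | Set-ClusterParameter -Name UseNetworkName -Value 0

# The resource must be cycled for the change to take effect
Stop-ClusterResource "MyAppService"
Start-ClusterResource "MyAppService"

Comparing Get-ClusterParameter output between the working test resource and the misbehaving production one might show whether the checkbox change is actually landing in the cluster database.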

S2D - to iWARP or to RoCE / Switchless?


Hello all,

I am about to build a new platform, and now I have to answer a number of crucial questions to make the right hardware choice.

3-Node S2D Setup based on Windows 2019 Datacenter:

3 x HPE DL360 Gen10 NVME with 5 x 6.4 TB NVME SSD with an effective 25.6 TB Storage Pool

In every server I have the option of 2 x SFP28 10/25Gbit adapters (2 ports each) from one of these brands:

• Mellanox ConnectX-4 Lx (640FLR-SFP28), preferred

• Broadcom BCM57414 (631FLR-SFP28)

• Marvell QL41401L-A2G (622FLR-SFP28)

So I am figuring out what fits best to get the optimal configuration.

For example, RoCE vs. iWARP is such a choice, where one vendor says that iWARP is faster:

https://www.chelsio.com/wp-content/uploads/resources/iwarp-s2d-updates.pdf

And the other says that RoCE v2 is faster:

http://www.mellanox.com/related-docs/whitepapers/WP_RoCE_vs_iWARP.pdf

My questions:

It is advisable to have a file share witness (FSW) with a 3-node S2D cluster. This is possible thanks to a 4th server that is not part of the S2D cluster, so a file share witness can be hosted there. Can a 3-node setup with a witness (FSW) be realized without a switch? Should the witness file share be part of the iWARP or RoCE subnet? Or is it advised to always use a switch?

Which technique should I use, iWARP or RoCE? Which is fastest?
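One practical difference worth weighing alongside the vendor papers: RoCE generally requires DCB/PFC configured consistently on hosts and switches, while iWARP runs over plain TCP and needs no fabric configuration, which is part of why it is often suggested for switchless or simpler designs. As a reference point, here is a minimal sketch of the per-node QoS setup commonly used for RoCE with S2D - the adapter names and the priority/bandwidth values are assumptions and must match the switch configuration:

# Install Data Center Bridging (required for RoCE, not for iWARP)
Install-WindowsFeature -Name Data-Center-Bridging

# Tag SMB Direct (port 445) traffic with priority 3 (a common convention)
New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

# Enable Priority Flow Control only for the SMB priority
Enable-NetQosFlowControl -Priority 3
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7

# Apply QoS on the RDMA adapters (names are hypothetical)
Enable-NetAdapterQos -Name "SLOT 1 Port 1","SLOT 1 Port 2"

# Reserve bandwidth for SMB via ETS
New-NetQosTrafficClass "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS

# Verify RDMA is active
Get-NetAdapterRdma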

The cluster resource has become degraded - please help

Hi,

I am getting the following warning message continuously for all cluster instances. The warning message is the same, but the resource name differs. I have Exchange Server installed in a two-node active/passive cluster.

Event Type: Warning
Event Source: Foundation Agents
Event Category: Events
Event ID: 1167
Date:  12/1/2008
Time:  2:39:48 PM
User:  N/A
Computer: xyz03
Description:
Cluster Agent: The cluster resource SMTP Virtual Server Instance 1 (xyz05) has become degraded.
[SNMP TRAP: 15005 in CPQCLUS.MIB]
Data:
0000: 0a020065 0000000d 50544d53 72695620
0010: 6c617574 72655320 20726576 74736e49
0020: 65636e61 28203120 58454346 00293530
[... the remainder of the data block is almost entirely zero-filled; the scattered non-zero words decode to ASCII resource/node name strings (e.g. "Microsoft Exchange SMTP Server Instance", "NetBackup Server") plus a few trailing words of SNMP trap metadata ...]


A few other errors are:

Event Type: Error
Event Source: ClusSvc
Event Category: Failover Mgr
Event ID: 1069
Date:  12/1/2008
Time:  6:27:33 PM
User:  N/A
Computer: XYZ02
Description:
Cluster resource 'resource name' in Resource Group 'ResourceClusterGroup-XYZ02' failed.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Event Type: Error
Event Source: ClusSvc
Event Category: Failover Mgr
Event ID: 1069
Date:  12/1/2008
Time:  6:24:53 PM
User:  N/A
Computer: xyz02
Description:
Cluster resource 'Resource name' in Resource Group 'Resource-ClusterGroup-xyz02' failed.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Any help is appreciated.

Swapnil

Fault Domains with s2d


Hello all,

I am currently learning the way of virtualization with Windows and S2D.

It's clear that I can define different kinds of fault domains, such as node, chassis, rack and site.

Right now I lack the money to properly test these configurations :)

As far as I understand the concept of fault domains, one fault domain can go offline and the cluster itself will remain online.

Let's say I have defined 2 rack fault domains ("rack-a" and "rack-b") with 4 nodes each (hv01-08), each node with 4 x 100 GB disks.

hv01-04 are assigned to rack-a, the others to rack-b. Then I create the S2D pool after configuring the fault domains, as suggested by Microsoft (a sketch of that layout follows below).
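A minimal sketch of how that rack topology could be declared before creating the pool, assuming the in-box fault domain cmdlets on Server 2016+ (names follow the example above):

# Define the rack fault domains
New-ClusterFaultDomain -Type Rack -Name "rack-a"
New-ClusterFaultDomain -Type Rack -Name "rack-b"

# Assign nodes: hv01-hv04 -> rack-a, hv05-hv08 -> rack-b
1..4 | ForEach-Object { Set-ClusterFaultDomain -Name ("hv{0:d2}" -f $_) -Parent "rack-a" }
5..8 | ForEach-Object { Set-ClusterFaultDomain -Name ("hv{0:d2}" -f $_) -Parent "rack-b" }

# Verify the topology, then enable S2D so the pool is created rack-aware
Get-ClusterFaultDomain
Enable-ClusterStorageSpacesDirect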

So for available storage, will I have 600 GB or 1.9 TB if I configure them with dual parity?

If I have VMs on rack-a, will the VMs live-migrate over to rack-b after a failure of that rack?

Does the rack itself also have fault tolerance at the node level? Or does it mean that if just a single node fails, the whole rack fails?

Lastly, is it possible to add a single node to only one rack? How would that affect the storage?

 

Hopefully somebody can answer my questions.

Thanks in advance.

Regards

Elmar


From Windows 2012R2 to 2016 Cluster operating system rolling upgrade question


Hi everyone, 

I want to upgrade a Windows 2012 R2 guest cluster (File server) to Windows 2016. This cluster is running on Windows 2016 hosts. The VMs that compose this cluster are using shared drives (VHDS). 

The link to perform the upgrade https://docs.microsoft.com/en-us/windows-server/failover-clustering/cluster-operating-system-rolling-upgrade says:

"The following scenario is not supported in Windows Server 2016:
Cluster OS Rolling Upgrade of guest clusters using virtual hard disk (.vhdx file) as shared storage"

I don't think this applies to me. I think it refers to the way Windows 2012 R2 shares drives ("Enable virtual hard disk sharing" on a .vhdx file), which is different from the way Windows 2016 does it (a VHD Set, .vhds).

Can I go ahead with the upgrade?

Thanks to everyone,

Ivan Mckenzie

Storage Spaces Direct (S2D) - Poor write performance with 5 nodes with 24 Intel P3520 NVME SSDs each over 40Gb IB network


Need a little help with my S2D cluster which is not performing as I had expected.

Details:

5 x Supermicro SSG-2028R-NR48N servers with 2 x Xeon E5-2643v4 CPUs and 96GB RAM

Each node has 24 x Intel P3520 1.2TB NVME SSDs

The servers are connected over an Infiniband 40Gb network, RDMA is enabled and working.

All 120 SSDs are added to the S2D storage pool as data disks (no cache disks). There are two 30TB CSVs configured with hybrid tiering (3TB three-way mirror, 27TB parity).

I know these are read-intensive SSDs and that parity write performance is generally pretty bad, but I was expecting slightly better numbers than I'm getting:

Tested using CrystalDiskMark and diskspd.exe

Multithreaded Read speeds: < 4GBps (seq) / 150k IOPs (4k rand)

Singlethreaded Read speeds: < 600MBps  (seq) 

Multithreaded Write speeds: < 400MBps  (seq) 

Singlethreaded Write speeds: < 200MBps (seq) / 5k IOPS (4k rand)

I did manage to improve these numbers by configuring a 4GB CSV cache and forcing write-through on the CSVs:

Max reads: 23 GBps seq / 500K 4K IOPS; max writes: 2 GBps seq / 150K 4K IOPS

That high read performance is due to the CSV cache, which uses memory. Write performance is still pretty bad though. In fact, it's only slightly better than the performance I would get from a single one of these NVMe drives. I was expecting much better performance from 120 of them!

I suspect that the issue here is that Storage Spaces is not recognising that these disks have PLP (power-loss protection), which you can see here:

Get-StoragePool "*S2D*" | Get-PhysicalDisk | Get-StorageAdvancedProperty

FriendlyName          SerialNumber       IsPowerProtected IsDeviceCacheEnabled
------------          ------------       ---------------- --------------------                   
NVMe INTEL SSDPE2MX01 CVPF7165003Y1P2NGN            False                     
WARNING: Retrieving IsDeviceCacheEnabled failed with ErrorCode 1.
NVMe INTEL SSDPE2MX01 CVPF717000JR1P2NGN            False                     
WARNING: Retrieving IsDeviceCacheEnabled failed with ErrorCode 1.
NVMe INTEL SSDPE2MX01 CVPF7254009B1P2NGN            False                     
WARNING: Retrieving IsDeviceCacheEnabled failed with ErrorCode 1.
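If the missing PLP detection is indeed the culprit, one hedged workaround that is sometimes suggested - only safe when every drive in the pool genuinely has power-loss protection, as these P3520s do - is to override the detection at the pool level:

# Treat the pool's drives as power-protected so Storage Spaces stops
# forcing every write through to stable media. Do NOT set this unless
# all drives have capacitor-backed power-loss protection.
Get-StoragePool "*S2D*" | Set-StoragePool -IsPowerProtected $true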

Any help with this issue would be appreciated.

Thanks.

S2D IO TIMEOUT when rebooting node


I am building a 6-node cluster: 12 x 6TB drives, 2 x 4TB Intel P4600 PCIe NVMe drives, Xeon Platinum 8168 / 768GB RAM, LSI 9008 HBA.

The cluster passes all tests, the switches are properly configured, and the cluster works well, exceeding 1.1 million IOPS with VMFleet. However, at the current patch level (as of April 18, 2018) I am experiencing the following scenario:

When no storage job is running, all vdisks are listed as healthy, and I pause a node and drain it, all is well until the server is actually rebooted or taken offline. At that point a repair job is initiated and IO suffers badly, and can even stop altogether, causing vdisks to go into a paused state due to IO timeout (listed as the reason in cluster events).

Exacerbating this issue, when the paused node reboots and rejoins, it causes the repair job to suspend, stop, then restart (it seems - tracking this is hard, as all storage commands become unresponsive while the node is joining). At this point IO is guaranteed to stop on all vdisks at some point, for long enough to cause problems, including VM reboots.

The cluster was initially formed using VMM 2016. I have tried manually creating the vdisks, using single resiliency (3-way mirror) and multi-tier resiliency, with the same effect. This behavior was not observed when I did my POC testing last year. It's frankly a deal breaker and unusable: if I cannot reboot a single node without entirely stopping my workload, I cannot deploy. I'm hoping someone has some info. I'm going to re-install with Server 2016 RTM media, keep it unpatched, and see if the problem remains. However, it would be desirable to at least start the cluster fully patched. Any help appreciated. Thanks.
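For reference, this is the drain-and-verify sequence I use (a sketch with a hypothetical node name; it does not fix the repair-job stall itself, but it rules out rebooting while a job is still running):

# Drain all roles off the node and wait for completion
Suspend-ClusterNode -Name "Node1" -Drain -Wait

# Confirm no repair/rebalance jobs are running and every vdisk is healthy
Get-StorageJob
Get-VirtualDisk | Format-Table FriendlyName, HealthStatus, OperationalStatus

# Reboot only if the above is clean; afterwards resume the node and let
# repair jobs finish before touching the next node
Resume-ClusterNode -Name "Node1" -Failback Immediate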



Windows 2019 S2D cluster failed to start, event ID 1809


Hi, I have a lab with an Insider Windows 2019 cluster that I in-place upgraded to the RTM version of Server 2019. The cluster shuts down after a while and event ID 1809 is logged:

This node has been joined to a cluster that has Storage Spaces Direct enabled, which is not validated on the current build. The node will be quarantined.
Microsoft recommends deploying SDDC on WSSD [https://www.microsoft.com/en-us/cloud-platform/software-defined-datacenter] certified hardware offerings for production environments. The WSSD offerings will be pre-validated on Windows Server 2019 in the coming months. In the meantime, we are making the SDDC bits available early to Windows Server 2019 Insiders to allow for testing and evaluation in preparation for WSSD certified hardware becoming available.

Customers interested in upgrading existing WSSD environments to Windows Server 2019 should contact Microsoft for recommendations on how to proceed. Please call Microsoft support [https://support.microsoft.com/en-us/help/4051701/global-customer-service-phone-numbers].

It's kind of weird, because my S2D cluster is running in VMs. Is there some registry switch to disable this lock?


The computer is joined to a cluster in Windows Server 2012 and R2


Dear Forum, 

I deployed a Windows failover cluster on Windows Server 2016 with 3 nodes (Node1, Node2, Node3). I then removed one node (Node3) from the failover cluster. A week later I tried to add Node3 back, but it can't join the cluster; when I add Node3 to the existing cluster, the error message below is shown in Event Viewer.

The Cluster service cannot be started. An attempt to read configuration data from the Windows registry failed with error '2'. Please use the Failover Cluster Management snap-in to ensure that this machine is a member of a cluster. If you intend to add this machine to an existing cluster use the Add Node Wizard. Alternatively, if this machine has been configured as a member of a cluster, it will be necessary to restore the missing configuration data that is necessary for the Cluster Service to identify that it is a member of a cluster. Perform a System State Restore of this machine in order to restore the configuration data.

It seems that the membership record wasn't deleted from this computer's registry after it was removed from the cluster.
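If so, a commonly suggested cleanup is to clear the stale state on Node3 before re-adding it - a sketch, assuming Node3 is no longer an active member of any cluster (the cluster name below is hypothetical):

# Run ON Node3: remove the leftover cluster configuration from this node
Clear-ClusterNode -Force

# Then re-add it from one of the active nodes
Add-ClusterNode -Cluster "MyCluster" -Name "Node3"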

Could anyone help with this problem?

Cluster resource could not be brought online in Windows 2012 R2


I am unable to bring the cluster name online in a Windows 2012 R2 file server failover cluster.

Getting error code 0x8007139a.

Cluster events show event IDs 1214, 1205, 1069 and 1254.

My cluster is working OK, but it is unable to switch over to the other node; the cluster IP shows as online, but the cluster name does not come online.
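For anyone picking this up, a minimal diagnostic sketch (assuming the default "Cluster Name" resource name) to capture why the network name fails:

# State of the core resources
Get-ClusterResource | Format-Table Name, State, OwnerGroup

# Attempt to bring the network name online
Start-ClusterResource -Name "Cluster Name"

# Pull the last 10 minutes of cluster log (local time) to see the failure reason
Get-ClusterLog -TimeSpan 10 -UseLocalTime -Destination C:\Temp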

Please suggest fixes.

Narender Yadav

Node in cluster - status changes to "paused"


We have seven Windows 2012 R2 nodes in a Hyper-V cluster. They are all identical hardware (HP BladeSystem). For a while, we had only six nodes, and there were no problems.

Recently, we added the seventh node, and the status keeps reverting to "paused". I can't find any errors that directly point to why this is happening - either in the System or Application log of the server, in the various FailoverClustering logs, or in the Cluster Event logs. I created a cluster.log using the get-clusterlog command, but if it explains why this is happening, I can't figure it out (it's also a very large file - 150 MB, so it's difficult to determine what lines are the important ones).
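One way to tame that 150 MB log is to scope it - a sketch assuming the standard cmdlet and a hypothetical node name: generate a log covering only the minutes around a pause event, for only the new node, then search it for pause-related lines:

# Capture just the last 30 minutes of cluster log, in local time, for the new node
Get-ClusterLog -Node "Node7" -TimeSpan 30 -UseLocalTime -Destination C:\Temp

# The output file is named <node>_cluster.log; search it for pause-related entries
Select-String -Path "C:\Temp\Node7_cluster.log" -Pattern "paus" -SimpleMatch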

As far as I can tell, everything on this new node is the same as the previous ones - the software versions, network settings, etc. The Cluster Validation report also doesn't give me anything helpful.

Any ideas on how to go about investigating this? Even before I can solve the problem, I'd like to at least know when and why the status reverts to paused.

Thanks,

David

Unable to connect to Failover cluster manager in Windows server 2016

Hi,

I have a 2-node cluster in my environment, which I used to be able to manage from Failover Cluster Manager. However, I am now getting an error: "The operation has failed. The following is a list of nodes that encountered this problem when the connection to the cluster was attempted: The remote node". Both nodes are in the same network segment, and no local firewall is blocking.

I have seen the URL from below.

1) https://blogs.msdn.microsoft.com/clustering/2010/11/23/trouble-connecting-to-cluster-nodes-check-wmi/
2) https://blogs.technet.microsoft.com/askcore/2013/12/17/unable-to-launch-cluster-failover-manager-on-any-node-of-a-20122012r2-cluster/

I am able to get output when I run wbemtest and "Get-WmiObject -namespace "root\mscluster" -class MSCluster_Resource" locally. When I ran the verification script from the 2nd URL, it showed "WMI query succeeded" for the local node but "WMI query failed // The RPC server is unavailable. (Exception from HRESULT: 0x800706BA)" for the remote node.

I then proceeded to run the remediation steps below, but it is still not working. Both servers seem to work on their own but cannot connect to each other. There are a lot of event 4683 entries in the FailoverClustering-Manager event log with the message "The error was 'An attempt to connect to the cluster failed due to one or more nodes not responding to WMI calls. This is usually caused by a problem with the WMI infrastructure on the node(s)'". Any suggestions?

MOF Parser
cd c:\windows\system32\wbem
mofcomp.exe cluswmi.mof

Reset WMI Repository
Winmgmt /resetrepository
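If those steps don't help, a quick cross-node test can confirm whether remote WMI/RPC is the real blocker - a sketch; the MSCluster namespace requires packet-privacy authentication, and the node name is a placeholder:

# Run from one node against the other; "RPC server is unavailable" here
# points at connectivity/RPC/firewall rather than the WMI repository
Get-WmiObject -Namespace "root\mscluster" -Class MSCluster_Resource `
    -ComputerName "Node2" -Authentication PacketPrivacy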

Regards,
Chiew Sheng

S2D 2 node cluster


Hello,

We have a 2-node S2D cluster with Windows Server 2019. Between the two nodes we have a directly connected RDMA storage network (Cluster Only) and a client-facing network based on LACP teaming on each node (Cluster and Client). We have done failover tests and they work: when we power off one node, virtual machines migrate to the other host as expected.

But when we unplug the client-facing adapters (two adapters in LACP) on the node where the VMs reside, VM migration fails, and after some time the Cluster Network Name and Cluster IP Address resources also fail. When we plug the client-facing adapters back into that node, the cluster IP address recovers and the VM client network works again.

So the problem: cluster migration fails after an unexpected loss of the client-facing network on the node where the VMs reside. The nodes can still communicate with each other through the storage network, and all nodes are up in Failover Cluster Manager. When the client network is down, the VMs should migrate to the node with a working client-facing network, but instead the cluster fails and the VMs do not migrate. How can we fix this behaviour? Has anyone seen this before?
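A first thing I would check (a diagnostic sketch using the in-box cmdlets, not a fix) is how the cluster has classified each network, since reaction to client-network loss depends on what the cluster actually monitors:

# Role: 1 = cluster only, 3 = cluster and client, 0 = none
Get-ClusterNetwork | Format-Table Name, Role, Address, State

# Which physical interfaces back each network on each node
Get-ClusterNetworkInterface | Format-Table Node, Network, State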

++ 2 2012R2 Node Hyper-V Cluster Connection Issue ++


Hello,

It seems that we have some kind of connection issue within our Hyper-V cluster. I am using the following script, found in: C:\Program Files (x86)\Quest\NetVault Backup\scripts\HyperV\v2

Content of the script:

##############################################################################
#
#  GetVM.ps1 
#
#    This script will fetch a bunch of info for a specific VM on the specified
#    machine if a name is given, or all VMs if no VM name is given. We'll use
#    this in the case of populating our VM nodes under a server node as well
#    as any time we just need state info on a specific VM.
#
#    Usage:
#         GetVM.ps1 [ComputerName] [-isCluster] [-unclusteredOnly] [VmName]
#
#    ComputerName - name of the host or cluster machine to get VM info from.
#
#    isCluster - specifies that the input computer name is a cluster and
#                the desire is to get a list of VMs that resides inside that cluster
#
#    unclusteredOnly - only list VMs on inputted host that are not part of the local cluster
#                      This option implies you are querying for VMs on the inputted
#                      host (ComputerName) and is *not* compatible with the isCluster option
#
#    VmName - name of a specific VM. This is optional - if not specified, 
#             info for all VMs is returned.
#
##############################################################################
param 
(
   [string]$machine,
   [switch]$isCluster,
   [switch]$unclusteredOnly,
   [string]$vmname,
   [switch]$ignoreRPCError,
   [String[]]$InclusiveVMList,
   [String[]]$ExclusiveVMList

)

$ScriptDir = Split-Path -parent $MyInvocation.MyCommand.Path
Import-Module $ScriptDir\UtilModuleV2
Import-Module $scriptDir\..\UtilModule

$error.clear()

# -----------------------------------------------------------------------------
# PrintVm
#   - prints out VM's information
# -----------------------------------------------------------------------------
Function PrintVm
{
param
(
$myVM
)

write-host ("<vm>")
write-host ("<Name>"+$myVM.VMId.ToString().ToUpper()+"</Name>")
write-host ("<Element Name>"+$myVM.Name+"</Element Name>")
    if ($myVM.State -eq "Off")
    {
    write-host ("<Enabled State>3</Enabled State>")
    }
    else
    {
    write-host ("<Enabled State>2</Enabled State>")
    }
write-host ("<Host Name>"+$myVM.ComputerName+"</Host Name>")
write-host ("</vm>")

}

Function GetSingleVm
{
    if ($isCluster -eq $true)
    {
        foreach ($node in $nodes)
        {
            $vm = Get-VM -ComputerName $node $vmname
            if ($vm -ne $null)
            {
                $vm
                break
            }
        }
    }
    else
    {
    Get-VM -ComputerName $machine $vmname
    }
}

Function GetVmList
{
    $vmlist = New-Object System.Collections.ArrayList

    $iExecuteQuery = 1

    if ([string]::IsNullOrEmpty($InclusiveVMList) -and [string]::IsNullOrEmpty($ExclusiveVMList))
    {
        $iExecuteQuery = 0
        if ($isCluster -eq $true)
        {
            foreach ($node in $nodes)
            {
                $vmList.AddRange(@(Get-VM -ComputerName $node | where {$_.IsClustered -eq $true}))
            }
        }
        else
        {
            $vmList = @(Get-VM -ComputerName $machine)
        }
    } 
       

    if ($iExecuteQuery -eq 1)
    {
if (-not [string]::IsNullOrEmpty($InclusiveVMList))
{
$iFlag = 1
$inclusiveQuery = ""
foreach ($vmPattern in $InclusiveVMList)
{
if ($iFlag)
{
$iFlag = 0
}
else
{
$inclusiveQuery = $inclusiveQuery + " -or "      
}
$inclusiveQuery = $inclusiveQuery + '$_.Name' + " -clike `"" + $vmPattern + "`""         
}
}

if (-not [string]::IsNullOrEmpty($ExclusiveVMList))
{
$iFlag = 1
$exclusiveQuery = ""
foreach ($vmPattern in $ExclusiveVMList)
{
if ($iFlag)
{
$iFlag = 0
}
else
{
$exclusiveQuery = $exclusiveQuery + " -and "      
}            
$exclusiveQuery = $exclusiveQuery + '$_.Name' + " -cnotlike `"" + $vmPattern + "`""
}
}

if (-not [string]::IsNullOrEmpty($InclusiveVMList))
{
if ([string]::IsNullOrEmpty($ExclusiveVMList)) 
{             
$query = $inclusiveQuery     
}
else
{
$query = $inclusiveQuery + " -and " + $exclusiveQuery
}         
}    
else
{
            $query = $exclusiveQuery
}

if ($isCluster -eq $true)
{
foreach ($node in $nodes)
{
$command = "Get-VM -ComputerName $node"
$cluster = "{"+'$_.IsClustered'+" -eq 'true'"+"}"
$vmListCommandOnCluster = $command +" | where "+ $cluster
$vmListCommandOnQuery = $vmListCommandOnCluster+" | where "+ "{" +$query +"}"
write-host ("<vmListCommandOnQuery>"+$vmListCommandOnQuery+"</vmListCommandOnQuery>")
$vmListCommandOutput = $null
$vmListCommandOutput = Invoke-Expression $vmListCommandOnQuery          
    if (-not [string]::IsNullOrEmpty($vmListCommandOutput))
{
$count = $vmListCommandOutput.Count
if ($count -ne 0)
{
if ($count -eq "1")
{
$vmList.Add($vmListCommandOutput) 
}
else
{
$vmList.AddRange($vmListCommandOutput)
}
}
}
}
}
else
{
$command = "Get-VM -ComputerName localhost"
$vmListQuery = "$command | where { $query" + "}"
$vmList = Invoke-Expression $vmListQuery
}
    }

foreach ($vm in $vmlist)
    {
        if (($unclusteredOnly -eq $true) -and ($vm.IsClustered -eq $true))
        {
            continue
        }

        [void]$vmlist_out.Add($vm)
    }
}

# -----------------------------------------------------------------------------
# START
# -----------------------------------------------------------------------------

# verify machine parameter has been specified
if ([string]::IsNullOrEmpty($machine))
{
  LogError "Missing argument.  Host or cluster machine name required."
  exit
}

# if isCluster, . won't work since it's always the host's name
#   Not the cluster name even if we're on the cluster manager
if (($isCluster -eq $true) -and (($machine -eq ".") -or ($machine -eq "localhost")))
{
LogError "Invalid arguments.  Cluster queries cannot use `".`" for machine name."
exit
}

# FIXME:  if isCluster, how do we make sure the input is in fact a cluster name?


if ($isCluster -eq $true)
{
# specifying a cluster name, but requesting 
#   VMs *not* in a cluster makes no sense
if ($unclusteredOnly -eq $true)
{
LogError "Invalid arguments.  Cannot specify unclusteredOnly for a Cluster query."
exit
}
}
else
{
if ($unclusteredOnly -eq $true)
{
$cluster = $(Get-Cluster).Name
if ([string]::IsNullOrEmpty($cluster))
{
$unclusteredOnly = $false
$machine = gc env:computername
}
}
if ((($machine -eq ".") -or ($machine -eq "localhost")))
{
$machine = gc env:computername
}
}

$error.clear()

# get list of cluster nodes
$nodes = Get-ClusterNode -Cluster $machine

# initialize array of vms to be outputted
$vmlist_out = New-Object System.Collections.ArrayList
$vmlist_out.clear()

# get the vm(s)
if (-not [string]::IsNullOrEmpty($vmname))
{
    $vmlist_out = GetSingleVm

#  Log an error if a specific VM was requested, but not found
if ($vmlist_out.count -eq 0)
{
LogError ("Couldn't Find Requested VM: " + $vmname)
exit
}
}
else
{
    GetVmList
}

# sort the list of vm's
$vmlist_out = $vmlist_out | Sort-Object Name

write-host ("<start>")
if (($vmlist_out -ne $null) -and ($vmlist_out.Count -ne 0))
{
# first print count of VMs outputted
write-host ("<VmCount>"+$vmlist_out.Count+"</VmCount>")

# now loop through our list 
foreach ($vm in $vmlist_out)
{
PrintVm($vm)
}
}
else
{
write-host ("<VmCount>0</VmCount>")
}
write-host ("<stop>")


Node 1 - VMS01
Node 2 - VMS02
Cluster - VMSCL
Running the script shows only the machines on the current node, and it shows an error for the machines hosted on node two.

PS C:\Program Files (x86)\Quest\NetVault Backup\scripts\HyperV\v2> ./GetVms.ps1 "VMSCL" "APP01" -isCluster
<start>
<VmCount>1</VmCount>
<vm>
<Name>C2648FA1-B83B-4CC5-BC0E-CCCC7B8A4EF7</Name>
<Element Name>APP01</Element Name>
<Enabled State>2</Enabled State>
<Host Name>VMS01</Host Name>
</vm>
<stop>
PS C:\Program Files (x86)\Quest\NetVault Backup\scripts\HyperV\v2> ./GetVms.ps1 "VMSCL" "ARC01" -isCluster
<start>
<VmCount>1</VmCount>
<vm>
<Name>8F77588D-9B15-4E58-9294-0340289517B2</Name>
<Element Name>ARC01</Element Name>
<Enabled State>2</Enabled State>
<Host Name>VMS01</Host Name>
</vm>
<stop>
PS C:\Program Files (x86)\Quest\NetVault Backup\scripts\HyperV\v2> ./GetVms.ps1 "VMSCL" "BI01" -isCluster
Get-VM : A parameter is invalid. Hyper-V could not find a virtual machine with the name "BI01".
At C:\Program Files (x86)\Quest\NetVault Backup\scripts\HyperV\v2\GetVms.ps1:77 char:19
+             $vm = Get-VM -ComputerName $node $vmname
+                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidArgument: (BI01:String) [Get-VM], VirtualizationInvalidArgumentException
    + FullyQualifiedErrorId : InvalidParameter,Microsoft.HyperV.PowerShell.Commands.GetVMCommand

<start>
<VmCount>1</VmCount>
<vm>
<Name>C04FEFDA-5F30-4480-94C0-1C5D55050374</Name>
<Element Name>BI01</Element Name>
<Enabled State>2</Enabled State>
<Host Name>VMS02</Host Name>
</vm>
<stop>
PS C:\Program Files (x86)\Quest\NetVault Backup\scripts\HyperV\v2> ./GetVms.ps1 "VMSCL" "CLOUD" -isCluster
<start>
<VmCount>1</VmCount>
<vm>
<Name>6B8E7B4F-E61C-48D3-A6BA-7B33314B6EF9</Name>
<Element Name>CLOUD</Element Name>
<Enabled State>2</Enabled State>
<Host Name>VMS01</Host Name>
</vm>
<stop>
PS C:\Program Files (x86)\Quest\NetVault Backup\scripts\HyperV\v2> ./GetVms.ps1 "VMSCL" "CTI" -isCluster
<start>
<VmCount>1</VmCount>
<vm>
<Name>4ACE33FB-1E93-4C05-80DC-055171178A60</Name>
<Element Name>CTI</Element Name>
<Enabled State>3</Enabled State>
<Host Name>VMS01</Host Name>
</vm>
<stop>
PS C:\Program Files (x86)\Quest\NetVault Backup\scripts\HyperV\v2> ./GetVms.ps1 "VMSCL" "CTI01" -isCluster
Get-VM : A parameter is invalid. Hyper-V could not find a virtual machine with the name "CTI01".
At C:\Program Files (x86)\Quest\NetVault Backup\scripts\HyperV\v2\GetVms.ps1:77 char:19
+             $vm = Get-VM -ComputerName $node $vmname
+                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidArgument: (CTI01:String) [Get-VM], VirtualizationInvalidArgumentException
    + FullyQualifiedErrorId : InvalidParameter,Microsoft.HyperV.PowerShell.Commands.GetVMCommand

<start>
<VmCount>1</VmCount>
<vm>
<Name>B44260B6-3156-4600-85E1-58168A3CC3D5</Name>
<Element Name>CTI01</Element Name>
<Enabled State>2</Enabled State>
<Host Name>VMS02</Host Name>
</vm>
<stop>
PS C:\Program Files (x86)\Quest\NetVault Backup\scripts\HyperV\v2> ./GetVms.ps1 "VMSCL" "CX" -isCluster
<start>
<VmCount>1</VmCount>
<vm>
<Name>1200962C-E875-45F9-B4D9-E7BBA2BBBCFD</Name>
<Element Name>CX</Element Name>
<Enabled State>2</Enabled State>
<Host Name>VMS01</Host Name>
</vm>
<stop>
PS C:\Program Files (x86)\Quest\NetVault Backup\scripts\HyperV\v2> ./GetVms.ps1 "VMSCL" "DC01" -isCluster
<start>
<VmCount>1</VmCount>
<vm>
<Name>16A20330-7ABA-4694-A068-FCCA3D7E1A6F</Name>
<Element Name>DC01</Element Name>
<Enabled State>2</Enabled State>
<Host Name>VMS01</Host Name>
</vm>
<stop>
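For what it's worth, the non-fatal errors above come from GetSingleVm probing every node: Get-VM throws when the VM doesn't exist on the node being queried, even though the loop then finds it on the next node. A hedged tweak (a sketch against the script above, not a Quest-supplied fix) is to silence the expected misses:

Function GetSingleVm
{
    if ($isCluster -eq $true)
    {
        foreach ($node in $nodes)
        {
            # A miss on this node is expected while probing; don't surface it
            $vm = Get-VM -ComputerName $node -Name $vmname -ErrorAction SilentlyContinue
            if ($vm -ne $null)
            {
                $vm
                break
            }
        }
    }
    else
    {
        Get-VM -ComputerName $machine -Name $vmname
    }
}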


Setup a Cluster with CSV storage as a target for a DFS-R


Trying to set up a Windows Server 2016 cluster to replicate with a file server.

This is the setup:
Primary site:
- 2 DFS
- 1 cluster, 2 nodes, set up with CSV storage

Secondary site (DR):
- 1 DFS
- 1 file server

The goal: I want to replicate from the cluster/file server at the primary site to the secondary site.

When I add the folder to replicate, I get this message:

"The volume file system cannot be determined. The network name cannot be found."

Thank you,

Jasmin

Live Migrate fails with event 21502 (2019-->2016 host)


I have a 2016 functional level cluster with Server 2019 hosts (basically in the process of replacing the 2016 hosts with 2019).

If a VM is running on a 2019 host I can power it off, quick-migrate it to a 2016 host, power it on, and all is good.

But live migration always gives me the above error (event 21502).

All I am getting in Event Data is (very descriptive?!):

Live migration of 'Virtual Machine Test' failed.

Nothing else, no reason.

If a VM is running on a 2016 host I CAN live-migrate it to 2019 fine (albeit with the errors reported in this thread, but I do NOT use VMM)!

vm\service\ethernet\vmethernetswitchutilities.cpp(124)\vmms.exe!00007FF7EA3C2030: (caller: 00007FF7EA40EC65) ReturnHr(138) tid(2980) 80070002 The system cannot find the file specified.
    Msg:[vm\service\ethernet\vmethernetswitchutilities.cpp(78)\vmms.exe!00007FF7EA423BE0: (caller: 00007FF7EA328FEE) Exception(7525) tid(2980) 80070002 The system cannot find the file specified.
] 

Both hosts are IDENTICAL hardware on the same firmware level for every component!

There is NOTHING relating to even attempting migration in the local host's Hyper-V VMMS/Admin/Operational logs.

In Hyper-V High Availability/Admin I get the same error but with Event ID 21111.

Seb


I am wondering if it is easier to ditch 2019 and stick with 2016 for now.

Hyper-V Event 4096



The Data Exchange integration service is either not enabled, not running or not initialized. (Virtual machine ID 46DA141F-B26D-4A00-897-6E1EFD7B0B)
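A first check worth trying, as a sketch assuming the Hyper-V PowerShell module (the VM name is a placeholder; the Data Exchange service appears in PowerShell as "Key-Value Pair Exchange"):

# See whether the Data Exchange (KVP) integration service is enabled and running
Get-VMIntegrationService -VMName "MyVM" |
    Where-Object Name -eq "Key-Value Pair Exchange"

# Enable it if it is disabled
Enable-VMIntegrationService -VMName "MyVM" -Name "Key-Value Pair Exchange"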


Can anyone help me resolve the above issue?

Regards

Fahad Ahmed 

Cannot create checkpoint when shared vhdset (.vhds) is used by VM - 'not part of a checkpoint collection' error


We are trying to deploy a 'guest cluster' scenario over Hyper-V with a shared disk set over SOFS. By design, the .vhds format should fully support the backup feature.

All machines (Hyper-V, guest, SOFS) are installed with Windows Server 2016 Datacenter. Two Hyper-V virtual machines are configured to use a shared disk in .vhds format (located on an SOFS cluster formed of two nodes). The SOFS cluster has a share configured for applications, and Hyper-V uses the \\sofs_server\share_name\disk.vhds path to the SOFS remote storage. The guest machines are configured with the 'File Server' role and the 'Failover Clustering' feature to form a guest cluster. There are two disks configured on each guest cluster node: 1 - a private system disk in .vhdx format (OS), and 2 - the shared .vhds disk on SOFS.

While trying to take a checkpoint of a guest machine, I get the following error:

Cannot take checkpoint for 'guest-cluster-node0' because one or more sharable VHDX are attached and this is not part of a checkpoint collection.

Production checkpoints are enabled for the VM, plus the 'Create standard checkpoint if it's not possible to create a production checkpoint' option is set. All integration services (including backup) are enabled for the VM.

When I remove the shared .vhds disk from the VM's SCSI controller, checkpoints are created normally (for the private OS disk).

It is not clear what a 'checkpoint collection' is or how to add the shared .vhds disk to such a collection. Please advise.

Thanks.

Windows 2012 R2 - Fileserver cluster file indexing


Hi

Is it possible to set up file indexing / Windows Search on a Windows 2012 R2 file server cluster? Is it cluster-aware?

I have read some old posts saying that indexing is not cluster-aware. Not sure if that is still the case.

This doc describes a way, but it is also old:

https://768kb.wordpress.com/tag/windows-failover-cluster/

Has anyone done this successfully, or does anyone know?

Roy
