Channel: High Availability (Clustering) forum

Windows 2012 R2 Hyper-V cluster - One of the nodes VMs are hanging randomly


We have a Windows 2012 R2 Hyper-V cluster (2 nodes, with FC CSV storage), and we have noticed that some VMs stop responding: the VMs, along with their host, become unreachable for a few seconds. This affects the applications inside the VMs. After moving a VM to another host, it works fine.


We have tried the following so far:

- Updated all drivers and firmware to the latest versions, and tried multiple drivers for the FC and network adapters

- Enabled/disabled VMQ

- Cluster validation passes without issues

Our only remaining option now is to reformat the system, and we would like your advice before doing that.
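
For reference, this is roughly how we have been checking and toggling VMQ on the hosts (a sketch; the adapter name is just an example):

    # a sketch; the adapter name is an example, adjust to the actual host NICs
    Get-NetAdapterVmq                          # shows per-adapter VMQ state
    # disable VMQ on one adapter for testing, then re-enable it afterwards
    Disable-NetAdapterVmq -Name "Ethernet 1"
    Enable-NetAdapterVmq -Name "Ethernet 1"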



NLB doesn't work when two nodes online


I am using Windows Network Load Balancing (set up according to this); the main purpose is to build high availability (failover) for my RD Gateways, but I can't get the cluster to work properly. I have two nodes (both are VMs, with RD Gateway installed and only one NIC) in the NLB cluster:

Cluster Name: rdgw (virtual-ip: 192.168.0.18)

Host-1: rdgw-1 (192.168.0.21) (NLB installed, RD Gateway Server Farm joined, static IP, Converged)

Host-2: rdgw-2 (192.168.0.16) (NLB installed, RD Gateway Server Farm joined, static IP, Converged)

Cluster Parameters -> Cluster Operation Mode: Multicast

Port Rule -> Filtering Mode: Single Host

Everything works when one of the two nodes is down!

I already read this thread (NLB does not work when two nodes are present), but I don't think we have the same problem, because I run all my nodes on OpenStack, so the virtual IP can map to the MAC addresses correctly.

I still have no idea how to solve my problem; below is what I have done for troubleshooting:

1. host-1 and host-2 can ping each other

2. From outside, ping 192.168.0.18 => DUP (both nodes respond to the ICMP messages; I followed this, but it doesn't work)

3. From outside, telnet 192.168.0.18 443 => Connection closed by foreign host

4. Checked ipconfig /all on both nodes => both nodes have two IP addresses, one being the original IP and the other the virtual IP:

IPv4 Address ......... 192.168.0.21 (Preferred)
Subnet Mask .......... 255.255.0.0
IPv4 Address ......... 192.168.0.18 (Preferred)
Subnet Mask .......... 255.255.0.0
Default Gateway ...... 192.168.0.1

I am confused; I don't think this is a normal situation, as it would cause IP conflicts, am I right? To my understanding, the virtual IP should only be configured on one node at any given time.

I googled some information about NLB heartbeats: they work at Layer 2, so there is no need to open specific ports for them.

I don't know what I am missing; maybe some misconfiguration, or the heartbeats being blocked by a firewall?
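
For reference, this is how I dump the cluster's current mode and port rules (a sketch using the built-in NetworkLoadBalancingClusters module, run on one of the hosts):

    # a sketch; run on one of the NLB hosts
    Import-Module NetworkLoadBalancingClusters
    Get-NlbCluster               # cluster IP and operation mode
    Get-NlbClusterNode           # each host and its state
    Get-NlbClusterPortRule       # filtering mode and affinity per rule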

I would appreciate any help.





S2D 4-node cluster


Dear all,

I have a 4-node Windows Server 2019 S2D cluster.

All VMs are currently spread across 3 of the nodes. I am facing an issue with one node, and I am thinking of removing it, reformatting its OS, and adding it back.

Can you help with that, please? Especially at the storage pool and local disk level: what is the best way to do this?
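
This is the sequence I was considering, based on my own reading (a sketch; "Node4" stands for the problem node); please correct me if the pool and disk handling is wrong:

    # a sketch; "Node4" is the node to be rebuilt
    Suspend-ClusterNode -Name Node4 -Drain -Wait      # drain the roles off the node
    Remove-ClusterNode -Name Node4                    # evict it from the cluster

    # retire and remove its disks from the S2D pool, then wait for the repair jobs
    $pool  = Get-StoragePool | Where-Object { $_.FriendlyName -like "S2D*" }
    $disks = Get-PhysicalDisk | Where-Object { $_.OperationalStatus -eq "Lost Communication" }
    $disks | ForEach-Object { Set-PhysicalDisk -UniqueId $_.UniqueId -Usage Retired }
    Remove-PhysicalDisk -PhysicalDisks $disks -StoragePoolFriendlyName $pool.FriendlyName
    Get-StorageJob                                    # wait until the repairs complete

    # after reinstalling the OS, wipe the local disks and re-join the node
    Add-ClusterNode -Name Node4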

kind regards,

Performing a hard failure test


Good evening,

I have a two-node Windows Server 2019 cluster with SQL and file server roles on it.

Everything works fine if I shut down a node or perform a manual failover.

The customer has asked us to perform a hard-failure test of the two-node Windows Server 2019 cluster.

I tried to simulate unplugging a node from the network (it is a VMware environment), but the cluster did not fail over and all services became unavailable.

My question is: how can I perform the hard test without forcing a power interruption, and what should the outcome be?

I expect the other node to bring the resources online. Is that correct?
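
One idea I had is to stop the cluster service on the active node instead of pulling the power, and watch whether the roles come online on the other node (a sketch; NODE1 is just an example name):

    # a sketch; NODE1 is an example
    Get-ClusterGroup                      # note which node owns the SQL and file server roles
    Stop-ClusterNode -Name NODE1          # stop the cluster service on that node
    Get-ClusterGroup                      # the roles should come online on the other node
    Start-ClusterNode -Name NODE1         # bring the node back afterwards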

Kind regards


Luca Pozzoli

5120 Alerts - Storage Spaces Direct - STATUS_CONNECTION_DISCONNECTED


Good Day

I have a 5-node cluster made up of SuperMicro SuperStorage 6028R-E1CR24L servers.

I keep getting the alerts below on my S2D cluster:

Cluster Shared Volume 'CSV_ReFS' ('CSV_ReFS_01') has entered a paused state because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished.

Cluster Shared Volume 'CSV_ReFS' ('CSV_ReFS_02') has entered a paused state because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished.

All servers have 2 x 50 Gb connections to a 100 Gb Mellanox switch, and we use SET teaming on the SMB NICs.

When running cluster validation, I see the following issues in the network validation:

Node HPV02 is reachable from node HPV01 by multiple communication paths, but one or more of these paths experienced more than 10% packet loss.

Result  | Source Interface Name     | Source IP Address | Destination Interface Name | Destination IP Address | Same Cluster Network | Packet Loss (%)
Success | HPV01 - vEthernet (SMB-A) | 192.168.254.201   | HPV02 - vEthernet (SMB-A)  | 192.168.254.202        | TRUE                 | 22
Failure | HPV01 - vEthernet (SMB-A) | 192.168.254.201   | HPV02 - vEthernet (SMB-B)  | 192.168.255.202        | FALSE                | 100
Success | HPV01 - vEthernet (SMB-B) | 192.168.255.201   | HPV02 - vEthernet (SMB-B)  | 192.168.255.202        | TRUE                 | 51
Failure | HPV01 - vEthernet (SMB-B) | 192.168.255.201   | HPV02 - vEthernet (SMB-A)  | 192.168.254.202        | FALSE                | 100

Not sure what I am missing. Is this an issue on the Mellanox switch, or an issue with the host configuration?
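
For what it's worth, this is what I have been comparing on the hosts so far (a sketch; the interface names match the validation output above):

    # a sketch; run on each node and compare the output
    Get-NetAdapterRdma                    # RDMA state of the SMB vNICs
    Get-SmbMultichannelConnection         # which interfaces SMB is actually using
    Get-ClusterNetwork                    # how the cluster sees the SMB networks and their roles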

Thanks

5120 event in S2D 4 node cluster


Hi

I am getting constant 5120 event IDs in my 2019 S2D cluster: 4 nodes with a 100 Gb Mellanox network between them, configured with RoCE.

I see some pause frames on the switch interfaces, but they do not seem to follow any pattern that matches the 5120 events.

Cluster Shared Volume 'Mirror-01' ('Cluster Virtual Disk (Mirror-01)') has entered a paused state because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished.
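
For completeness, this is how I have been verifying the RoCE/PFC configuration on the nodes (a sketch; I compare it against the switch configuration):

    # a sketch; run on all four nodes and compare with the switch settings
    Get-NetQosPolicy                      # QoS policies (e.g. SMB Direct tagged to a priority)
    Get-NetQosFlowControl                 # which priorities have PFC enabled
    Get-NetQosTrafficClass                # bandwidth reservation per traffic class
    Get-NetAdapterQos                     # what the NICs report as their operational DCB settings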

has anyone experienced this ?

regards

Sveinbjorn

No disks suitable for cluster disks were found


Hi,

I have a 2-node cluster on Windows Server 2012. This cluster is used for SQL Server 2012 with an Always On Availability Group. With an AG the storage is not shared; each node has its own independent storage.

The cluster was working fine. Due to an issue, the disk was not presented/was offline on node 2, so I removed the disk from the cluster: in Failover Cluster Manager I went to Storage and removed the disk.

Now the disk is online on node 2 and I have restarted both nodes. I ran the cluster validation test and it shows the disk in the results. The problem is that the node 2 disk is not listed under the cluster disk resources. When I try to add the disk resource, I get the error in the title ("no disks suitable for cluster disks were found").
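
This is what I ran to see which disks the cluster considers eligible (a sketch):

    # a sketch; run from either node
    Get-ClusterAvailableDisk              # disks the cluster believes it can add
    Get-Disk | Format-Table Number, FriendlyName, OperationalStatus, PartitionStyle, IsClustered
    # as far as I understand, disks that are local to a single node are usually
    # not offered as cluster disks, which may be related to my situation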

Kindly advise how to fix this issue.

thx


iffi

Failover Cluster in Windows Server 2012 R2: "User Defined Resource Type"


Hello,

What is the use of "User Defined Resource Types"? Where do I obtain or create one?

Can somebody explain how this option works?
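
From what I can tell, resource types are registered with the cluster roughly like this (a sketch; the type name and DLL path are hypothetical placeholders, and the DLL would have to implement the cluster Resource API):

    # a sketch; "MyAppType" and the DLL path are hypothetical
    Get-ClusterResourceType                                   # lists the resource types already registered
    Add-ClusterResourceType -Name "MyAppType" -Dll "C:\ClusterDlls\MyAppType.dll"
    Add-ClusterResource -Name "MyApp" -ResourceType "MyAppType" -Group "MyGroup"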

Thank you!


Quorum


Hi,

If there are two nodes, both up and running, but the heartbeat is lost, how do the nodes decide which one will remain active?
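
As far as I understand, this is where the quorum witness comes in, and it can be checked or configured like this (a sketch; the file share path is just an example):

    # a sketch; the witness share path is an example
    Get-ClusterQuorum                                          # show the current quorum configuration
    Set-ClusterQuorum -NodeAndFileShareMajority "\\witness-server\ClusterWitness"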

Thanks

SQL Cluster Drives moving automatically


Hi,

We have 4 different SQL clusters, and every SQL cluster has 2 nodes. The issue is that the SQL cluster drives are moving to the other node automatically, and we have to move them back to the preferred owner manually.

OS: Windows Server 2008 R2 Standard for 3 of the SQL clusters, and Windows Server 2016 Standard for 1 SQL cluster.

Storage: Unity 300

Site: DR site
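
For reference, this is how we have been checking the preferred owners and failback settings on the groups (a sketch; the group name is just an example):

    # a sketch; the group name is an example
    Get-ClusterGroup | Get-ClusterOwnerNode     # preferred owners per group
    Get-ClusterGroup "SQL Server (MSSQLSERVER)" |
        Format-List Name, AutoFailbackType, FailbackWindowStart, FailbackWindowEnd, FailoverThreshold, FailoverPeriod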

Could you please help with this issue?

Regards

Vajram Gajengi

Move server cluster roles to inactive node


Hello, I want to run a script/one-liner prior to patching a cluster node. I want PowerShell to detect whether it is running on the active node and, if so, move the roles, services, etc. to the inactive/standby node.

As a result I can have my patch management application proceed to install/reboot, etc.

I found a few resources, but they seem to do much more than I really need.

http://lifeofageekadmin.com/using-powershell-move-cluster-resources-preferred-node/

Any advice on how I can get started?

So far I have collected this, but I am not sure how to tweak it further so it dynamically moves the roles to the inactive node:

    $CurrentNode = $env:COMPUTERNAME
    $TargetNode  = Get-ClusterNode -Cluster 'MYCLUSTERNAME' | Where-Object { $_.State -eq 'Up' -and $_.Name -ne $CurrentNode }
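
Building on that, something like this is what I am aiming for (a sketch, assuming the script runs locally on the node that is about to be patched):

    # a sketch; run locally on the node that is about to be patched
    Import-Module FailoverClusters
    $CurrentNode = $env:COMPUTERNAME
    $TargetNode  = Get-ClusterNode | Where-Object { $_.State -eq 'Up' -and $_.Name -ne $CurrentNode } | Select-Object -First 1

    # move every group this node currently owns to the standby node
    Get-ClusterGroup | Where-Object { $_.OwnerNode.Name -eq $CurrentNode } |
        Move-ClusterGroup -Node $TargetNode.Name

    # alternatively, Suspend-ClusterNode -Drain -Wait moves the roles and pauses the node in one step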

Thank you.

R

How to manually reset failure count


How do I manually reset the failure count?

I set the following :

and I still see this in the cluster log:
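
For context, these are the settings I have been looking at (a sketch only; the group name is an example, not my actual configuration):

    # a sketch; the group name is an example
    $group = Get-ClusterGroup -Name "SQL Server (MSSQLSERVER)"
    $group | Format-List Name, FailoverThreshold, FailoverPeriod     # failures allowed within the period (hours)
    Get-ClusterResource | Format-Table Name, RestartDelay, RestartPeriod, RestartThreshold
    # as far as I know, the failure count resets on its own once the FailoverPeriod
    # window passes, or when the group is taken offline and brought back online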

Hyper-V Clustering with NIC Teaming


Dear Sir/Madam,

I have a 4 x 1 Gigabit LACP NIC team on Windows Server 2016. When I use an iSCSI target on another server that also has a 4 x 1 Gigabit LACP NIC team, it works perfectly and I get 4 gigabits of bandwidth.

I have configured failover clustering with shared storage on that iSCSI target, but now I only get 1 gigabit to the cluster storage. Both cluster nodes have NIC teaming, and the iSCSI target also has a 4 gigabit NIC team.
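
For reference, this is how I have been checking the team and the iSCSI sessions (a sketch; from my reading so far, MPIO with multiple sessions, rather than LACP, is what usually spreads iSCSI traffic across links, but I may be wrong):

    # a sketch; run on one of the cluster nodes
    Get-NetLbfoTeam                        # team mode and member NICs
    Get-IscsiSession                       # how many iSCSI sessions exist to the target
    Get-IscsiConnection                    # the TCP connections behind those sessions
    mpclaim -s -d                          # disks claimed by MPIO and their active paths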

Can someone help me, please?


 




Disk requirements for Storage Replica: "Shared Nothing" Failover File Server


Hello,

What disk requirements are we looking at for a "shared nothing" failover file server? I have been looking into this in my virtual lab and trying to get it running with VHDX disks on a virtual SCSI controller on Windows Server 2019 guests. With no luck: it seems that I can only provide disks to the cluster when they are presented over iSCSI. I might be misunderstanding something, but what is the point of a shared-nothing cluster then?

Get-PhysicalDisk shows all the disks, but none of them are available to the cluster. "Get-ClusterAvailableDisk -All" shows nothing.

So, am I missing something here? Or is the idea that data is replicated between nodes in the cluster that do not share common iSCSI storage?
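
For context, my understanding is that in a shared-nothing setup the disks stay local to each node and Storage Replica pairs them up, roughly like this (a sketch of the server-to-server case; computer, volume and replication group names are examples from my lab):

    # a sketch; computer, volume and replication group names are examples
    Test-SRTopology -SourceComputerName FS1 -SourceVolumeName D: -SourceLogVolumeName L: `
                    -DestinationComputerName FS2 -DestinationVolumeName D: -DestinationLogVolumeName L: `
                    -DurationInMinutes 5 -ResultPath C:\Temp

    New-SRPartnership -SourceComputerName FS1 -SourceRGName RG1 -SourceVolumeName D: -SourceLogVolumeName L: `
                      -DestinationComputerName FS2 -DestinationRGName RG2 -DestinationVolumeName D: -DestinationLogVolumeName L: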





MPIO Disk not managed by a server


I am trying to set up a Hyper-V cluster, but I get an error:

Disk test 0 from node hv2 has the following number of usable paths to the target magazine object: 5

Disk test 0 is not managed by node hv1

The same error occurs with disk test 1.

I am creating the cluster from hv1.

The QFE (hotfix) numbers are the same on both servers.
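
For reference, this is what I have been comparing on hv1 and hv2 (a sketch):

    # a sketch; run on both hv1 and hv2 and compare the output
    Get-WindowsFeature Multipath-IO        # is the MPIO feature installed on both nodes
    Get-MSDSMSupportedHW                   # hardware IDs the Microsoft DSM is configured to claim
    mpclaim -s -d                          # disks currently claimed by MPIO and their paths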

VDI storage solution that does not require credential delegation?


Originally, I planned to use a clustered SMB file share to store virtual machines and related data for our Hyper-V hosts to access/use. 

I recently found that my company applies a GPO that prevents both storing credentials in Credential Manager and delegating credentials to other servers (except through Kerberos). They have rejected my request to remove this GPO. 

In my situation, Kerberos is not an option as I need to use a local account. The virtual machines stay in a workgroup throughout the provisioning process and aren't added to the domain until the final step before deployment.

Currently, I am working with three servers: two Hyper-V hosts and the file share. Since SMB is not an option anymore, I will probably end up turning that server into a Hyper-V host as well. In the future, we plan to add more Hyper-V hosts to the cluster as we accumulate more hardware. We obviously want to take advantage of high availability.

As I can't store or delegate any credentials to access an SMB share on a separate server, I need to find other storage solutions, compare their benefits and drawbacks, and work out what will realistically fit our scenario. What are some options I can use without needing to store or delegate credentials?

host unreachable warning in NLB manager


We set up Windows NLB (unicast mode) on two Windows Server 2016 servers. The HA and load balancing work as expected. However, I get "host unreachable" warnings in NLB Manager on both servers. Sometimes the warnings disappear, but most of the time they are there.

I saw that Microsoft suggests using two NICs on each NLB host for unicast mode. But I also saw a reference saying this is not needed, because UnicastInterHostCommSupport is used for this on versions newer than 2003 SP1, and I did see those registry keys in the OS.
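
For reference, this is how I checked for the registry value (a sketch; the exact location under the per-interface GUID is from memory, so treat the path as an assumption):

    # a sketch; the interface GUID differs per host, and the path is from memory
    Get-ChildItem "HKLM:\SYSTEM\CurrentControlSet\Services\WLBS\Parameters\Interface" |
        Get-ItemProperty -Name UnicastInterHostCommSupport -ErrorAction SilentlyContinue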

How to take a S2D server offline for maintenance correctly


Hi Expert, 

I recently deployed a three-node S2D cluster with 3-way mirrors. All three nodes are fully patched monthly, and the patch level is as of Feb 2018 (KB4074590). I would like to ask how to put an S2D node into maintenance correctly, as mine seems to behave differently from the MS docs: https://docs.microsoft.com/en-us/windows-server/storage/storage-spaces/maintain-servers

Here is what I do: 

1. Drain the roles in Failover Cluster Manager

2. Ensure there are no background storage jobs and the virtual disks are healthy, using the Get-VirtualDisk and Get-StorageJob commands

3. Install the patch

4. Reboot the server

5. Resume the roles
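
In PowerShell terms, the drain step I run looks roughly like this (a sketch; "Node1" is an example name, and the storage maintenance mode part follows my reading of the docs linked above):

    # a sketch; "Node1" is an example
    Suspend-ClusterNode -Name Node1 -Drain -Wait
    # optionally also put the node's drives into storage maintenance mode, per the docs
    Get-StorageFaultDomain -Type StorageScaleUnit |
        Where-Object { $_.FriendlyName -eq "Node1" } | Enable-StorageMaintenanceMode
    # ... install the patch and reboot ...
    Get-StorageFaultDomain -Type StorageScaleUnit |
        Where-Object { $_.FriendlyName -eq "Node1" } | Disable-StorageMaintenanceMode
    Resume-ClusterNode -Name Node1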

Here are my questions:

1. After I drain the roles without rebooting, what should the status of the volume be? According to the MS doc the status should be "In Service", but mine actually says "Healthy"; please see below:

[ Image 1 ]

2. As soon as I restart node 1, a storage job starts straight away, as shown below. Shouldn't there be no job until I resume the roles?

[ Image 2 ]

3. As soon as node 1 comes back, without the roles being resumed, the storage repair job runs automatically. Is that normal? Shouldn't it run only after I resume the roles? Please see below:

[ Image 3 ]

My assumption is that even if I drain one of the nodes, the S2D volumes stay online and keep accepting I/O. It feels as if I had killed the server and the whole volume needed to resync. Ideally the resync should be quick, since it only covers the changes made during the downtime, but mine took almost 12 hours to resync 3 x 8 TB volumes.

Can you please give me some idea?

[ It won't let me post images for some reason, they are available here: https://imgur.com/a/FnEl0 ] 

Thanks

Cluster communication network figure


Hello Microsoft,

I hope the cluster configuration will be made easier to set up and clearer in the next stable Windows Server version.

I spent a week figuring out the cluster communication network between two node servers. Here is my case study; please see whether you can answer my questions.

I ran the scenarios below for the two nodes, on real physical hardware and in virtualization, to manage storage on PowerEdge R430 servers (both nodes).

NODE#1

SERVER NAME: CLUSTER-A

OS: WINDOWS SERVER 2019 DATACENTER

NIC1: 192.168.2.1 as Management (Mgmt)

NIC2: 172.16.1.1 as (SMB01)

NIC3: 172.16.2.1 as (SMB02)

NIC4: 192.168.1.1 as Internet

NODE#2

SERVER NAME: CLUSTER-B

OS: WINDOWS SERVER 2019 DATACENTER

NIC1: 192.168.2.200 as Management (Mgmt)

NIC2: 172.16.1.2 as (SMB01)

NIC3: 172.16.2.2 as (SMB02)

NIC4: 192.168.1.200 as Internet

_________

Scenario#1

_________

1- Connect both nodes (CLUSTER-A, CLUSTER-B) through a real physical switch, between CLUSTER-A NIC1 and CLUSTER-B NIC1.

2- Install Hyper-V and the other cluster features/tools on both nodes.

3- Create a virtual switch (Mgmt) and a virtual switch (Internet) on both CLUSTER-A and CLUSTER-B.

4- Create a virtual primary domain controller on CLUSTER-A and a virtual secondary domain controller on CLUSTER-B.

5- Connect both virtual domain controllers via the virtual switches (Mgmt) and (Internet).

6- Restart both and make sure they are working fine.

7- Create a virtual GATEWAY server inside CLUSTER-A and another inside CLUSTER-B, to manage the nodes in the future, connected to the virtual (Mgmt) and (Internet) switches.

8- Now directly connect CLUSTER-A NIC2 to CLUSTER-B NIC2 with a Cat6 cable, and CLUSTER-A NIC3 to CLUSTER-B NIC3 with another Cat6 cable.

9- Delete the virtual (Mgmt) switch on CLUSTER-A, then manage the nodes from the GATEWAY on CLUSTER-B to create the cluster network configuration and the virtual SET switch on CLUSTER-A via PowerShell:

    Invoke-Command -ComputerName "CLUSTER-A" -ScriptBlock {
        New-VMSwitch -Name SETSwitch -EnableEmbeddedTeaming $true -EnableIov $true -NetAdapterName NIC1,NIC2,NIC3
    }

RESULT >>

NIC1 - (Mgmt)

NIC2 - (SMB01)

NIC3 - (SMB02)

10- Make sure all the cluster configuration is complete and check that the new virtual SETSwitch holds NIC1 as the management network.

11- Now connect the GATEWAY and the primary domain controller to the SETSwitch on CLUSTER-A.

12- Do the same steps above on CLUSTER-B.

THE RESULT>>>

Everything works fine with NO ERROR when I run:

    Test-Cluster -Node "CLUSTER-A,CLUSTER-B" -Include "Storage Spaces Direct","Inventory","Network","System Configuration","Hyper-V Configuration"

### BUT ###

When I restart either node, I cannot connect to it from the LAN, even by ping, until I unplug both cables from NIC2 and NIC3; then the system works.

I checked from the inside by accessing either GATEWAY server: I can ping and everything works, before I unplug the two cables.

If I connect both nodes' NIC2 and NIC3 to a physical switch and run the cluster test, I get the error:

the same cluster network, yet address  is not reachable from 172.16.1.1 using UDP on port 3343. 
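
For what it's worth, one of the things I checked on the direct-connect NICs was the firewall profile and the built-in cluster rules (a sketch, run on both nodes):

    # a sketch; run on both nodes
    Get-NetConnectionProfile                                   # which firewall profile NIC2/NIC3 ended up in
    Get-NetFirewallRule -DisplayGroup "Failover Clusters" | Format-Table DisplayName, Enabled, Profile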

MY QUESTION HERE >> Is it important for the domain controller to be outside the nodes? Must I use cables other than Cat6 to connect two nodes directly? Or do you have any other suggestions?

_________

Scenario#2

_________

1- Create a 4-node virtual environment in Hyper-V: CLUSTER-A, CLUSTER-B, DOMAIN and GATEWAY.

2- Create 3 virtual switch adapters: (Mgmt), (SMB01), (SMB02).

3- Connect all of them via the virtual adapter (Mgmt).

4- Add (SMB01) and (SMB02) to CLUSTER-A and CLUSTER-B.

5- Make sure (SMB01) and (SMB02) are configured with MAC address spoofing enabled in the adapter features.

6- From the GATEWAY server, install the cluster feature and the network configuration on both CLUSTER-A and CLUSTER-B, and test the cluster.

RESULT>>>

Fine NO ERROR

BUT

When I disable MAC address spoofing in the adapter features and test the cluster on both nodes, I get:

the same cluster network, yet address  is not reachable from 172.16.1.1 using UDP on port 3343. 
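
For reference, the setting I toggle per VM is this (a sketch; VM names as above):

    # a sketch; VM names as above
    Set-VMNetworkAdapter -VMName CLUSTER-A -MacAddressSpoofing On
    Set-VMNetworkAdapter -VMName CLUSTER-B -MacAddressSpoofing On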

MY QUESTION

Is it possible to enable MAC address spoofing on a real physical adapter?

Or do you have any other ideas about this?



"A Modal Loop is Already in progress"


I have a clean install of a Windows 2008 R2 cluster. My domain is (still) at Windows 2000 level.

I am receiving the message "A modal loop is already in progress" when trying to add a shared directory on Windows Server 2008 R2 Cluster Service (when putting an AD group in the permissions of that share; if I use an AD user, I don't get any error).

Does anyone have any idea?

Thanks in advance.


