Channel: High Availability (Clustering) forum

Live migration and Quick Migration failing


Hopefully someone can help with this mystery

I have a three-node 2008 R2 Hyper-V cluster connected to NetApp iSCSI storage.

I cannot live migrate or quick migrate a VM if the underlying cluster shared volume (CSV) is not owned by the node the VM is being migrated to. Even after moving the CSV, live migration may fail or the running VM may shut down. The only reliable way to achieve any form of migration is to shut down the VMs running on a CSV, move that volume, and then move all of those machines to the node that now owns the CSV.
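
For reference, the workaround above looks roughly like this in PowerShell on one of the 2008 R2 nodes (a sketch only; the CSV resource name and node name are placeholders):

Import-Module FailoverClusters

# See which node currently owns each CSV:
Get-ClusterSharedVolume | Format-Table Name, OwnerNode, State -AutoSize

# Move the CSV to the node the VM is destined for, then attempt the migration:
Move-ClusterSharedVolume -Name "Cluster Disk 9" -Node "HVNODE2"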

The following event IDs are logged:

Event ID 1069 from source FailoverClustering: “Cluster resource 'Virtual Machine Configuration <VMName>' in clustered service or application <VMName> failed.”

Event ID 1205 from source FailoverClustering: “The Cluster service failed to bring clustered service or application <VMName> completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.”

Event ID 4096 from source Hyper-V-Config: “The Virtual Machines configuration 3989B536-28A2-48EB-982E-7B92E0235D54 at 'C:\ClusterStorage\Volume9\<VMName>' is no longer accessible: Logon failure: unknown user name or bad password. (0x8007052E)”

Event ID 16300 from source Hyper-V-VMMS: “Cannot load a virtual machine configuration: Logon failure: unknown user name or bad password. (0x8007052E) (Virtual machine ID 3989B536-28A2-48EB-982E-7B92E0235D54)”

Event ID 20100 from source Hyper-V-VMMS: “The Virtual Machine Management Service failed to register the configuration for the virtual machine '3989B536-28A2-48EB-982E-7B92E0235D54' at 'C:\ClusterStorage\Volume9\<VMName>': Logon failure: unknown user name or bad password. (0x8007052E)”

Event ID 21102 from source Hyper-V-High-Availability: “Virtual Machine Configuration <VMName> failed to register the virtual machine with the virtual machine management service.”

Event ID 21502 from source Hyper-V-High-Availability: “Virtual Machine Configuration <VMName> failed to register the virtual machine with the virtual machine management service.”

Cluster validation passes with a few minor warnings that I have always had, and previously everything worked fine.

If I try to browse cluster storage volumes at c:\clusterstorage\Volumex  from a non-owning node I get:

C:\ClusterStorage\Volumex is not accessible.

Logon failure: unknown user name or bad password.

To the best of my knowledge this behavior started after the latest set of patches released by Microsoft in March 2015. There are no authentication protocols between the SAN and the Hyper-V cluster nodes. I do not want to roll back all the patches and then go through a trial-and-error process if I can help it.

Googling / Binging for this combination of symptoms has led to absolutely nothing useful. I am willing to post a sanitised cluster log if necessary.

Thanks

File server on cluster


Hi

I have a failover cluster on Windows Server 2008 R2 that hosts a file service across two nodes (File1, File2).

Failover itself completes with no issues, but once I fail over to File1 some users cannot access the shared volumes.

Some of those users still cannot access the shared volumes even after logging off and back on.

What could be causing this issue?

I need help. How can I monitor this?
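
A minimal starting point for monitoring, sketched in PowerShell (assuming the 2008 R2 FailoverClusters module; "FileServer" is a placeholder for the clustered file server group name):

Import-Module FailoverClusters

# Which node owns the file server role right now, and are all of its resources online?
Get-ClusterGroup -Name "FileServer" | Format-Table Name, OwnerNode, State -AutoSize
Get-ClusterGroup -Name "FileServer" | Get-ClusterResource | Format-Table Name, ResourceType, State -AutoSize

# Watch recent failover-related events while reproducing the problem:
Get-WinEvent -LogName "Microsoft-Windows-FailoverClustering/Operational" -MaxEvents 50 |
    Format-Table TimeCreated, Id, Message -Wrap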


MCP MCSA MCSE MCT MCTS CCNA

Win2012r2/08r2 Disk Management 0MB Disk


Hello everyone,

We are running a 2008 R2 SQL cluster and a 2012 R2 file cluster, and we see the same issue on both.

Disk Management shows several 0 MB disks.

I don't know where they came from, because they do not appear in DISKPART or Get-Disk.

Is this just a display bug, or does anyone know a reason for it?
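
A quick comparison sketch for the 2012 R2 node (Get-Disk comes from the in-box Storage module; the cluster part assumes the FailoverClusters module):

# Everything PowerShell sees, including offline/reserved disks:
Get-Disk | Sort-Object Number |
    Format-Table Number, FriendlyName, BusType, OperationalStatus, @{ n = 'SizeGB'; e = { [math]::Round($_.Size / 1GB, 1) } } -AutoSize

# The cluster's view of its physical disk resources, for comparison with Disk Management:
Import-Module FailoverClusters
Get-ClusterResource | Where-Object { $_.ResourceType.Name -eq 'Physical Disk' } |
    Format-Table Name, OwnerNode, State -AutoSize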

Take care


Daniel 

Question about cluster node NodeWeight property


Hi,

I have a three-node (A/B/C) Windows 2008 R2 SP1 cluster, testCluster, with KB2494036 installed on all three nodes. Suppose node A is the active node.

I configured node C's NodeWeight property to 0, and nodes A and B keep the default (NodeWeight = 1). I also added a shared disk Q as the cluster quorum.

What I want to know is: if node C and node B are both down, does testCluster go down due to loss of quorum, or does it stay up?

At first I thought testCluster should stay up, because the cluster still has 2 votes (node A and the quorum disk); node B is down and node C does not take part in voting. But after testing, testCluster went down due to loss of quorum.
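
A small sketch of how the vote configuration can be double-checked (NodeWeight is only exposed once KB2494036 is installed, as noted above):

Import-Module FailoverClusters

# Per-node votes: expecting A = 1, B = 1, C = 0.
Get-ClusterNode -Cluster testCluster | Format-Table Name, State, NodeWeight -AutoSize

# Current quorum model (should show Node and Disk Majority with disk Q):
Get-ClusterQuorum -Cluster testCluster | Format-List *

# With C at weight 0 there should be 3 votes in total (A, B, and disk Q), so 2 are needed for quorum.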

Does anybody know the reason? Thanks.

 


Understanding service high availability concept


Hello,

I have a Windows 2008 R2 cluster set up with two nodes and a virtual IP. We have one product that runs as a Windows service. It has been installed on both of these nodes.

I'm trying to ensure that our product is highly available under the following circumstances:

1) Node 1 fails (hardware failure): We have configured our product to listen on the virtual IP of the cluster. In this case node 2 comes up, takes over the virtual IP, and as a result all incoming requests to our product are served by the product server running on node 2.

2) Our product service goes down on node 1: This is the scenario where I'm a bit confused. As per my understanding I have two options, described below:

Option 1: Use the "Services and applications" feature of Failover Cluster Manager and configure our product as a highly available service. In this case I have to ensure that the service listens on a different IP. Will my service be highly available in both of these cases:
        Case 1) Our service on node 1 goes down. Will the cluster bring up the service on node 2 and start routing requests to node 2?
        Case 2) Node 1 itself goes down. Will the cluster bring up the service on node 2?

Option 2: Make my service cluster aware. When our service is running on the passive node, it should continuously ping and check the health of the service on the primary node. If for some reason we don't get a response from the service on the active node, then the passive node should become the active node.
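
If it helps to make Option 1 concrete, this is roughly how a generic service is made highly available from PowerShell on 2008 R2 (a sketch only; the service name, role name, and IP address are placeholders, and the High Availability Wizard does the same thing in the GUI):

Import-Module FailoverClusters

Add-ClusterGenericServiceRole -ServiceName "MyProductSvc" `
                              -Name "MyProductRole" `
                              -StaticAddress 192.168.1.50

# The cluster then monitors the service and its IP as resources of the role:
Get-ClusterGroup -Name "MyProductRole" | Get-ClusterResource

As far as I understand it, a role created this way covers both Case 1 (the cluster restarts or fails over the service when it fails) and Case 2 (node failure), so Option 2's custom heartbeat may not be necessary.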

Could you please suggest which approach would be best for us?

Regards,
Avadhut


Cluster Manager test fails on Storage Section

I am running the cluster validation tests and am not sure whether I can proceed to create the cluster with the error I am getting. I understand that there is no failover for the local machine's C: drive (boot); all other tests pass. Is there a way to remove the local drive from the test, or from MPIO?

Storage test results (Name: Result):

    List All Disks: Success
    List Potential Cluster Disks: Success
    Validate Disk Access Latency: Success
    Validate Disk Arbitration: Success
    Validate Disk Failover: Failed
    Validate File System: Canceled
    Validate Microsoft MPIO-based disks: Success
    Validate Multiple Arbitration: Success
    Validate SCSI device Vital Product Data (VPD): Success
    Validate SCSI-3 Persistent Reservation: Success
    Validate Simultaneous Failover: Canceled

Here is the detailed section:

List All Disks

    List all disks visible to one or more nodes (including non-cluster disks).
    Prepare storage for testing
    Preparing storage for testing on node tluxhs1.tluxvm.orl.com
    Preparing storage for testing on node tluxhs2.tluxvm.orl.com

    tluxhs1.tluxvm.orl.com

    Getting information on PhysicalDrive 0 from node tluxhs1.tluxvm.orl.com
    Getting information on PhysicalDrive 1 from node tluxhs1.tluxvm.orl.com
    Getting information on PhysicalDrive 2 from node tluxhs1.tluxvm.orl.com
    Getting information on PhysicalDrive 3 from node tluxhs1.tluxvm.orl.com
    Getting information on PhysicalDrive 4 from node tluxhs1.tluxvm.orl.com
    Getting information on PhysicalDrive 5 from node tluxhs1.tluxvm.orl.com
    Getting information on PhysicalDrive 6 from node tluxhs1.tluxvm.orl.com
    Disk details (Disk Number / Disk Identifier / Bus Type / Stack Type / Address PORT:PATH:TID:LUN / Adapter Description / Eligible for Validation), followed by disk characteristics:

    PhysicalDrive0 / fceb8fb7 / SAS / Stor Port / 2:0:0:0 / Dell SAS 6/iR Integrated Controller / Eligible: False
        Disk is a boot volume. Disk is a system volume. Disk is used for paging files. Disk is used for memory dump files. Disk is on the system bus. Disk partition style is MBR. Disk partition type is BASIC.
    PhysicalDrive1 / 871348a8 / iSCSI / Stor Port / 3:0:0:0 / Microsoft Multi-Path Bus Driver / Eligible: True
        Disk partition style is MBR. Disk partition type is BASIC. Disk is Microsoft MPIO based disk.
    PhysicalDrive2 / 871348bc / iSCSI / Stor Port / 3:0:3:1 / Microsoft Multi-Path Bus Driver / Eligible: True
        Disk partition style is MBR. Disk partition type is BASIC. Disk is Microsoft MPIO based disk.
    PhysicalDrive3 / 871348a4 / iSCSI / Stor Port / 3:0:3:2 / Microsoft Multi-Path Bus Driver / Eligible: True
        Disk partition style is MBR. Disk partition type is BASIC. Disk is Microsoft MPIO based disk.
    PhysicalDrive4 / a9429ada / iSCSI / Stor Port / 3:0:3:3 / Microsoft Multi-Path Bus Driver / Eligible: True
        Disk partition style is MBR. Disk partition type is BASIC. Disk is Microsoft MPIO based disk.
    PhysicalDrive5 / 87134890 / iSCSI / Stor Port / 3:0:0:4 / Microsoft Multi-Path Bus Driver / Eligible: True
        Disk partition style is MBR. Disk partition type is BASIC. Disk is Microsoft MPIO based disk.
    PhysicalDrive6 / <Unknown> / iSCSI / Stor Port / 3:0:2:31 / Microsoft Multi-Path Bus Driver / Eligible: False
        Disk partition style is RAW. Disk is Microsoft MPIO based disk.
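
Note that the local boot disk (PhysicalDrive0) is already marked not eligible for validation, so the failing test is running against the shared iSCSI disks rather than the C: drive. If it helps, validation can be re-run against just the storage tests, or with specific tests skipped; a rough sketch (I'm not certain the category and test names match your build exactly):

Import-Module FailoverClusters

# Storage-only validation run against both nodes:
Test-Cluster -Node "tluxhs1.tluxvm.orl.com", "tluxhs2.tluxvm.orl.com" -Include "Storage"

# Or skip the failing tests explicitly (not a fix, just narrows the retest):
Test-Cluster -Node "tluxhs1.tluxvm.orl.com", "tluxhs2.tluxvm.orl.com" `
             -Ignore "Validate Disk Failover", "Validate Simultaneous Failover"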

 

Any help would be greatly appreciated.

Robert Ramos

Hyper-V 2012 does not scale and is not stable enough for production use WHO has 200+ VM's with stability? Event ID 1146, 1230, 5120


For years now we have had event ID 1146 crash nodes in the cluster (RHS process crashes). We have had several paid Microsoft cases open, even one with Premier. In fact we have one open currently with zero progress in 72 hours (115012612321318).

Is anyone really running 200+ machines out there with Hyper-V with any level of stability in production, or do you have a complete host (event id 1146) or volume (event id 5120) outage every month or so?  

We have applied recommended hotfixes, and gone through the configuration many many times.

My only conclusion is that Hyper-V does not scale. Once we started adding a lot of machines and hosts, we started getting event 5120 (with STATUS_IO_TIMEOUT), which is unacceptable: it causes a huge slowdown or makes an entire volume inaccessible and impacts EVERY machine on that volume. The other volumes keep working when this happens. In fact, we have a VMware cluster attached to the same SAN with the same host hardware, and it works flawlessly. Both use MPIO, so the timeout is caused by Hyper-V. The load was nearly identical on VMware and Hyper-V at one time; we had 100 machines on both and the same number of hosts. CPU load is tiny, memory is less than 50%, and I/O uses 55 disk spindles for normal storage and another 55 for fast storage.
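
When a 5120 hits, this is the kind of snapshot worth capturing, sketched against the 2012 R2 FailoverClusters module (event IDs taken from the title; purely diagnostic):

Import-Module FailoverClusters

# Is any CSV in redirected access, and through which node?
Get-ClusterSharedVolumeState | Format-Table Name, Node, StateInfo, FileSystemRedirectedIOReason -AutoSize

# Recent CSV/cluster errors on this node:
Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 5120, 1146, 1230 } -MaxEvents 50 |
    Format-Table TimeCreated, Id, Message -Wrap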

I'm more or less asking the community how to fix this, since the paid support is not working, but I'm guessing there is no fix and this is really not production ready. I would really like to hear from ANYONE (non-sales) who is running 200+ machines without big outages.


Witness Client failed to Register


I have a recently built pair of W2012 R2 servers with all of the applicable updates applied via WSUS.  They are clustered servers but are not using shared storage.  Instead they are using Vision Solutions' "High Availability for Windows" to replicate several drives.  Also installed is SQL 2012 (clustered) and all available updates.  I have used this solution several times previously on other clusters.  The servers are performing as expected but are not yet in production.  A few days ago I began to see the following notifications approximately every 20 seconds on the "owning" node (PCSCALEA) of the cluster:

Log Name:      WitnessClientAdmin
Source:        Microsoft-Windows-SMBWitnessClient
Date:          3/26/2015 9:54:23 AM
Event ID:      8
Task Category: None
Level:         Error
Keywords:     
User:          NETWORK SERVICE
Computer:      PCScaleA.rms.org
Description:
Witness Client failed to register with Witness Server PCSCALEB for notification on NetName\\Pcscale with error (The parameter is incorrect.)
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-SMBWitnessClient" Guid="{32254F6C-AA33-46F0-A5E3-1CBCC74BF683}" />
    <EventID>8</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime="2015-03-26T13:54:23.951194200Z" />
    <EventRecordID>114860</EventRecordID>
    <Correlation />
    <Execution ProcessID="1624" ThreadID="9612" />
    <Channel>WitnessClientAdmin</Channel>
    <Computer>PCScaleA.rms.org</Computer>
    <Security UserID="S-1-5-20" />
  </System>
  <EventData>
    <Data Name="WitnessServerIP">PCSCALEB</Data>
    <Data Name="NetName">Pcscale</Data>
    <Data Name="Error">87</Data>
  </EventData>
</Event>

The following errors appear approximately every 20 seconds on the "non-owning node" (PCSCALEB):

Log Name:      WitnessServiceAdmin
Source:        Microsoft-Windows-SMBWitnessService
Date:          3/26/2015 9:51:43 AM
Event ID:      5
Task Category: None
Level:         Error
Keywords:     
User:          SYSTEM
Computer:      PCScaleB.rms.org
Description:
Witness Service registration request from Witness Client (PCSCALEA.RMS.ORG) for NetName\\PCSCALE failed with error (The parameter is incorrect.)
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-SMBWitnessService" Guid="{CE704B50-B105-4BC8-A24F-1792C0401C2A}" />
    <EventID>5</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime="2015-03-26T13:51:43.906614200Z" />
    <EventRecordID>153093</EventRecordID>
    <Correlation />
    <Execution ProcessID="5252" ThreadID="5368" />
    <Channel>WitnessServiceAdmin</Channel>
    <Computer>PCScaleB.rms.org</Computer>
    <Security UserID="S-1-5-18" />
  </System>
  <EventData>
    <Data Name="ClientName">PCSCALEA.RMS.ORG</Data>
    <Data Name="NetName">PCSCALE</Data>
    <Data Name="ErrorCode">87</Data>
  </EventData>
</Event>
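
A rough diagnostic sketch of what could be checked on both nodes with the in-box SMB and cluster cmdlets on 2012 R2 (nothing here changes configuration):

# What the witness client currently tracks on the owning node:
Get-SmbWitnessClient | Format-List *

# Shares exposed by the cluster and whether they are continuously available:
Get-SmbShare | Format-Table Name, ScopeName, ContinuouslyAvailable -AutoSize

# Which cluster networks are client-facing (Role 3 = cluster and client):
Import-Module FailoverClusters
Get-ClusterNetwork | Format-Table Name, Role, Address -AutoSize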

Any ideas?  Any suggestions?

Thanks,

Ryan

 

Network Metrics on HYPER-V 2012 R2 cluster nodes


Hello

I have a Hyper-V cluster configured with CSV, live migration, and management networks.

When I ping a node's own name from inside that node, the reply comes from the CSV network IP address.

Do you know the reason? Is that correct?

I know it is related to network metrics, because when I manually set a low value (3) on the management interface it returns the right IP. Should I be changing the metric that way?

This is what happens (172.22.64.98 is the CSV network address, not the host's management IP):

ping mycomputer

Pinging mycomputer.adgbs.com [172.22.64.98] with 32 bytes of data:
Reply from 172.22.64.98: bytes=32 time<1ms TTL=128
Reply from 172.22.64.98: bytes=32 time<1ms TTL=128
Reply from 172.22.64.98: bytes=32 time<1ms TTL=128
Reply from 172.22.64.98: bytes=32 time<1ms TTL=128

Ping statistics for 172.22.64.98:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms

And this is what IPCONFIG /ALL  Reports:

Ethernet adapter Mgmt:

   Connection-specific DNS Suffix  . :
   IPv4 Address. . . . . . . . . . . : 172.22.172.45
   Subnet Mask . . . . . . . . . . . : 255.255.255.128
   Default Gateway . . . . . . . . . : 172.22.172.126

Ethernet adapter CSV:

   Connection-specific DNS Suffix  . :
   IPv4 Address. . . . . . . . . . . : 172.22.64.98
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . :

Ethernet adapter LiveM:

   Connection-specific DNS Suffix  . :
   IPv4 Address. . . . . . . . . . . : 172.22.95.29
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . :
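
A sketch of what could be checked on each node, assuming the interface aliases shown in the ipconfig output above (Mgmt, CSV, LiveM); treat the metric value as an example only:

# Cluster-internal network metrics (these govern cluster/CSV traffic, not Windows name resolution):
Import-Module FailoverClusters
Get-ClusterNetwork | Format-Table Name, Metric, AutoMetric, Role -AutoSize

# Windows interface metrics and DNS registration, which can influence which address the node name resolves to:
Get-NetIPInterface -AddressFamily IPv4 | Format-Table InterfaceAlias, InterfaceMetric -AutoSize
Set-NetIPInterface -InterfaceAlias "Mgmt" -InterfaceMetric 10

# Keep the CSV and live migration adapters out of DNS so the node name resolves to the Mgmt address only:
Set-DnsClient -InterfaceAlias "CSV", "LiveM" -RegisterThisConnectionsAddress $false
Register-DnsClient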

Thanks in advance

Trying to Understand


We have a shiny new Server 2012 R2 install on two nodes. We have a JBOD device that we have set up on one of these nodes. We created a storage pool, a virtual disk, and a volume. The drive appears on that node, fine! We have the Failover Clustering feature installed on both nodes and have configured the two nodes into a cluster. We now want to use this storage pool drive for various things:

  1. general storage
  2. move our VMs over
  3. store SQL instances

The problem we are running into is this: we can't see how to add the existing storage pool to the cluster. Cluster validation says no disks are available. Maybe we are going about this the wrong way; can anyone explain how to do this?
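
A hedged eligibility check, assuming the in-box Storage and FailoverClusters modules on 2012 R2. As far as I know, a pool can only be clustered when every physical disk in it is SAS-attached and visible to both nodes, which a JBOD cabled to a single node will not satisfy; comparing the output of these commands on each node should show whether that is the case here:

# Physical disks and their bus type, as seen from this node:
Get-PhysicalDisk | Format-Table FriendlyName, BusType, CanPool, Size -AutoSize

# Pools and disks the cluster itself considers available:
Import-Module FailoverClusters
Get-ClusterAvailableDisk | Format-Table Name, Size -AutoSize
Get-StoragePool | Where-Object { -not $_.IsPrimordial } |
    Format-Table FriendlyName, IsClustered, HealthStatus -AutoSize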

Windows 2012 R2 Cluster on DMZ network


Dear all, we are planning to set up a Windows 2012 R2 cluster on a DMZ network in order to provide high availability for Hyper-V virtual machines.

My concerns are:

Can we set up a Windows 2012 R2 cluster in the DMZ without our Active Directory? If yes, please describe how.

Note: We have a Windows 2008 R2 Active Directory in the internal network.
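
For what it's worth, 2012 R2 can create an Active Directory-detached cluster, which avoids creating computer objects in AD but, as far as I know, still requires the nodes themselves to be domain joined, so it may not remove the AD dependency for a DMZ with no domain reachability. A minimal sketch (names and the address are placeholders):

Import-Module FailoverClusters

New-Cluster -Name DMZCLUSTER `
            -Node DMZHOST1, DMZHOST2 `
            -StaticAddress 192.0.2.10 `
            -NoStorage `
            -AdministrativeAccessPoint Dns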

Regards,

Hussain

NetBIOS disabled but clusterlog full of "Netbios: Slow Operation"


Hello,

I have a question about the cluster log:

Even though I have disabled NetBIOS on every NIC, I can see thousands of NetBIOS messages in the cluster.log:

0000225c.000014d4::2015/03/10-15:20:51.587 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:e16261d2-9f0c-4ea6-b914-997ff0a0ecba:Netbios
0000225c.000030c8::2015/03/10-15:20:51.587 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:8db455ab-38ba-461f-a492-a8fb3f01a8e1:Netbios
0000225c.00002574::2015/03/10-15:20:51.587 INFO  [RES] Network Name <CTEST1>: Netbios: Slow Operation, FinishWithReply: 0
0000225c.00000788::2015/03/10-15:20:51.587 INFO  [RES] Network Name <Cluster Name>: Netbios: Slow Operation, FinishWithReply: 0
0000225c.00002574::2015/03/10-15:20:51.587 INFO  [RES] Network Name:  [NN] got sync reply: 0
0000225c.00000788::2015/03/10-15:20:51.587 INFO  [RES] Network Name:  [NN] got sync reply: 0
0000225c.00002574::2015/03/10-15:20:51.587 INFO  [RES] Network Name <CTEST1>: Netbios: End of Slow Operation, state: Initialized/Idle, prevWorkState: Idle
0000225c.00000788::2015/03/10-15:20:51.587 INFO  [RES] Network Name <Cluster Name>: Netbios: End of Slow Operation, state: Initialized/Idle, prevWorkState: Idle

We disabled NetBIOS and LMHOSTS Lookup through the Properties GUI from the network adapters.

We're also using Windows NIC Teaming (lbfoadmin) on these systems.

Is this normal behavior, or do I have to configure some cluster-specific parameters?
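
A sketch of the two layers that could be checked, since (as far as I know) the cluster IP Address resources carry their own NetBIOS setting independent of the adapter setting ("Cluster IP Address" below is the typical default resource name):

# Adapter-level NetBIOS setting (TcpipNetbiosOptions: 0 = default, 1 = enabled, 2 = disabled):
Get-CimInstance Win32_NetworkAdapterConfiguration -Filter "IPEnabled = true" |
    Format-Table Description, TcpipNetbiosOptions -AutoSize

# Cluster IP Address resources have an EnableNetBIOS private property of their own:
Import-Module FailoverClusters
Get-ClusterResource | Where-Object { $_.ResourceType.Name -eq "IP Address" } |
    Get-ClusterParameter EnableNetBIOS

# To disable it per resource (takes effect after the resource is cycled offline/online):
# Get-ClusterResource "Cluster IP Address" | Set-ClusterParameter EnableNetBIOS 0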

Thanks in advance and best regards,

Ville

Server 2012R2 Cluster Storage Error


In January 2014 I built four servers into a cluster for Hyper-V based VDI, using a SAN for central storage. I had no issues with the running of this setup until recently, when the fourth server stopped one of the VM host services and its VMs became inaccessible. When I was unable to find a solution I rebuilt the server and, after finding 50+ updates per server, ran Windows Update on them all. Ever since these two simple actions I have been unable to add the server back into the cluster correctly. The Validation Wizard shows:

Failure issuing call to Persistent Reservation REGISTER AND IGNORE EXISTING on Test Disk 0 from node when the disk has no existing registration. It is expected to succeed. The requested resource is in use.

Test Disk 0 does not provide Persistent Reservations support for the mechanisms used by failover clusters. Some storage devices require specific firmware versions or settings to function properly with failover clusters. Please contact your storage administrator or storage vendor to check the configuration of the storage to allow it to function properly with failover clusters.

The other 3 nodes are using the storage happily without issues. If I force the node into the cluster, it shows it as mounted and accessible but the moment I try to start a VM on that server, it loses the mount point and reports error 2051: [DCM] failed to set mount point source path target path error 85.

The only difference between them is the server model: the three that work are HP ProLiant DL360 G7s and the one that doesn't is an HP ProLiant DL360p G8. They have worked together previously without issues, though.
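
A cautious sketch of what could be tried from the rebuilt node (node names are placeholders, and I'm not certain "Storage" is the exact category string on every build):

Import-Module FailoverClusters

# Re-run only the storage tests against the rebuilt node plus one working node:
Test-Cluster -Node "REBUILTNODE", "WORKINGNODE" -Include "Storage"

# If a stale SCSI-3 reservation from the old install is suspected on a NON-clustered test disk,
# it can be cleared. Be careful: never run this against a disk the cluster is actively using.
# Clear-ClusterDiskReservation -Node "REBUILTNODE" -Disk 1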

I am at a complete loss as to what to do. Any help would be gratefully appreciated.

Thanks

CSV Performance


Hi, I have a quick question.

Is it fair to say that VMs living on the node that owns the CSV will perform better, since that eliminates the SMB penalty from the equation? I understand that SMB 3 in 2012 R2 has been greatly improved, but the node owning the CSV can read/write directly from/to the source (iSCSI/Fibre Channel).

In an environment where the CSV sits on iSCSI storage: when nodes that do not own the CSV access it, do they use the iSCSI network directly, or the data network (SMB to the host that owns the CSV volume)?
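
A quick way to see which kind of I/O each node is actually doing, sketched for the 2012 R2 FailoverClusters module:

Import-Module FailoverClusters
Get-ClusterSharedVolumeState | Format-Table Name, Node, StateInfo, BlockRedirectedIOReason -AutoSize

# StateInfo "Direct" means the node writes to the iSCSI/FC LUN itself;
# "FileSystemRedirected" or "BlockRedirected" means I/O flows over SMB to the owner node.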

Move 2012 R2 Hyper-V Cluster to New Subnet


I have a two-node Hyper-V cluster running on Windows Server 2012 R2. I am moving it to a remote hosting facility, so it will be on a new subnet with different IP addresses than it has now. What do I need to do to make this a smooth transition? Is there an article about it? I haven't found one yet.
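
Not a full checklist, but as a rough sketch, the cluster's own access point has to be repointed along these lines once the nodes are on the new subnet (resource names below are the usual defaults and the addresses are placeholders; VM networks, live migration settings, and DNS need their own attention):

Import-Module FailoverClusters

# See the current IP parameters of the cluster access point:
Get-ClusterResource "Cluster IP Address" | Get-ClusterParameter

# Update the address, then bring the name back up so it re-registers in DNS:
Stop-ClusterResource "Cluster Name"
Stop-ClusterResource "Cluster IP Address"
Get-ClusterResource "Cluster IP Address" |
    Set-ClusterParameter -Multiple @{ Address = '203.0.113.10'; SubnetMask = '255.255.255.0' }
Start-ClusterResource "Cluster Name"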

Thanks,

Rob


Server 2012 R2 - Cluster Failover - Virtual Lab


I'm attempting to set up a failover cluster in my lab for testing. After several attempts to get this to work, I'm now seeking advice. The moment I create the cluster I get the following behavior:

FIRST:

Event ID 1196

Cluster network name resource 'Cluster Name' failed registration of one or more associated DNS name(s) for the following reason:
DNS request not supported by name server.

Ensure that the network adapters associated with dependent IP address resources are configured with at least one accessible DNS server.

SECOND:

The Disks section of Server manager on both failover nodes reports the following warning:

Incomplete communication with cluster CLUSTER1. The following cluster nodes or clustered roles might be offline or have connectivity issues: CLUSTER1

THIRD:

After the cluster is created, the second node immediately takes ownership of the storage. No errors, but it seems odd that the first node isn't the owner by default.

LAB ENVIRONMENT:

Windows 8.1 Desktop running Hyper-V.

  • DC1 - Domain Controller - DHCP/DNS
  • ISCSI - iSCSI target providing the storage
  • SERVER1 - Server 1 in my cluster, accepting storage via iSCSI Initiator
  • SERVER2 - Server 2 in my cluster, accepting storage via iSCSI Initiator

I have configured SERVER1 and SERVER2 with 2 NICS - 1 for regular network traffic and the other for heartbeat. My IP address scheme looks like this:

  • Lab Network: 10.0.1.0/24
  • Heartbeat: 10.10.10.0/24

I can confirm that after the cluster is created, the computer object is created side by side with the server objects; in my lab it lives under \Servers\Cluster. However, AD DNS shows no sign of a new record. I've tried manually creating a record and repairing the cluster, but the 1196 error still occurs.
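
A small sketch of what can be checked and retried for the 1196 error, assuming the in-box DnsClient and FailoverClusters modules (run on a cluster node):

# Which DNS servers do the client-facing NICs point at? They must accept dynamic updates for the zone.
Get-DnsClientServerAddress -AddressFamily IPv4 | Format-Table InterfaceAlias, ServerAddresses -AutoSize

# Force the cluster name to retry its DNS registration:
Import-Module FailoverClusters
Update-ClusterNetworkNameResource -Name "Cluster Name"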

The name 'rolename' is in use by the cluster already as network name or application name


I removed the clustered iSCSI Target Server role a few days back since it was not needed. Now it is needed again, because this is a clustered storage space and we need the role to present iSCSI storage to an external server. When I try to add the role again I get this error:

The name winiscsi is in use by the cluster already as network name or application name

I have double-checked that the role is not installed. I even rebooted both nodes.
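
A hedged sketch of where the old name might still be lingering (the name "winiscsi" is taken from the error above; the Active Directory check assumes the RSAT ActiveDirectory module is available):

Import-Module FailoverClusters

# Any leftover cluster resources or groups still using the name?
Get-ClusterResource | Where-Object { $_.Name -like '*winiscsi*' }
Get-ClusterGroup    | Where-Object { $_.Name -like '*winiscsi*' }

# Is there still a computer object for it in Active Directory (the virtual computer object created for the role)?
Import-Module ActiveDirectory
Get-ADComputer -Filter 'Name -eq "winiscsi"'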



Thin Provisioning Storage Space Virtual Disk on 2012R2 SOFS as CSV


Hi

We have a two-node SOFS cluster connected to a DataOn JBOD through SAS. A tiered storage pool of 4 x SSD + 2 x 6 TB HDD was created as a two-way mirror and configured as a CSV. We have an additional 4 x 6 TB HDDs that we are trying to thin provision and make available to the cluster as a second CSV volume. It seems that thin provisioning is not possible, as there is no such option when we try to create the virtual disk. Is thin provisioning a virtual disk to be used as a CSV possible on a SOFS?
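
A sketch of creating the second virtual disk explicitly from PowerShell (pool and disk names are placeholders). Note that, as far as I know, clustered storage pools in 2012 R2 only accept fixed provisioning, which would explain why the wizard hides the thin option:

New-VirtualDisk -StoragePoolFriendlyName "Pool2" `
                -FriendlyName "Archive-vd" `
                -ResiliencySettingName Mirror `
                -ProvisioningType Thin `
                -Size 10TB

# If the pool is clustered, the command above is expected to fail or be unsupported;
# -ProvisioningType Fixed is the supported choice for a disk that will become a CSV.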

Regards

W2012 Failover Clustering Active-Active - Resource Control


Hi,

I have heard others on this forum mention that you can do Active/Active clusters with Failover Clustering in Windows 2012 (and probably earlier versions). I understand that you need to create another cluster (another logical host) and you'd need separate shared storage/NICs and the rest; all good there.

I also learned that, on a two-node cluster, you'd need to ensure that no one service/app in your logical host could consume more than 50% of the system it runs on. (NOTE: I am assuming that I am running the same app on each cluster node, each pointing to its own piece of shared storage, hence 50%.)

I'd like to better understand my options for controlling resources, so that I can ensure my <Engr> app doesn't run wild and consume all the CPU/RAM/network resources on Node-1, leaving the other 50% free in case the second active node, Node-2, fails over its <Same> application onto Node-1.

Are these resource controls wired (or surfaced) into the Microsoft Failover Clustering framework somewhere? Can anyone elaborate on that? Would I, for example, need to use something like System Center to guarantee this, or does WSRM still exist and is it still used? Is there a system resource scheduler that can be leveraged, perhaps processor sets?

I am just looking for some guidance here, because from what I see, as long as the <Engr> app is *very* well behaved and isn't "wild", it might be tempting to try this on two nodes that are 2x configured.

Thanks for any guidance and/or words of wisdom

BigDaddy68
