Hi Experts,
I need your assistance with an urgent issue. I recently moved a 2-node Microsoft failover cluster, which hosts a SQL database and some file share / generic services, from ESXi 5.1 to an ESXi 5.5 cluster. After the migration the nodes came back online, the cluster services started, and everything looked OK. Now the SQL administrators are complaining that disk failover does not happen: whenever they attempt a failover, the cluster hangs and they have to restart both nodes to bring it back online. I have verified with VMware that there are no configuration changes and that everything looks OK on the hypervisor side. In the Event Viewer, the cluster error below has been logged since the date of the migration. (SQLDB is a disk resource, a direct LUN from a Hitachi VSP.)
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 7/3/2014 12:03:10 AM
Event ID: 1230
Task Category: Resource Control Manager
Level: Error
Keywords:
User: SYSTEM
Computer:
Description:
Cluster resource 'SQLDB' (resource type '', DLL 'clusres.dll') either crashed or deadlocked. The Resource Hosting Subsystem (RHS) process will now attempt to terminate, and the resource will be marked to run in a separate monitor.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-FailoverClustering" Guid="{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}" />
<EventID>1230</EventID>
<Version>0</Version>
<Level>2</Level>
<Task>3</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000000</Keywords>
<TimeCreated SystemTime="2014-07-03T04:03:10.006439700Z" />
<EventRecordID>88011</EventRecordID>
<Correlation />
<Execution ProcessID="4860" ThreadID="18908" />
<Channel>System</Channel>
<Computer></Computer>
<Security UserID="S-1-5-18" />
</System>
<EventData>
<Data Name="ResourceName">SQLDB</Data>
<Data Name="ResourceType">
</Data>
<Data Name="ResTypeDll">clusres.dll</Data>
</EventData>
</Event>
A similar migration was carried out in a different datacenter, and we are seeing the same error in the Event Viewer there, with cluster failover not working correctly either. Could you please help with this?