Hi,
Here is the system background information:
VMware ESXi 5.5 hosts
Shared iSCSI LUNs on an HP StoreVirtual P4000 SAN, presented to the ESXi hosts and then passed through to the guest VMs as Raw Device Mappings (RDMs)
Two-node SQL Server 2014 failover cluster on Windows Server 2012 (non-R2)
We began the system build by presenting two brand-new LUNs to the Windows Server 2012 guests; we'll call them VOL_Q (quorum/disk witness) and VOL_S (SQL database files). We successfully installed SQL Server on both nodes (I'll refer to them as NODE01 and NODE02 from here on) and tested failover of the core cluster resources as well as the new SQL Server resource. The testing was successful: all storage volumes and SQL Server failed over from NODE01 to NODE02.
Our next step was to migrate the old volumes from our existing SQL cluster to the new cluster: bring the volumes in, add them to the new SQL Server resource, and test failover from NODE01 to NODE02. This is where the problem began. After we successfully added the first pre-existing volume (VOL_P) to the new 2012 cluster and attempted a failover, ONLY the VOL_Q (quorum/disk witness) volume came online. The other two volumes, VOL_P AND VOL_S (which had previously failed over without trouble), stayed in 'Online Pending' status.
I did not have enough time to let the resources reach a FAILED state before I had to bring the system back online, so after about 2 minutes I forced a failback from NODE02 to NODE01, at which point all of the resources came online successfully, as expected.
Here are the entries from the cluster log of NODE02:
00000f68.000015f8::2014/12/16-22:15:29.819 INFO [RES] Physical Disk <(S:) MSSQL>: ResHardDiskOnlineV2: Online request.
00000f68.000012d4::2014/12/16-22:15:30.048 INFO [RES] Physical Disk <(S:) MSSQL>: ResHardDiskArbitrateInternal request Not a Space: Uses FastPath
00000f68.000012d4::2014/12/16-22:15:30.048 INFO [RES] Physical Disk <(S:) MSSQL>: ResHardDiskArbitrateInternal: Clusdisk driver handle or event handle is NULL.
00000f68.000012d4::2014/12/16-22:15:30.059 INFO [RES] Physical Disk <(S:) MSSQL>: HardDiskpQueryDiskFromStm: ClusterStmFindDisk returned device='\\?\scsi#disk&ven_lefthand&prod_iscsidisk#5&de2103a&0&000100#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
00000f68.000012d4::2014/12/16-22:15:30.091 INFO [RES] Physical Disk <(S:) MSSQL>: Arbitrate - Node using PR key 348c08690001734d
00000f68.000012d4::2014/12/16-22:15:30.521 INFO [RES] Physical Disk <(S:) MSSQL>: HardDiskpPRArbitrate: Fast Path arbitration...
00000f68.000012d4::2014/12/16-22:15:30.816 INFO [RES] Physical Disk <(S:) MSSQL>: Successful reserve, key 348c08690001734d
00000f68.000012d4::2014/12/16-22:15:30.817 INFO [RES] Physical Disk <(S:) MSSQL>: Disk is offline
00000f68.000012d4::2014/12/16-22:15:30.818 INFO [RES] Physical Disk <(S:) MSSQL>: HardDiskpSetUnsetDiskFlags(mask=0x00000007, SetCluster=1, SetCsv=0, SetMaintenanceMode=0, Notify=1, Update=1) for device=2
00000f68.000012d4::2014/12/16-22:15:30.818 INFO [RES] Physical Disk <(S:) MSSQL>: HardDiskpGetDiskHandle: EXIT, status 0
00000f68.000012d4::2014/12/16-22:15:30.818 WARN [RES] Physical Disk <(S:) MSSQL>: HardDiskpUpdateVolumePropertiesInDriver: Disk is offline.
00000f68.000012d4::2014/12/16-22:15:30.818 INFO [RES] Physical Disk <(S:) MSSQL>: OnlineThread: Successfully cleared CSV state with partmgr.
00000f68.000012d4::2014/12/16-22:15:30.826 INFO [RES] Physical Disk <(S:) MSSQL>: HardDiskpWaitForPartitionsToArrive: Begin wait for \\?\GLOBALROOT\Device\Harddisk2\Partition0 partitions to arrive
00000f68.000012d4::2014/12/16-22:15:30.827 INFO [RES] Physical Disk <(S:) MSSQL>: HardDiskpWaitForPartitionsToArrive: Status ERROR_IO_PENDING from IOCTL_DISK_ARE_VOLUMES_READY
00000f68.000012d4::2014/12/16-22:15:31.193 INFO [RES] Physical Disk <(S:) MSSQL>: HardDiskpWaitForPartitionsToArrive: Wait success and IOCTL_DISK_ARE_VOLUMES_READY completed with status=0
00000f68.000012d4::2014/12/16-22:15:31.193 INFO [RES] Physical Disk <(S:) MSSQL>: HardDiskpWaitForPartitionsToArrive: wait for volumes
00000f68.00001608::2014/12/16-22:18:33.880 INFO [RES] Physical Disk <(S:) MSSQL>: Terminate request.
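You will notice the last line (Terminate request), which is where I forced the failback from NODE02 to NODE01; the preceding "wait for volumes" step never logged a "wait for volumes completed" entry before that. In contrast, here are the entries from NODE01, where the failback succeeds: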
00001044.000017c8::2014/12/16-22:19:10.170 INFO [RES] Physical Disk <(S:) MSSQL>: ResHardDiskOnlineV2: Online request.
00001044.00000d8c::2014/12/16-22:19:10.186 INFO [RES] Physical Disk <(S:) MSSQL>: ResHardDiskArbitrateInternal request Not a Space: Uses FastPath
00001044.00000d8c::2014/12/16-22:19:10.186 INFO [RES] Physical Disk <(S:) MSSQL>: ResHardDiskArbitrateInternal: Clusdisk driver handle or event handle is NULL.
00001044.00000d8c::2014/12/16-22:19:10.186 INFO [RES] Physical Disk <(S:) MSSQL>: HardDiskpQueryDiskFromStm: ClusterStmFindDisk returned device='\\?\scsi#disk&ven_lefthand&prod_iscsidisk#5&de2103a&0&000100#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
00001044.00000d8c::2014/12/16-22:19:10.186 INFO [RES] Physical Disk <(S:) MSSQL>: Arbitrate - Node using PR key 4f2847ef0002734d
00001044.00000d8c::2014/12/16-22:19:10.794 INFO [RES] Physical Disk <(S:) MSSQL>: Successful reserve no need to arbitrate, key 4f2847ef0002734d
00001044.00000d8c::2014/12/16-22:19:10.794 INFO [RES] Physical Disk <(S:) MSSQL>: Disk is offline
00001044.00000d8c::2014/12/16-22:19:10.794 INFO [RES] Physical Disk <(S:) MSSQL>: HardDiskpSetUnsetDiskFlags(mask=0x00000007, SetCluster=1, SetCsv=0, SetMaintenanceMode=0, Notify=1, Update=1) for device=2
00001044.00000d8c::2014/12/16-22:19:10.794 INFO [RES] Physical Disk <(S:) MSSQL>: HardDiskpGetDiskHandle: EXIT, status 0
00001044.00000d8c::2014/12/16-22:19:10.794 WARN [RES] Physical Disk <(S:) MSSQL>: HardDiskpUpdateVolumePropertiesInDriver: Disk is offline.
00001044.00000d8c::2014/12/16-22:19:10.794 INFO [RES] Physical Disk <(S:) MSSQL>: OnlineThread: Successfully cleared CSV state with partmgr.
00001044.00000d8c::2014/12/16-22:19:10.794 INFO [RES] Physical Disk <(S:) MSSQL>: HardDiskpWaitForPartitionsToArrive: Begin wait for \\?\GLOBALROOT\Device\Harddisk2\Partition0 partitions to arrive
00001044.00000d8c::2014/12/16-22:19:10.794 INFO [RES] Physical Disk <(S:) MSSQL>: HardDiskpWaitForPartitionsToArrive: Status ERROR_IO_PENDING from IOCTL_DISK_ARE_VOLUMES_READY
00001044.00000d8c::2014/12/16-22:19:10.810 INFO [RES] Physical Disk <(S:) MSSQL>: HardDiskpWaitForPartitionsToArrive: Wait success and IOCTL_DISK_ARE_VOLUMES_READY completed with status=0
00001044.00000d8c::2014/12/16-22:19:10.810 INFO [RES] Physical Disk <(S:) MSSQL>: HardDiskpWaitForPartitionsToArrive: wait for volumes
00001044.00000d8c::2014/12/16-22:19:11.122 INFO [RES] Physical Disk <(S:) MSSQL>: HardDiskpWaitForPartitionsToArrive: wait for volumes completed
00001044.00000d8c::2014/12/16-22:19:11.122 INFO [RES] Physical Disk <(S:) MSSQL>: UnLockVolumesIfEncryptionEnabled
00001044.00000d8c::2014/12/16-22:19:11.122 WARN [RES] Physical Disk <(S:) MSSQL>: FVELIB load failed 7e FveLoaded 0
00001044.00000d8c::2014/12/16-22:19:11.122 INFO [RES] Physical Disk <(S:) MSSQL>: ResHardDiskVolumeGuidPathnameAndDriveLetterChecks. Disk {2}.
00001044.00000d8c::2014/12/16-22:19:11.122 INFO [RES] Physical Disk <(S:) MSSQL>: ResHardDiskGetWin32Pathnames: Found 2 mount points for disk {2}, partition {1}.
00001044.00000d8c::2014/12/16-22:19:11.122 INFO [RES] Physical Disk <(S:) MSSQL>: HardDiskpDriveLetterAndVolumeGuidReset: Volume {\\?\Volume{4a14bbe2-7995-11e4-9403-005056b846cb}\}, Current Drive Letter {0x20}, Cluster Drive Letter {0x20}, Point {000000530ED16F00}
00001044.00000d8c::2014/12/16-22:19:11.122 INFO [RES] Physical Disk <(S:) MSSQL>: VolumeIsNtfs: Volume \\?\GLOBALROOT\Device\Harddisk2\ClusterPartition1\ has FS type NTFS
00001044.00000d8c::2014/12/16-22:19:11.122 INFO [RES] Physical Disk <(S:) MSSQL>: OnlineThread: HardDiskpVerifyVolume part=1 returned 0, state=0x00000000
00001044.00000d8c::2014/12/16-22:19:11.122 INFO [RES] Physical Disk <(S:) MSSQL>: ResHardDiskVerifyMountFolderTargetVolumesAreClustered: Volume \\?\GLOBALROOT\Device\Harddisk2\ClusterPartition1\ has FS type NTFS
00001044.00000d8c::2014/12/16-22:19:11.122 INFO [RES] Physical Disk <(S:) MSSQL>: HardDiskpValidateMountpoints - Mount point for \\?\Volume{4a14bbe2-7995-11e4-9403-005056b846cb}\ returned invalid handle
00001044.00000d8c::2014/12/16-22:19:11.122 INFO [RES] Physical Disk <(S:) MSSQL>: HardDiskpValidateMountpoints - There are no mounted folders.
00001044.00000d8c::2014/12/16-22:19:11.122 INFO [RES] Physical Disk <(S:) MSSQL>: MountPoint S:\ points to volume \\?\Volume{4a14bbe2-7995-11e4-9403-005056b846cb}\
00001044.000017f4::2014/12/16-22:20:11.122 INFO [RES] Physical Disk <(S:) MSSQL>: VolumeIsNtfs: Volume \\?\GLOBALROOT\Device\Harddisk2\ClusterPartition1\ has FS type NTFS
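To make the difference between the two logs easier to see, here is a rough Python sketch that parses the "[RES] Physical Disk" lines and reports the longest pause in an online attempt, plus whether a given step was ever reached. It assumes the cluster log lines keep exactly the format quoted above; the regex and the function names (parse_events, largest_gap, reached) are my own and are not part of any Microsoft tool.

```python
"""Find where a Physical Disk online attempt stalls in a cluster log.

Rough sketch, assuming lines shaped like:
  pid.tid::YYYY/MM/DD-HH:MM:SS.mmm LEVEL [RES] Physical Disk <name>: message
The pattern and function names are hypothetical helpers, not a Microsoft API.
"""
import re
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Hypothetical pattern for the "[RES] Physical Disk" lines quoted in this post.
LINE_RE = re.compile(
    r"^[0-9a-f]+\.[0-9a-f]+::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>\w+)\s+\[RES\] Physical Disk <(?P<res>[^>]+)>: (?P<msg>.*)$"
)

Event = Tuple[datetime, str, str]  # (timestamp, resource name, message)


def parse_events(lines: Iterable[str]) -> List[Event]:
    """Return (timestamp, resource, message) for every matching line, in log order."""
    events: List[Event] = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y/%m/%d-%H:%M:%S.%f")
            events.append((ts, m.group("res"), m.group("msg")))
    return events


def largest_gap(events: List[Event]) -> Optional[Tuple[float, str, str]]:
    """Return (seconds, message before, message after) for the longest pause."""
    best: Optional[Tuple[float, str, str]] = None
    for (t1, _, m1), (t2, _, m2) in zip(events, events[1:]):
        gap = (t2 - t1).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, m1, m2)
    return best


def reached(events: List[Event], text: str) -> bool:
    """True if any logged message contains the given text."""
    return any(text in msg for _, _, msg in events)


if __name__ == "__main__":
    # Last three lines of the NODE02 excerpt above.
    node02 = parse_events([
        "00000f68.000012d4::2014/12/16-22:15:31.193 INFO [RES] Physical Disk <(S:) MSSQL>: HardDiskpWaitForPartitionsToArrive: Wait success and IOCTL_DISK_ARE_VOLUMES_READY completed with status=0",
        "00000f68.000012d4::2014/12/16-22:15:31.193 INFO [RES] Physical Disk <(S:) MSSQL>: HardDiskpWaitForPartitionsToArrive: wait for volumes",
        "00000f68.00001608::2014/12/16-22:18:33.880 INFO [RES] Physical Disk <(S:) MSSQL>: Terminate request.",
    ])
    gap, before, after = largest_gap(node02)
    print(f"{gap:.3f}s between {before!r} and {after!r}")
    print("wait for volumes completed:", reached(node02, "wait for volumes completed"))
```

Run against the NODE02 excerpt, this should point at the roughly 182-second gap between "wait for volumes" and "Terminate request.", whereas on NODE01 the same step completes in about 0.3 seconds. It only measures time between logged lines, so interleaved threads or clock skew between nodes could affect the result.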
One other thing to note: on NODE02, during the failover and after the cluster service successfully reserves the disk, I see entries like this in the System event log from the Ntfs source:
"Volume S: (\Device\HarddiskVolume2) is healthy. No action is required."
So it looks to me like the OS does see the volume, yet nothing shows up for it in Windows Explorer or anywhere else, and any attempt to view the volumes, such as running diskpart or opening Disk Management, just freezes.
Any help would be greatly appreciated, as at this point I'm not sure where to turn to determine where the problem lies: VMware, Microsoft, HP, or all of the above? We have another, non-SQL cluster that we just configured exactly the same way, old volumes and all, and it performs the way it should, so I'm really confused about what is going on in this specific case. I can provide logs from any of the involved systems.
Thank you for your time,
Chad