Hi,
After an unknown issue on one of our 4-node Hyper-V clusters (Server 2008 R2 SP1 with fibre channel NEC D3-10 SAN storage), all our Cluster Shared Volumes went into redirected access mode and I was unable to bring them back online. Only after rebooting the nodes one by one did the disks come back online. Event log messages indicated that I should run cluster validation, so after shutting down all the virtual machines I took all the Cluster Shared Volumes offline and started the complete validation test.
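For reference, this is roughly the PowerShell equivalent of what I did through the Failover Cluster Manager GUI (just a sketch, not exactly what I typed; Test-Cluster without parameters runs every test against the whole cluster):

    # Failover clustering cmdlets (available once the feature is installed)
    Import-Module FailoverClusters

    # Show each Cluster Shared Volume and its current state
    Get-ClusterSharedVolume | Select-Object Name, State

    # Run the complete validation report for the cluster
    Test-Cluster

The following warnings/errors appeared during the test: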
An error occurred while executing the test.
An error occurred retrieving the disk information for the resource 'VSC2_DATA_H'.
Element not found. (Validate Volume Consistency test)
Cluster disk 4 is a Microsoft MPIO based disk
Cluster disk 4 from node has 4 usable path(s) to storage target
Cluster disk 4 from node has 4 usable path(s) to storage target
Cluster disk 4 is not managed by Microsoft MPIO from node
Cluster disk 4 is not managed by Microsoft MPIO from node (Validate Microsoft MPIO-based disks test)
SCSI page 83h VPD descriptors for cluster disk 4 and 5 match (Validate SCSI device Vital Product Data (VPD) test)
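To dig further into the "not managed by Microsoft MPIO" messages, the MPIO view on each node can be listed with mpclaim, which ships with the Multipath I/O feature (a sketch; the disk number a node reports may differ from the cluster disk number):

    # List the disks Microsoft MPIO is managing on this node
    mpclaim -s -d

    # Show the individual paths and their states for MPIO disk 4
    mpclaim -s -d 4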
After the test the cluster shared volume had disappeared, even though the resource shows as online. The following cluster events are logged:
Cluster physical disk resource 'DATA_H' cannot be brought online because the associated disk could not be found. The expected signature of the disk was '{d6e6a1e0-161e-4fe2-9ca0-998dc89a6f25}'. If the disk was replaced or restored, in the Failover Cluster Manager snap-in, you can use the Repair function (in the properties sheet for the disk) to repair the new or restored disk. If the disk will not be replaced, delete the associated disk resource. (Event 1034)
Cluster disk resource found the disk identifier to be stale. This may be expected if a restore operation was just performed or if this cluster uses replicated storage. The DiskSignature or DiskUniqueIds property for the disk resource has been corrected. (Event 1568)
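What I would like to do next is compare the disk ID the cluster expects with what is actually on the disk. Something like this should show both sides (an untested sketch; DiskIdGuid is the private property for GPT disks, for MBR disks it would be DiskSignature):

    # What the cluster database expects for the disk resource
    Import-Module FailoverClusters
    Get-ClusterResource "DATA_H" | Get-ClusterParameter DiskIdGuid

    # Compare with what is actually on the disk, in diskpart on the owning node:
    #   DISKPART> select disk 4
    #   DISKPART> uniqueid disk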
In Disk Management the disk shows as Unknown, Unallocated, Reserved. When the resource is owned by a node and I open Disk Management there, I get a prompt to initialize the disk. I have not done that yet, since initializing would write a fresh partition table over whatever is left on the disk.
Reading other posts, I think the partition table got corrupted, but I have no idea how to get it back. I found the following information, but it is not enough for me to go ahead with: use a tool like TestDisk to rewrite the partition table, then rewrite the unique ID to the disk, and everything comes back. But that is still no explanation as to why our "High Availability" failover cluster was down for nearly 2 days. This has happened to us twice within the past week.
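As far as I understand it, the recovery would go something like this (completely untested on my side, so please correct me; the disk number is whatever Disk Management shows on the owning node, and the GUID is the expected signature from event 1034):

    1. Run TestDisk (testdisk_win.exe) against the LUN: select the disk, choose
       the EFI GPT partition table type, run Analyse / Quick Search, and if the
       old partition is found, Write the table back to the disk.
    2. Put the expected GUID back on the disk with diskpart:

           select disk 4
           uniqueid disk id=d6e6a1e0-161e-4fe2-9ca0-998dc89a6f25

    3. If the cluster still complains, use the Repair function in the properties
       of the disk resource in Failover Cluster Manager to accept the disk again.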
Does anybody have an idea how to solve this? I think my data is still intact.
Thanx for taking the time to read this.
DJITS.