We had a problem adding a third node to our existing cluster: the operation failed with a communication timeout. Therefore we chose to update the servers to the latest fix levels and try again.
When we validated the cluster before adding the third node, the validation log showed:
Disk with identifier 6390744f has a Persistent Reservation on it. The disk might be part of some other cluster. Removing the disk from validation set
Disk with identifier ca0db766 has a Persistent Reservation on it. The disk might be part of some other cluster. Removing the disk from validation set
And:
Cluster disk 8 is not managed by Microsoft MPIO from node svr03.domain.local
There are 11 disks, so 2 were excluded from validation and 1 failed the MPIO check. That is strange, because MPIO is definitely configured for that disk on SVR02, the existing cluster node.
And on every SVR node:
SCSI page 83h VPD descriptors for cluster disk 8 and 10 match
At the end of this test:
Specified argument was out of the range of valid values.
Parameter name: percentage
So the validation test failed. We checked the cluster event log and saw no errors, only some warnings, and every resource was online. We then logged in to the VMs to check their event logs, and on one server we were greeted by a prompt saying that a disk had to be initialized with an MBR.
In Disk Management on that node, the disk showed as unallocated with the status Reserved. Under Storage in the cluster, the disk is online, but its volume path is missing.
The cluster event log shows:
Event 1568 - Cluster disk resource 'SQLProd_Log' found the disk identifier to be stale. This may be expected if a restore operation was just performed or if this cluster uses replicated storage. The DiskSignature or DiskUniqueIds property for the disk resource has been corrected.
This is a pass-through disk, and it is the same disk the VM wanted to write an MBR to.
We removed the storage resource, the disk, the MPIO configuration and the SAN volume, then presented a new SAN volume, configured MPIO and the disk, added the new storage resource, and restored the data.
What could cause cluster validation to create such a potentially disastrous problem?
TIA,
Fred