Hi
I am testing SAN controller failover. It takes around two minutes for the second controller to come online after the first one fails.
There are registry settings that can be configured to increase the disk timeout, but they don't seem to take effect when failover clustering is enabled.
I am testing this on a Hyper-V 2012 R2 failover cluster (regular clustered disks and CSVs; the same issue occurs on both).
I have changed the following registry settings:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\disk\TimeoutValue = 240
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\mpio\Parameters\PDORemovePeriod = 240
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e97b-e325-11ce-bfc1-08002be10318}\0003\Parameters\LinkDownTime = 60
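For reference, this is how I applied them from an elevated prompt with reg.exe (a sketch of my steps; note the `{4d36e97b-...}\0003` instance number is specific to the iSCSI adapter on each host and may differ on yours, and the hosts were rebooted afterwards for the values to take effect):

```shell
:: Disk I/O timeout (seconds) - how long the disk class driver waits before failing a request
reg add "HKLM\SYSTEM\CurrentControlSet\Services\disk" /v TimeoutValue /t REG_DWORD /d 240 /f

:: MPIO: how long a pseudo-LUN device object is retained after all paths are lost (seconds)
reg add "HKLM\SYSTEM\CurrentControlSet\Services\mpio\Parameters" /v PDORemovePeriod /t REG_DWORD /d 240 /f

:: iSCSI initiator: how long I/O is held on a link-down before being failed up the stack (seconds)
:: The 0003 instance ID below is host-specific - check which instance is your iSCSI adapter first
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e97b-e325-11ce-bfc1-08002be10318}\0003\Parameters" /v LinkDownTime /t REG_DWORD /d 60 /f
```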
But as soon as the second controller comes up, the cluster registers a failure of all the clustered disks and restarts the VMs. I am wondering whether the second controller coming online is somehow triggering the clustered disk failure.
I am seeing the following in the event log:
Connection to the target was lost. The initiator will attempt to retry the connection.
\Device\MPIODisk3 is currently in a degraded state. One or more paths have failed, though the process is now complete.
Ownership of cluster disk 'Cluster Disk 1' has been unexpectedly lost by this node. Run the Validate a Configuration wizard to check your storage configuration.
Thanks
Daniel