I am running a Hyper-V cluster on 2 nodes, which use an HA NAS cluster to store virtual machines and their disks.
The NAS cluster itself has built-in failover features which work nicely, but a failover can take up to 3 minutes to complete.
During this time, the CSVs are reported as offline and running VMs start to "fail", entering an undetermined state:
- Some just report "failed".
- Some report "failed" but start again, ending up at "missing or invalid boot drive".
- Some, especially Linux VMs, end up reporting disk errors and shut everything down until a manual reboot.
I read that Hyper-V will queue disk I/O for a certain amount of time if a CSV goes down. It looks like VMs stay healthy for 30 seconds, then start to fail.
Is there a way to extend this time to, let's say, 5 minutes?
What about read I/O happening in the meantime?
Best,
dognose