Channel: High Availability (Clustering) forum

CSV access stopped working on one cluster node

We have a two-node Windows Server 2008 R2 Hyper-V cluster (Server Core installation) with an HP LeftHand P4500 SAN (attached over iSCSI with MPIO).

The cluster has worked for years, but for the past two days one node has not been working. All VMs and all CSVs are running on / attached to the working node. After rebooting the non-working node, everything seems to work. But when we try to migrate a VM to the non-working node, the migration fails, and many cluster events with ID 5120, STATUS_CONNECTION_DISCONNECTED(c000020c), "All I/O will temporarily be queued until a path to the volume is reestablished." are logged on the non-working node.

We have done a lot of troubleshooting, without success:
- Verified that there is no hardware failure.
- Verified networking and SMB access (https://support.microsoft.com/en-us/kb/2008795) between the two nodes.
- Verified and compared the MPIO / iSCSI configuration on both nodes (mpclaim -v conf.txt).
- Upgraded all HP drivers on the non-working node and installed all available Windows updates, including the most recent Nov. 2016 cumulative update rollup.
- Verified that all hotfixes listed in https://buildwindows.wordpress.com/2012/12/04/windows-hangs-when-accessing-a-cluster-shared-volume/ are installed on the non-working node.
- Ran all cluster validation tests. There are no errors or warnings. However, the storage was not tested, because we cannot take the CSVs offline.
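
For the MPIO / iSCSI comparison in the list above, the two dumps can be diffed mechanically rather than by eye. The following is only a minimal sketch in Python: the file names node1_conf.txt and node2_conf.txt and both function names are hypothetical, and it assumes the mpclaim output from each node has been copied to a machine with Python installed and can be read as UTF-8 text (the actual encoding of the dump may differ).

```python
import difflib


def diff_config_dumps(text_a, text_b, name_a="node1", name_b="node2"):
    """Return the unified-diff lines between two configuration dumps.

    Trailing whitespace is stripped from every line so that padding
    differences are not reported as configuration differences.
    """
    lines_a = [line.rstrip() for line in text_a.splitlines()]
    lines_b = [line.rstrip() for line in text_b.splitlines()]
    return list(
        difflib.unified_diff(
            lines_a, lines_b, fromfile=name_a, tofile=name_b, lineterm=""
        )
    )


def diff_config_files(path_a, path_b, encoding="utf-8"):
    """Read two dump files (hypothetical paths) and diff their contents."""
    with open(path_a, encoding=encoding) as fa:
        text_a = fa.read()
    with open(path_b, encoding=encoding) as fb:
        text_b = fb.read()
    return diff_config_dumps(text_a, text_b, name_a=path_a, name_b=path_b)
```

Calling diff_config_files("node1_conf.txt", "node2_conf.txt") and printing the returned lines shows only the entries that differ; an empty list means the two dumps are identical apart from trailing whitespace. Differences in disk numbers or path IDs may be expected between nodes, so the output still needs to be read with that in mind.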

What we haven't done yet is upgrade the HP P4500 SAN software (version 11.5 is installed, 12.6 is available).

We have the following questions:

1. Does anyone know a way to find the cause of the CSV access problems? There is a blog post that describes CSV diagnostics, https://blogs.msdn.microsoft.com/clustering/2014/03/13/cluster-shared-volume-diagnostics/, but none of these PowerShell cmdlets are available for 2008 R2. And https://blogs.technet.microsoft.com/askcore/2010/12/16/troubleshooting-redirected-access-on-a-cluster-shared-volume-csv/ didn't help us.

2. We are considering reinstalling the non-working cluster node from scratch. Is the assumption correct that we can evict the non-working cluster node, because the quorum is up and running on the working cluster node? And that the VMs will continue to run even when the "cluster" has only one running node?

Thank you in advance for any help
Franz


