Channel: High Availability (Clustering) forum

Storage Spaces Direct / Cluster Virtual Disk goes offline when rebooting a node


Hello

We have several hyper-converged environments based on HP ProLiant DL360/DL380.
We have 3-node and 2-node clusters running Windows Server 2016 with current patches; firmware updates are done and a witness is configured.

The following issue occurs on at least one 3-node and one 2-node cluster:
When we put one node into maintenance mode (following the procedure in the Microsoft docs, and after checking that everything is healthy) and reboot that node, one of the Cluster Virtual Disks sometimes goes offline. In each environment it is always the disk "Performance", which uses SSD-only storage. The issue is intermittent: sometimes I can reboot the nodes one after the other several times in a row without any problem, but sometimes the disk "Performance" goes offline. I cannot bring this disk back online until the rebooted node is back online. Once the node that was down for maintenance is back, the Virtual Disk can be brought online without any issues.

We have created 3 Cluster Virtual Disks & CSV Volumes on these clusters:
1x Volume with only SSD Storage, called Performance
1x Volume with Mixed Storage (SSD, HDD), called Mixed
1x Volume with Capacity Storage (HDD only), called Capacity

Disk Setup for Storage Spaces Direct (per Host):
- P440ar RAID controller
- 2 x HP 800 GB NVMe (803200-B21)
- 2 x HP 1.6 TB 6G SATA SSD (804631-B21)
- 4 x HP 2 TB 12G SAS HDD (765466-B21)
- No spare disks
- Network adapter for storage: HP 10 Gbit/s 546FLR-SFP+ (2 storage networks for redundancy)
- 3-node cluster storage network switch: HPE FlexFabric 5700 40XG 2QSFP+ (JG896A); the 2-node cluster nodes are directly connected to each other

Cluster Events Log is showing the following errors when the issue occurs:

Error 1069 FailoverClustering
Cluster resource 'Cluster Virtual Disk (Performance)' of type 'Physical Disk' in clustered role '6ca63b55-1a16-4bb2-ac53-2b23619e258a' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Warning 5120 FailoverClustering
Cluster Shared Volume 'Performance' ('Cluster Virtual Disk (Performance)') has entered a paused state because of 'STATUS_NO_SUCH_DEVICE(c000000e)'. All I/O will temporarily be queued until a path to the volume is reestablished.

Error 5150 FailoverClustering
Cluster physical disk resource 'Cluster Virtual Disk (Performance)' failed.  The Cluster Shared Volume was put in failed state with the following error: 'Failed to get the volume number for \\?\GLOBALROOT\Device\Harddisk10\ClusterPartition2\ (error 2)'

Error 1205 FailoverClustering
The Cluster service failed to bring clustered role '6ca63b55-1a16-4bb2-ac53-2b23619e258a' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

Error 1254 FailoverClustering
Clustered role '6ca63b55-1a16-4bb2-ac53-2b23619e258a' has exceeded its failover threshold.  It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state.  No additional attempts will be made to bring the role online or fail it over to another node in the cluster.  Please check the events associated with the failure.  After the issues causing the failure are resolved the role can be brought online manually or the cluster may attempt to bring it online again after the restart delay period.

Error 5142 FailoverClustering
Cluster Shared Volume 'Performance' ('Cluster Virtual Disk (Performance)') is no longer accessible from this cluster node because of error '(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.
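
For anyone reading along, the status codes in the events above can be looked up by hand. The snippet below is only a minimal sketch of such a lookup, not an official tool. It assumes that "(error 2)" in event 5150 and "(1460)" in event 5142 are Win32 error codes and that c000000e in event 5120 is an NTSTATUS value; the names `KNOWN_CODES` and `decode` are made up for this illustration, and the table covers only the three codes seen in this post.

```python
# Minimal lookup for the status codes quoted in the cluster events above.
# Assumption: "(error 2)" and "(1460)" are Win32 error codes, and
# c000000e is an NTSTATUS value. Only these three codes are covered.
KNOWN_CODES = {
    ("ntstatus", 0xC000000E): "STATUS_NO_SUCH_DEVICE",  # event 5120
    ("win32", 2): "ERROR_FILE_NOT_FOUND",               # event 5150
    ("win32", 1460): "ERROR_TIMEOUT",                   # event 5142
}


def decode(kind, code):
    """Return the symbolic name for a known code, or 'unknown'."""
    return KNOWN_CODES.get((kind, code), "unknown")


if __name__ == "__main__":
    for (kind, code), name in KNOWN_CODES.items():
        print(f"{kind} {code:#x} -> {name}")
```

Read this way, the events suggest that the node owning the volume lost the device path (no such device), could not resolve the partition, and then timed out, which matches the disk staying offline until the rebooted node returns.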

Any hints or input would be appreciated. Has anyone seen something similar?

Thanks in advance

Philippe



