Channel: High Availability (Clustering) forum

2012 R2 Hyper-V cluster nodes hang

Hi,

We have a two-node Hyper-V cluster. About once a week, one of the cluster nodes hangs during a backup (Backup Exec 2014 V-Ray edition), causing all of its VMs to restart on the other node. When a node hangs, the console is black with only the mouse pointer moving. Ctrl-Alt-Del does nothing; the only option is to reboot the server.
Almost always, if I just let the node boot normally, it comes up in the same state: a black screen with only the mouse pointer visible.
I have to boot it into safe mode first and then reboot it again to get it back up.

Hardware:

2 x IBM x3550 (2 x CPU, 320 GB RAM, additional cards: 4-port Intel network card + 2-port SAS card) as Hyper-V nodes
1 x IBM V3700 as SAN-storage, connected to both nodes with redundant SAS-cables.

Software used:
Windows Server 2012 R2 datacenter OS with Hyper-V roles in cluster nodes
Windows failover clustering
4 x 2TB Shared CSV-disks for Virtual-machines
Backup Exec 2014 V-ray edition
SDDDSM driver for V3700

Configuration:
1 network team of two interfaces for VM-traffic only
1 network interface for DMZ traffic only, used by selected VMs
1 network team of two interfaces for Cluster traffic only
1 management interface

ODX (Offloaded Data Transfers) is disabled on both nodes, as the V3700 does not support it.
~30 virtual machines, mostly Windows Server versions from 2003 to 2012 R2, a couple of Ubuntu VMs and four Windows 7 VMs.

We have all the latest Windows updates and hardware firmware installed on our cluster nodes.
The problem is that the nodes don't generate any kind of dump when they hang, so we can't pinpoint where the problem is.

Also, the system logs don't reveal anything that would point to the actual cause of the hang.

For example, according to the System log, one of the nodes hung at 21:17:44:
The previous system shutdown at 9:17:44 PM on 12/10/2014 was unexpected.

In the Event Viewer I have found the following errors, but they are not near the crash time.

17:06:08 
ERROR VSS
Volume Shadow Copy Service error: Unexpected error calling routine IVssAsrWriterBackup::GetAsrMetadata.  hr = 0x80070037, The specified network resource or device is no longer available.


Operation:
   PrepareForBackup event

Context:
   Execution Context: ASR Writer
   Execution Context: Writer
   Writer Class Id: {be000cbe-11fe-4426-9c58-531aa6355fc4}
   Writer Name: ASR Writer
   Writer Instance ID: {d2d37e37-99d1-446d-a840-5390af00616e}

Error-specific details:
   ASR Writer: The specified network resource or device is no longer available. (0x80070037)



17:06:08 
Warning VSS
Volume Shadow Copy Service warning: ASR writer Error 0x80070037.  hr = 0x00000000, The operation completed successfully.


Operation:
   PrepareForBackup event

Context:
   Execution Context: ASR Writer
   Execution Context: Writer
   Writer Class Id: {be000cbe-11fe-4426-9c58-531aa6355fc4}
   Writer Name: ASR Writer
   Writer Instance ID: {d2d37e37-99d1-446d-a840-5390af00616e}

Error-specific details:
   ASR Writer: The specified network resource or device is no longer available. (0x80070037)
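For reference, the hr value in these VSS events can be split into its standard HRESULT fields. This is only a minimal Python sketch; the helper name decode_hresult is made up for illustration and is not part of any Windows tooling.

```python
# Split a 32-bit HRESULT into severity, facility and code fields.
# decode_hresult is a hypothetical helper name used only for this example.

FACILITY_WIN32 = 7  # facility used for HRESULTs that wrap a Win32 error code


def decode_hresult(hr: int) -> dict:
    """Return the failure bit, facility and code of a 32-bit HRESULT."""
    return {
        "failure": bool((hr >> 31) & 1),   # top bit set means failure
        "facility": (hr >> 16) & 0x7FF,    # 11-bit facility field
        "code": hr & 0xFFFF,               # low 16 bits: the error code
    }


parts = decode_hresult(0x80070037)
print(parts)
print("wraps a Win32 error:", parts["facility"] == FACILITY_WIN32)
```

Here the facility is 7 (Win32) and the code is 0x37 = 55, which is the Win32 error ERROR_DEV_NOT_EXIST. That matches the message text in the event ("The specified network resource or device is no longer available"), so the HRESULT carries no more information than the message itself.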


We also see these errors in the Event Viewer multiple times during backups, but according to information published
by Microsoft they seem to be related to VMs with IDE root disks:

ERROR: VDS Basic Provider
Unexpected failure. Error code: 48F@01000003

We need help finding out what is causing these hangs. Does anyone have any hints, or should I just open a case with Symantec or Microsoft?

Br,
Antti Kiiski
