We currently run a Windows Server 2016 Datacenter failover cluster with two PowerEdge R740 nodes.
The hardware configuration of each node is as follows:
2x Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz (Model 85, Stepping 4)
RAM: 196608 MB (192 GB)
NVIDIA Tesla M60 video card
SAS connection to a PowerVault® 3420 SAN
The video cards are assigned to virtual machines through Discrete Device Assignment (DDA).
The nodes reboot abruptly and at random. The only entry in the logs is event ID 41 (Kernel-Power), "The system has rebooted without cleanly shutting down first", with the following event data:
BugcheckCode 0
BugcheckParameter1 0x0
BugcheckParameter2 0x0
BugcheckParameter3 0x0
BugcheckParameter4 0x0
SleepInProgress 0
PowerButtonTimestamp 0
BootAppStatus 0
Checkpoint 0
ConnectedStandbyInProgress false
SystemSleepTransitionsToOn 0
CsEntryScenarioInstanceId 0
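To make our reading of these fields explicit, here is a minimal Python sketch that parses the event data above and sorts it into the cases we understand event 41 to distinguish: a nonzero BugcheckCode points to a stop error, and a nonzero PowerButtonTimestamp points to a power button press. The function names and the wording of the categories are our own, chosen for illustration; the interpretation is our understanding of the event, not an authoritative diagnosis.

```python
# Event data copied from the Kernel-Power event 41 shown above.
EVENT_DATA = """\
BugcheckCode 0
BugcheckParameter1 0x0
BugcheckParameter2 0x0
BugcheckParameter3 0x0
BugcheckParameter4 0x0
SleepInProgress 0
PowerButtonTimestamp 0
BootAppStatus 0
Checkpoint 0
ConnectedStandbyInProgress false
SystemSleepTransitionsToOn 0
CsEntryScenarioInstanceId 0
"""


def parse_event_data(text):
    """Turn 'Name value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, value = line.partition(" ")
        fields[name] = value.strip()
    return fields


def to_int(value):
    """Accept both decimal ('0') and hexadecimal ('0x0') notation."""
    return int(value, 0)


def classify(fields):
    """Hypothetical helper: map the fields to the case they suggest."""
    if to_int(fields.get("BugcheckCode", "0")) != 0:
        return "stop error recorded"
    if to_int(fields.get("PowerButtonTimestamp", "0")) != 0:
        return "power button press recorded"
    # Neither field is set: the system went down without Windows
    # recording a stop error or a power button press.
    return "no stop error or power button press recorded"


print(classify(parse_event_data(EVENT_DATA)))
```

For our data this falls into the last case, which is why we cannot tell from the event alone whether the cause is a hard hang, a power interruption, or something else.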
The two nodes do not reboot at the same time, and we see no pattern in when the reboots occur.
Hardware diagnostics report no errors, and OpenManage shows no related events.
Do you have any idea what could be causing this problem?
Best regards