Problem: Physical hosts run Hyper-V with VHDXs located on a SOFS CSV (the Hyper-V hosts are separate machines from the SOFS cluster nodes). When SMB redirection occurs during VM start-up, or when a CSV with an active SMB connection is moved between
cluster nodes, the CSV locks up.
All physical hosts and VMs are Windows Server 2012 R2, patched to approximately July 2016
All physical hosts are Cisco C220s with the latest OS updates and one release behind on firmware
SOFS is a two-node physical cluster with a SAS-connected JBOD
4 CSVs exist, all exhibiting the same issue
SOFS cluster nodes have the below networks:
Mgmt - teamed 10G - no cluster use
cluster0 - single 10G nic - cluster only
cluster1 - single 10G nic - cluster only
SOFS0 - single 10G nic - cluster/client
SOFS1 - single 10G nic - cluster/client (currently set to none for troubleshooting)
Backup - Teamed 10G - no cluster use
LiveMigration - Teamed 10G - no cluster use; only network used for live migrations
Cluster validation runs clean
When nothing is connected to the CSV shares I can fail over CSVs and the SOFS role without any errors
Currently each CSV is used by a single Hyper-V host and contains a single VHDX.
Hyper-V host networks:
SOFS0 - single 10g nic
SOFS1 - single 10g nic
Backup Team
Mgmt Team
Customer Network Team
I believe both problems are related:
Problem 1)
The CSV share is owned by SOFSA.
When I boot a VM with a secondary VHDX located on the SOFS (the OS is on a local RAID disk), checking the SMBClient logs on the Hyper-V host and the SMBServer logs on the SOFS nodes I can see:
The Hyper-V host initially hits SOFSB.
The Hyper-V host connects and the share is seen as an asymmetric/continuous availability transfer. Witness registration completes.
SOFSB issues a redirect to SOFSA.
The Hyper-V host gets the redirection request and establishes a connection to SOFSA (4 event log messages: SMB client reconnect, session reconnect, share reconnect, and witness registration).
In the same second as those 4 SMB reconnect messages, but last in sequence (so the 5th message), a message is received to redirect to another cluster node.
The Hyper-V host loses the session and share during the reconnect; the SMB client reports it successfully moved, but there are no messages about session or share reconnect.
After 59 seconds, SOFSA logs errors that the re-open failed (event ID 1016) because the client session expired.
After 60 seconds the Hyper-V host registers a request timeout due to no response from the server; the server is responding to TCP but not SMB (event ID 30809).
The Hyper-V host then immediately registers a connection to SOFSB for the share and goes through the same redirection sequence to SOFSA (which owns the share): SMB client reconnect, session reconnect, share reconnect, and witness registration are all successful.
2 seconds later on SOFSA I get a re-open failed, the file is temporarily unavailable (event ID 1016); I can see the source/destination/share that matches what is occurring. The error then just repeats every 5 seconds.
If I go and try to 'inspect' the drive from Hyper-V it times out, and on SOFSA I get a warning (event ID 30805) that the client lost its session - Error {Network Name Not Found} - The specified share name cannot be found; share name \SOFSClusterName\IPC$.
From there the errors just repeat: client established session to server, lost session to server (network name not found, server \SOFSClusterName) - with the same session ID in each connect/disconnect pair.
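In case it helps, here is a quick Python sketch of one way to dump those events side by side via wevtutil - the event IDs are the ones mentioned above, but the channel names are my assumption of where those IDs live, so adjust as needed:

import subprocess

# SMB client connectivity events (run on the Hyper-V host) and SMB server
# re-open failures (run on each SOFS node). Channel names are assumptions.
QUERIES = [
    ("Microsoft-Windows-SmbClient/Connectivity", [30805, 30809]),
    ("Microsoft-Windows-SmbServer/Operational", [1016]),
]

for channel, ids in QUERIES:
    xpath = "*[System[(" + " or ".join("EventID=%d" % i for i in ids) + ")]]"
    print("===== " + channel + " =====")
    # wevtutil qe <channel> /q:<xpath> /f:text /rd:true /c:50 -> newest 50 matching events
    result = subprocess.run(
        ["wevtutil", "qe", channel, "/q:" + xpath, "/f:text", "/rd:true", "/c:50"],
        capture_output=True, text=True)
    print(result.stdout or result.stderr)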
Now the great part -
If I go into Failover Cluster Manager (FOC) and try to move the CSV to the other node, the CSV gets stuck in offline pending. After a few minutes any other CSVs owned by the same node go into offline pending and hang. I can reboot and wait 10 minutes
for it to finally die and fail over, or wait 20 for the cluster service to die completely on both nodes. In the cluster logs, the SOFS node never fully releases the CSV for the move. The last messages you will see related to the volume are:
Volume {c7cdc2d5-e1f9-40c5-b36d-43523e2996f1} transitioning from 4 to 2.
Volume {c7cdc2d5-e1f9-40c5-b36d-43523e2996f1} moved to state 2. Reson 7; Status 0x0.
Volume {c7cdc2d5-e1f9-40c5-b36d-43523e2996f1} transitioning from 2 to 1.
Normally you see:
Volume {c7cdc2d5-e1f9-40c5-b36d-43523e2996f1} transitioning from 4 to 2.
Volume {c7cdc2d5-e1f9-40c5-b36d-43523e2996f1} moved to state 2. Reson 7; Status 0x0.
Volume {c7cdc2d5-e1f9-40c5-b36d-43523e2996f1} transitioning from 2 to 1.
Volume {c7cdc2d5-e1f9-40c5-b36d-43523e2996f1} moved to state 1. Reson 5; Status 0x0.
Volume4; Volume target path \??\GLOBALROOT\Device\Harddisk39\ClusterPartition1; File System target path \??\GLOBALROOT\Device\Harddisk39\ClusterPartition1.
Volume {c7cdc2d5-e1f9-40c5-b36d-43523e2996f1} transitioning from 1 to SetDownlevel. Local true; Flags 0x1; CountersName
Volume {c7cdc2d5-e1f9-40c5-b36d-43523e2996f1} moved to state 3. Reson 3; Status 0x0.
Volume {c7cdc2d5-e1f9-40c5-b36d-43523e2996f1} transitioning from 3 to 4.
Volume {c7cdc2d5-e1f9-40c5-b36d-43523e2996f1} moved to state 4. Reson 4; Status 0x0.
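(For anyone wanting to reproduce what I'm looking at: the cluster log comes from Get-ClusterLog, and the transition lines for the volume can be pulled out with something like the Python sketch below. The report path is the default location and is an assumption; adjust it and the encoding if yours differ.)

import re

VOLUME_GUID = "c7cdc2d5-e1f9-40c5-b36d-43523e2996f1"  # GUID from the excerpts above
LOG_PATH = r"C:\Windows\Cluster\Reports\Cluster.log"  # default Get-ClusterLog output (assumed)

# Keep only the CSV state-transition lines for this volume.
pattern = re.compile(re.escape(VOLUME_GUID) + r".*(transitioning|moved to state)", re.IGNORECASE)

with open(LOG_PATH, errors="replace") as log:  # add encoding="..." if the log isn't ANSI/UTF-8
    for line in log:
        if pattern.search(line):
            print(line.rstrip())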
The issue is consistent across all 4 CSVs, and I believe it has always existed. If I get the Hyper-V hosts lined up right so they initially hit the SOFS node that owns the CSV, everything boots up fine. When they don't, VMs and the cluster hang,
I have to go through reboots, and VMs lose their drives and have to be rebooted as well. The issue only comes up when a client gets redirected to a different SOFS node, which leads me to the next problem.
Problem 2:
Assume all the VMs connected to the correct SOFS CSV owner at boot and everything has been running fine for days/weeks/months (yes, this has been sitting around for a while as an unresolved problem). If I then try to move a CSV for SOFS maintenance purposes, the
CSV hangs in offline pending. Eventually the cluster hangs and I have to spend 2 hours getting things lined up right (after I do whatever I was planning on doing) so the VMs boot.
Things done/verified
Windows firewall is off
I've turned off IPv6
Removed teaming from all nodes on the SOFS0/1 and cluster0/1 networks (they used to be Windows teams rather than individual NICs)
Turned off client/network access on the SOFS1 network
Turned off the CSV balancer - in hindsight this doesn't work without it, because CSVs are redirected due to the asymmetric storage
Updated permissions on the SOFS shares to include the Hyper-V hosts and the SOFS cluster nodes - didn't make any difference / never see access-denied errors
One item I see that I don't understand: on the SOFS cluster nodes, in the SMBClient/Connectivity logs, I see network connection failures to the cluster addresses:
The network connection failed.
Error: {Device Timeout}
The specified I/O operation on %hs was not completed before the time-out period expired.
Server name: fe80::98f9:c138:xxxxx%32
Server address: x.x.x.x:445
Connection type: Wsk
Guidance:
This indicates a problem with the underlying network or transport, such as with TCP/IP, and not with SMB. A firewall that blocks port 445 or 5445 can also cause this issue.
The server name is the 'Tunnel adapter Local Area Connection* 12' on the other SOFS cluster node - so SOFSA generates these errors connecting to SOFSB, and SOFSB generates them connecting to SOFSA. This was occurring both before and after the cluster0/1 network interfaces
were teamed.
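Since the guidance points at TCP/the firewall rather than SMB itself, a simple port-445 reachability check between the nodes is one way to rule that out, e.g. the sketch below - the addresses are placeholders for the actual cluster0/cluster1/SOFS0/SOFS1 IPs of the peer node:

import socket

TARGETS = ["10.0.0.11", "10.0.1.11", "10.0.2.11"]  # placeholder IPs of the other SOFS node
PORT = 445

for host in TARGETS:
    try:
        # Plain TCP connect; if this fails, the problem is below SMB as the guidance suggests.
        with socket.create_connection((host, PORT), timeout=5):
            print(host + ":445 TCP connect OK")
    except OSError as exc:
        print(host + ":445 FAILED: " + str(exc))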
Thanks-