We are seeing a problem in prod that does not occur in any of the lower environments. Here's what happened: we switched the cluster (normally in synchronous replication) to async. After making sure all roles, plus the cluster itself, were on the Primary, we paused the secondary first and then took it offline. That seemed fine. When we booted back up, we had to do an update, so we rebooted once more.
At that point the cluster freaked out: it said the Primary didn't have the most recent copy of the cluster config and kicked it out. It also stated that the FSW failed to arbitrate, yet that box is up and running, and I had an open ping going from it to the cluster just to be sure.