Hi,
This one has had me confused for some time.
We have a clustered pair of file servers, with an additional server at our DR site. Certain shares are replicated to the DR site using DFS.
This replication fell over several months ago. We worked around it with a scheduled xcopy, but now that we have time to work on it, we want to find the root cause.
Whenever I run a diagnostic report, one of the servers reports that it cannot see its DFS partner:
DFS Replication cannot replicate with partner <CAP name> due to a communication error. The DFS Replication service used partner DNS name <CAP FQDN>, IP address <CAP IP>, and WINS address <CAP name> but failed with error ID: 1727 (The remote procedure call failed and did not execute.). Event ID: 5002
The weird thing is that it's not consistently the same server; it's whichever one has the highest uptime. If I restart a cluster node and move the Client Access Point (CAP) onto it, the DR server can't resolve the CAP. If I then bounce the DR server, it can see the CAP, but the CAP can no longer see it. Next time I get the opportunity, I'll reboot the DR server, then a cluster node, then move the role back and forth between nodes to see whether the behaviour is consistent.
Computer Management also fails in the same direction as the replication. All other servers can see each other, and the DR server can see all the other CAPs. Additionally, Computer Management works when I connect by IP address; only connecting by name fails, and only for that one name.
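To make the name-versus-IP comparison repeatable from each side, it could be scripted along these lines. This is only a rough sketch, assuming Python is available on the servers; `check_endpoint`, the host name and the IP address are placeholders, not our real values. It resolves the target and then attempts a TCP connection to port 135 (the RPC endpoint mapper). A successful connect does not prove that RPC itself works, but it helps separate a resolution problem from a connectivity problem.

```python
import socket


def check_endpoint(host, port=135, timeout=3.0):
    """Resolve host, then attempt a TCP connect; return the findings as a dict."""
    result = {
        "host": host,
        "addresses": [],
        "resolve_error": None,
        "connect_ok": False,
        "connect_error": None,
    }
    try:
        # Ask the OS resolver, so this follows the same path as other local lookups.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        result["addresses"] = sorted({info[4][0] for info in infos})
    except socket.gaierror as exc:
        result["resolve_error"] = str(exc)
        return result
    try:
        # Port 135 is the RPC endpoint mapper; this only tests TCP reachability.
        with socket.create_connection((host, port), timeout=timeout):
            result["connect_ok"] = True
    except OSError as exc:
        result["connect_error"] = str(exc)
    return result


if __name__ == "__main__":
    # Placeholder targets: substitute the CAP name and the CAP IP address.
    for target in ("cap-name.example.local", "192.0.2.10"):
        print(check_endpoint(target))
```

Running it on the DR server against the CAP, and on the active cluster node against the DR server, before and after each reboot would show whether the failing side fails at resolution or at the connection.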
Could there be something wrong with the CAP? Can anyone suggest where to look next?
Kind Regards,
Em.