We're trying to implement Cluster-Aware Updating (CAU), but we keep running into issues where virtual machine live migrations fail.
Our clusters have plenty of memory, which frequently allows 2 nodes of a 6-node cluster to be completely devoid of roles. When we kick off CAU, it patches and reboots the nodes with no roles and then moves on to the others. While draining one of the remaining nodes, it kicks off live migrations (no low-priority roles). Since our maximum live migration value is 2, we continually get 21501 warnings as it works through the list. Towards the end, and only occasionally, the last few fail with a 21502 because there is not enough memory. The drain then hangs until someone intervenes manually.
21501
Live migration of 'SCVMM BRMWD-SPDEV02' failed.
Virtual machine migration operation for 'BRMWD-SPDEV02' failed at migration destination 'BRMWD-HYPV02'. (Virtual machine ID 2A4EC899-079C-4355-A503-F097FAF33E2B)
Failed to perform migration on virtual machine 'BRMWD-SPDEV02' because virtual machine migration limit '2' was reached, please wait for completion of an ongoing migration operation. (Virtual machine ID 2A4EC899-079C-4355-A503-F097FAF33E2B)
21502
Live migration of 'Virtual Machine BRMWT-FE01' failed.
Virtual machine migration operation for 'BRMWT-FE01' failed at migration destination 'BRMWD-HYPV02'. (Virtual machine ID 385026E5-7B2F-46EA-ADFE-EF854F76A4FE)
'BRMWT-FE01' could not initialize. (Virtual machine ID 385026E5-7B2F-46EA-ADFE-EF854F76A4FE)
Not enough memory in the system to start the virtual machine BRMWT-FE01 with ram size 2048 megabytes. (Virtual machine ID 385026E5-7B2F-46EA-ADFE-EF854F76A4FE)
I know we could likely get around this by increasing the number of simultaneous live migrations, or even by assigning preferred owners to all VMs to keep the cluster more balanced. I have no evidence for this, but it seems that when a CAU drain is initiated, it picks a single static host as the destination for all VMs rather than choosing the best possible node for each migration.
Can someone confirm whether this is accurate, and whether there is any way to change it?
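
To make the suspicion concrete, here is a toy model in Python of the two placement behaviours I'm comparing. It is only an illustration, not how the cluster service actually chooses destinations: the node names, VM names, and memory figures are all made up, and the functions `drain`, `static_target`, and `best_target` are hypothetical helpers, not part of any Microsoft API.

```python
def drain(vms_mb, free_mb, pick_target):
    """Try to place each VM (name, RAM in MB) on a node.

    Returns the names of VMs that could not be placed because the
    chosen node did not have enough free memory.
    """
    free = dict(free_mb)  # copy so the caller's data is not modified
    failed = []
    for name, ram in vms_mb:
        node = pick_target(free)
        if free[node] >= ram:
            free[node] -= ram
        else:
            failed.append(name)
    return failed


def static_target(free):
    # Assumed behaviour: every migration is sent to the same host.
    return "NODE-A"


def best_target(free):
    # Alternative: pick the node with the most free memory each time.
    return max(free, key=free.get)


# Made-up numbers: four VMs to move, two destinations with 8 GB free each.
vms = [("VM1", 4096), ("VM2", 4096), ("VM3", 2048), ("VM4", 2048)]
free = {"NODE-A": 8192, "NODE-B": 8192}

static_failed = drain(vms, free, static_target)
best_failed = drain(vms, free, best_target)

print("static target failures:", static_failed)
print("best-node failures:", best_failed)
```

With these made-up numbers, the static-target run fills one node and the last two VMs fail for lack of memory, which looks like the 21502 pattern we see at the end of a drain, while the per-migration choice places everything.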