HA Admission Control has historically been one of the lesser-understood vSphere topics. It’s not intuitive how it works or what it does, so in most vSphere environments it is left at its default values. But the default Admission Control settings are very restrictive and often cause issues.
In this blog post I want to share the two most common issues with vSphere Admission Control and how to solve them.
Issue #1: Not being able to start a VM
Description
Probably the most common issue everyone encounters with Admission Control is suddenly being unable to power on VMs. There are multiple reasons why that might happen, but most likely you’ve just configured a reservation on one of your VMs or deployed a VM from an OVA template with a preconfigured reservation. This has changed the Admission Control slot size, and with the new slot size you no longer have enough slots to satisfy failover requirements.
As a result, you get the following alarm in vCenter: “Insufficient vSphere HA failover resources”. And when you try to create and power on a new VM, you get: “Insufficient resources to satisfy configured failover level for vSphere HA”.
Cause
So what exactly has happened here? In my example, a new VM with 4GHz of CPU and 4GB of RAM reserved was deployed. Admission Control was set to its default “Host Failures Cluster Tolerates” policy, which uses slot sizes. The slot size is taken from the largest CPU reservation and the largest memory reservation (plus memory overhead) of any powered-on VM, so here it became 4GHz of CPU and about 4GB of memory. Each host’s resources are divided by the slot size to get the number of slots, and every powered-on VM, even one without a reservation, uses at least one slot. So once you configure a single large VM reservation, depending on the number of VMs in your cluster, more often than not all slots are used straight away. Based on my calculations, the cluster has 91 slots in total, and they were instantly exhausted by the 165 running VMs.
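To make the slot arithmetic concrete, here is a minimal Python sketch of how the “Host Failures Cluster Tolerates” policy counts slots. It is a simplified model, not VMware’s implementation: the host and VM sizes are hypothetical, it assumes each host’s usable CPU and memory are known, and it assumes the hosts holding the most slots are the ones that fail. The `VM`, `Host`, `slot_size`, `host_slots` and `failover_ok` names are made up for illustration.

```python
from dataclasses import dataclass

# Simplified, hypothetical model of the slot-based Admission Control policy.

DEFAULT_CPU_SLOT_MHZ = 32  # CPU slot used when no powered-on VM has a CPU reservation


@dataclass
class VM:
    cpu_res_mhz: int = 0      # CPU reservation
    mem_res_mb: int = 0       # memory reservation
    mem_overhead_mb: int = 0  # virtualization memory overhead


@dataclass
class Host:
    cpu_mhz: int  # CPU capacity available to VMs
    mem_mb: int   # memory capacity available to VMs


def slot_size(vms):
    """Largest CPU reservation, and largest memory reservation plus overhead."""
    cpu = max([DEFAULT_CPU_SLOT_MHZ] + [vm.cpu_res_mhz for vm in vms])
    # The 1 MB floor only avoids division by zero in this sketch.
    mem = max([1] + [vm.mem_res_mb + vm.mem_overhead_mb for vm in vms])
    return cpu, mem


def host_slots(host, slot):
    """A host holds as many slots as its scarcer resource allows."""
    cpu_slot, mem_slot = slot
    return min(host.cpu_mhz // cpu_slot, host.mem_mb // mem_slot)


def failover_ok(hosts, vms, host_failures=1):
    """True if the slots left after losing the largest hosts cover every VM."""
    slot = slot_size(vms)
    counts = sorted((host_slots(h, slot) for h in hosts), reverse=True)
    remaining = sum(counts[host_failures:])
    return remaining >= len(vms)  # each powered-on VM uses one slot


hosts = [Host(cpu_mhz=60_000, mem_mb=262_144) for _ in range(3)]
vms = [VM(mem_overhead_mb=100) for _ in range(60)]  # no reservations
print(slot_size(vms), sum(host_slots(h, slot_size(vms)) for h in hosts))  # → (32, 100) 5625
print(failover_ok(hosts, vms))  # → True

# Deploy one VM with a 4 GHz / 4 GB reservation.
vms.append(VM(cpu_res_mhz=4_000, mem_res_mb=4_096, mem_overhead_mb=100))
print(slot_size(vms), sum(host_slots(h, slot_size(vms)) for h in hosts))  # → (4000, 4196) 45
print(failover_ok(hosts, vms))  # → False
```

In this hypothetical cluster a single reservation shrinks the slot count from 5625 to 45, so once one host’s slots are set aside for failover there is no longer a slot for every VM.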
Solution
You can control the slot size manually and make it much smaller, such as 1GHz of CPU and 1GB of RAM. That way you’d have many more slots. The VM from my previous example would then use four slots. VMs without reservations would still use one slot each, but each slot would now represent a much smaller share of cluster resources, so together they would consume far less capacity. However, this process is manual and prone to error.
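The same kind of sketch can show the effect of a smaller, fixed slot size. It assumes a hypothetical 1GHz/1GB slot, ignores memory overhead for brevity, and rounds each VM’s reservation up to whole slots; the function names are illustrative only.

```python
import math

# Hypothetical fixed slot size: (CPU MHz, memory MB).
FIXED_SLOT = (1_000, 1_024)


def slots_needed(cpu_res_mhz, mem_res_mb, slot=FIXED_SLOT):
    """A VM larger than the slot uses several slots; every VM uses at least one."""
    cpu_slot, mem_slot = slot
    return max(1, math.ceil(cpu_res_mhz / cpu_slot), math.ceil(mem_res_mb / mem_slot))


def host_slots(cpu_mhz, mem_mb, slot=FIXED_SLOT):
    """Slots a host can hold, limited by its scarcer resource."""
    cpu_slot, mem_slot = slot
    return min(cpu_mhz // cpu_slot, mem_mb // mem_slot)


hosts = [(60_000, 262_144)] * 3  # three identical hypothetical hosts
vms = [(0, 0)] * 60 + [(4_000, 4_096)]  # 60 VMs without reservations + one 4 GHz / 4 GB VM

used = sum(slots_needed(cpu, mem) for cpu, mem in vms)
counts = sorted((host_slots(cpu, mem) for cpu, mem in hosts), reverse=True)
remaining_after_failure = sum(counts[1:])  # tolerate one host failure

print(slots_needed(4_000, 4_096))  # → 4
print(used, remaining_after_failure)  # → 64 120
```

With the smaller slot, the reserved VM costs four slots instead of shrinking every other VM’s share of the cluster, and the workload fits even after one host is set aside.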
The better solution is to use the “Percentage of Cluster Resources” policy, which is recommended for most environments. We’ll go over the main differences between the three available Admission Control policies after we discuss the second issue.
Issue #2: Not being able to enter Maintenance Mode
Description
This might be a corner case, but I still see it quite often: you have two hosts in a cluster (such as in a ROBO or DR site, or just a small environment) and try to put one of them into maintenance mode.
The first issue you will encounter is that DRS does not automatically migrate the VMs to the other host with vMotion, so you have to evacuate them manually.
Then, once you have moved all VMs to the other host and put the first host into maintenance mode, you again cannot power on VMs and get the same error: “Insufficient resources to satisfy configured failover level for vSphere HA”.
Cause
This happens because disconnected hosts and hosts in maintenance mode are not included in Admission Control calculations. And one host is obviously not enough for failover: if it fails, there is no other host to fail over to.
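A few lines of Python illustrate the two-host case. This is a hypothetical model that only tracks each host’s slot count and whether the host is excluded (in maintenance mode or disconnected); `failover_slots` is an illustrative helper, not a vSphere API.

```python
def failover_slots(hosts, host_failures=1):
    """Slots left after the largest active hosts fail.

    hosts: list of (slot_count, excluded) tuples, where excluded means the
    host is in maintenance mode or disconnected.
    """
    # Excluded hosts are ignored by Admission Control.
    active = sorted((slots for slots, excluded in hosts if not excluded), reverse=True)
    return sum(active[host_failures:])


print(failover_slots([(100, False), (100, False)]))  # → 100: one host can fail
print(failover_slots([(100, False), (100, True)]))   # → 0: no room for any VM
```

With only one active host left, reserving capacity for one host failure leaves zero slots, so every power-on request is rejected.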
Solution
If you get caught in such a situation, you can temporarily disable Admission Control altogether until you finish maintenance. This is why it’s often recommended to have at least three hosts in a cluster, but that cannot always be justified if you run just a handful of VMs.
Alternatives to Slot Size Admission Control
There are two other Admission Control policies. The first is “Specify a Failover Host”, which dedicates a host (or hosts) to failover. Such a host acts as a hot standby and runs VMs only in a failover situation. This policy is ideal if you want to reserve dedicated failover resources.
The second is “Percentage of Cluster Resources”. Under this policy, resources are reserved as a percentage of total cluster resources. If you have five identical hosts in your cluster, you can reserve 20% of resources (equal to one host) for failover.
This policy reserves a percentage of cluster resources instead of using slot sizes, and hence doesn’t have the issues of the “Host Failures Cluster Tolerates” policy. There is a gotcha, though: if you add another five hosts to your cluster, you need to change the reservation to 10%, which is often overlooked.
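The percentage needed to tolerate host failures can be estimated as the share of cluster capacity held by the largest hosts. A minimal sketch, assuming each host’s capacity is expressed in a single abstract unit (in practice the CPU and memory percentages are set separately, and the same reasoning applies to each):

```python
def required_percentage(host_capacities, host_failures=1):
    """Share of cluster capacity held by the largest `host_failures` hosts."""
    largest = sorted(host_capacities, reverse=True)[:host_failures]
    return 100 * sum(largest) / sum(host_capacities)


print(required_percentage([100] * 5))             # → 20.0
print(required_percentage([100] * 10))            # → 10.0
print(required_percentage([200, 100, 100, 100]))  # → 40.0
```

The last call shows that with unequal hosts the largest one sets the percentage, which is another reason to revisit the value whenever the cluster changes.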
Conclusion
The “Percentage of Cluster Resources” policy is recommended in most cases to avoid issues with slot sizes. What is important to understand is that the goal of this policy is just to guarantee that VMs with reservations can be restarted in a host failure scenario.
If a VM has no reservations, the “Percentage of Cluster Resources” policy uses only the VM’s memory overhead and a small default CPU value (32MHz) in its calculations. This is probably the most confusing part of Admission Control in general, but that’s a topic for the next blog post.
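For reference, here is a rough sketch of the check behind the “Percentage of Cluster Resources” policy under these assumptions: each powered-on VM counts its CPU reservation (or 32MHz when it has none) and its memory reservation plus memory overhead, and a power-on is allowed only if the remaining CPU and memory both stay at or above the reserved percentage. Cluster and VM sizes are hypothetical, and the function names are illustrative, not a VMware API.

```python
from dataclasses import dataclass

# Simplified, hypothetical model of the percentage-based Admission Control check.

DEFAULT_CPU_MHZ = 32  # CPU counted for a VM without a CPU reservation


@dataclass
class VM:
    cpu_res_mhz: int = 0
    mem_res_mb: int = 0
    mem_overhead_mb: int = 0


def failover_capacity(total_cpu_mhz, total_mem_mb, vms):
    """Percentage of CPU and memory not claimed by powered-on VMs."""
    cpu_used = sum(vm.cpu_res_mhz or DEFAULT_CPU_MHZ for vm in vms)
    # Without a memory reservation, only the memory overhead is counted.
    mem_used = sum(vm.mem_res_mb + vm.mem_overhead_mb for vm in vms)
    return (100 * (total_cpu_mhz - cpu_used) / total_cpu_mhz,
            100 * (total_mem_mb - mem_used) / total_mem_mb)


def can_power_on(total_cpu_mhz, total_mem_mb, running, new_vm, reserved_pct):
    """Admit the VM only if both resources stay at or above the reserved percentage."""
    cpu_pct, mem_pct = failover_capacity(total_cpu_mhz, total_mem_mb, running + [new_vm])
    return cpu_pct >= reserved_pct and mem_pct >= reserved_pct


total_cpu, total_mem = 5 * 60_000, 5 * 262_144  # five hypothetical hosts
running = [VM(mem_overhead_mb=100) for _ in range(150)]  # no reservations

print(can_power_on(total_cpu, total_mem, running, VM(4_000, 4_096, 100), 20))  # → True
print(can_power_on(total_cpu, total_mem, running, VM(250_000, 0, 100), 20))    # → False
```

VMs without reservations barely move these numbers, which is why this policy rarely blocks power-ons the way slot sizes do, and also why it does not by itself guarantee capacity for unreserved VMs after a failure.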