While provisioning new resources in Azure Canada East recently, we ran headfirst into a roadblock. We were setting up net-new subscriptions for a customer and discovered that Azure Canada East was not enabled as a region for the subscription. After resolving that, we found that the VM sizes we needed were not enabled, and once they were enabled, we discovered that they had a quota of zero.
This article will be of interest to anyone planning net-new deployments or a rapid ramp-up of existing resources. We’ll discuss at a high level what the issue is, and why planning and testing for it is particularly important where automated deployment tools are in use, including Disaster Recovery tools. You may wind up needing to choose a different region.
The public cloud is not infinite. It is made up of physical data centres. When they start to reach capacity, the cloud vendors start micromanaging growth while net-new infrastructure catches up to demand. Every cloud vendor does this, but in different ways.
In this case the project was small, and Microsoft was very cooperative. They asked for the VM sizes and quantities we were trying to provision and, based on the core count we needed, decided there was room for it in Azure Canada East. They enabled the region, then, via a different process, enabled the specific VMs we wanted, and then, via yet another process, raised our quota on those VMs to quantities we could work with. But the whole exercise took several days, which impacted deadlines. Had the project been very large, Microsoft may well not have been able to help us at that point in time.
For customers with data residency requirements, this is more than a matter of geography. We have only two choices in Canada, the second being Azure Canada Central. The problem was that prices for the specific resources we needed were anywhere from 10% to 20% higher in Canada Central, an obvious impact on budget.
Now the important part. It is about much more than just cost. Consider any tools that automatically spin up resources on demand. For example, many customers are starting to leverage the public cloud as a Disaster Recovery target. The ROI is fantastic. You don’t have to pay for all those compute resources unless you actually have a disaster. But what if, on the day your disaster happens, your quota is too small to accommodate the resources your DR tools are auto-provisioning? For this or any other deployment tool, the processes will fail with no obvious indication as to why.
It is critical to understand Resource Planning in the cloud, and quotas in particular. Before launching a project, you need to confirm that the region of choice is enabled for your subscription and that the specific resources you need are available and enabled in it. You also need to know what your quota for those resources is, and what the process is for getting it increased if needed. In the case of tools that provision resources, like the Disaster Recovery example above, best practices are now starting to include a regular Resource Management process right down to the quota level. That is, testing for resource availability on a regular basis, and obtaining sufficient quota to meet current and future project requirements.
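A pre-flight check of this kind can be sketched in a few lines of Python. This is a minimal illustration, not a vendor tool: the QuotaStatus class, the find_shortfalls function, the family names and all of the numbers are hypothetical, and the quota figures are assumed to have already been gathered from your cloud provider by whatever means you trust. A VM family that does not appear in the quota data is treated as not enabled, which is the quota-of-zero situation described above.

```python
from dataclasses import dataclass


@dataclass
class QuotaStatus:
    """Quota state for one VM family in one region, in vCPU cores (hypothetical type)."""
    limit: int   # quota granted on the subscription
    in_use: int  # cores already consumed


def find_shortfalls(required_cores, quotas):
    """Return {family: missing_cores} for every family the plan cannot fit.

    required_cores: {family: cores the deployment or DR plan will request}
    quotas:         {family: QuotaStatus}; a family absent from this mapping
                    is treated as not enabled, i.e. a quota of zero.
    """
    shortfalls = {}
    for family, needed in required_cores.items():
        status = quotas.get(family, QuotaStatus(limit=0, in_use=0))
        available = max(status.limit - status.in_use, 0)
        if needed > available:
            shortfalls[family] = needed - available
    return shortfalls


# Hypothetical DR plan and quota figures, for illustration only.
dr_plan = {"family_a": 64, "family_b": 32}
current_quotas = {"family_a": QuotaStatus(limit=100, in_use=20)}

shortfalls = find_shortfalls(dr_plan, current_quotas)
for family, missing in shortfalls.items():
    print(f"{family}: short by {missing} cores; request a quota increase")
```

Run on a schedule against the resources a DR plan would provision, a check like this surfaces a shortfall while there is still time to request an increase, rather than on the day of the disaster.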
All cloud vendors manage quotas, but in different ways. If you suddenly try to spin up 6,000 cores in AWS, for example, you’ll most likely be talking to AWS in a hurry. That is not just because they may not have that many available, but also because the quota acts as a circuit breaker in case the request was a typo and you meant only 600.
In the case of Azure, the process is relatively complex. First of all, Microsoft gives you not one, but four different ways of discovering how much quota you have for any particular resource. Unfortunately, the last time we tested, these tools did not agree with one another, so we regard them as a rough guideline only. To know for certain, you’ll have to go through a test provisioning process for each specific resource.
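One way to work with figures that disagree is to plan against the lowest number reported and to flag the disagreement for follow-up. The sketch below shows that idea only; it is our own illustrative assumption rather than a Microsoft recommendation, and the conservative_quota function, the source labels and the numbers are all hypothetical. It does not replace the test provisioning step.

```python
def conservative_quota(readings):
    """Pick the lowest quota figure reported and note whether sources agree.

    readings: {source_name: reported_limit_in_cores}
    Returns (planning_limit, sources_agree).
    """
    if not readings:
        raise ValueError("need at least one quota reading")
    values = list(readings.values())
    return min(values), len(set(values)) == 1


# Hypothetical readings for one VM family from four discovery methods.
readings = {"source_1": 100, "source_2": 100, "source_3": 80, "source_4": 100}

planning_limit, sources_agree = conservative_quota(readings)
print(f"Plan against {planning_limit} cores; sources agree: {sources_agree}")
```

With the sample readings above, the planning figure is 80 cores and the disagreement flag is raised, which would be the cue to confirm the real limit with a test deployment.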
The good news is that if you are planning something like a Disaster Recovery site in Azure, you can drill into those specific resources and request sufficient quota from Microsoft. They’ve advised us that once granted, they don’t claw back quotas even if they aren’t being used, so that should give your Disaster Recovery plan a degree of assurance it will be able to execute. That said, checking once a quarter to ensure quota is sufficient would not be unwise.
For more information on Resource Management, quotas, and how to manage them in AWS or Azure, please feel free to reach out to us.