Whoops, No VMs!!!
Summary
You’ve planned for redundancy, scaling, and failover, but what happens when the cloud itself runs out of space? In this episode, Carl and Brandon untangle capacity (what the provider physically or logically has available in a region or zone) versus quota (the soft limit on what you can consume). Mixing the two leads to painful surprises during scale events and failovers.
We talk through how capacity shortfalls show up in real life (zones that are full, SKUs that vary by location, and limited supply for GPU-heavy instances) and the patterns that help: design for multiple zones and regions, add retry and fallback logic with flexible SKUs, balance spot with on-demand, and hold a baseline with reservations or time-bound commitments.
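To make the retry-and-fallback idea concrete, here is a minimal sketch of walking an ordered list of zone and SKU combinations until one succeeds. Everything in it is an assumption for illustration: `CapacityError`, `QuotaError`, `Placement`, and the `create_vm` callable are hypothetical stand-ins, not any provider's real SDK. In practice you would map your provider's actual error codes onto the two exception types.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


class CapacityError(Exception):
    """Hypothetical: the provider has no room for this SKU in this zone."""


class QuotaError(Exception):
    """Hypothetical: the account's own limit was hit."""


@dataclass(frozen=True)
class Placement:
    zone: str  # e.g. an availability zone name
    sku: str   # e.g. an instance type or VM size


def provision_with_fallback(
    candidates: Iterable[Placement],
    create_vm: Callable[[Placement], str],
) -> Optional[str]:
    """Try each candidate in order and return the first VM id created.

    Returns None when every candidate reports a capacity shortfall.
    A QuotaError is deliberately not caught: another zone or SKU may
    count against the same limit, so it should surface immediately.
    """
    for placement in candidates:
        try:
            return create_vm(placement)
        except CapacityError:
            continue  # this zone/SKU is full; move to the next candidate
    return None
```

The caller supplies the preference order, for example the preferred SKU in each zone first, then a comparable SKU in the same zones. Keeping capacity and quota failures as separate exception types mirrors the distinction from the episode: one is worth retrying elsewhere, the other usually needs a limit increase.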
We close on the business side: the price of headroom, when commitments make sense, and simple pipeline and monitoring checks so “no capacity” errors fail fast instead of 30 minutes into a deploy.
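As a rough way to reason about when a commitment makes sense, the sketch below computes a break-even utilization. It assumes a simplified model that is not from the episode: a commitment bills at a flat hourly rate for every hour, while on-demand bills only for hours actually used. The function names and the prices in the usage note are hypothetical.

```python
def break_even_utilization(on_demand_hourly: float, committed_hourly: float) -> float:
    """Fraction of hours a VM must run for a commitment to cost the same.

    Simplified model (an assumption): the commitment is paid for every
    hour, on-demand only for hours used. Above this fraction the
    commitment is cheaper; below it, on-demand is cheaper.
    """
    if on_demand_hourly <= 0 or committed_hourly < 0:
        raise ValueError("prices must be positive")
    return committed_hourly / on_demand_hourly


def cheaper_option(
    on_demand_hourly: float,
    committed_hourly: float,
    expected_utilization: float,
) -> str:
    """Return 'commit' or 'on-demand' for an expected utilization in [0, 1]."""
    threshold = break_even_utilization(on_demand_hourly, committed_hourly)
    return "commit" if expected_utilization > threshold else "on-demand"
```

With hypothetical prices of 0.10 per hour on-demand and 0.06 per hour committed, the break-even sits at 60 percent utilization: a baseline workload that runs around the clock clears it easily, while a burst workload that runs a few hours a day does not. Real pricing adds upfront payments, term lengths, and discounts that this sketch ignores.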
Links
- AWS Auto Scaling allocation strategies
- AWS EC2 Capacity Reservations
- AWS insufficient capacity guidance
- AWS Savings Plans
- AWS Service Quotas
- Azure On-demand Capacity Reservations
- Azure quotas overview
- Azure region pairs
- Azure subscription and service limits
- Azure VM allocation failures
- Azure VM Scale Sets orchestration modes (Flexible)
- GCP Compute Engine Reservations
- GCP quota alerts and monitoring
- GCP Regional Managed Instance Groups
- GCP resource availability errors
- Google Cloud quotas overview