Operating Excellently
Summary
Operational excellence goes beyond uptime: it’s about building and operating cloud systems with discipline, automation, and continuous improvement. Carl and Brandon break down what operational excellence really means, drawing a distinction between striving for perfection and building resilient, adaptable systems. They discuss how principles from AWS, Azure, and GCP converge around key practices like repeatable automation, structured change management, and process validation.
The episode dives into real-world strategies for automation, incident readiness, and observability, including where and how to insert gates, use feature flags, and integrate infrastructure as code across cloud platforms. From avoiding certificate-induced outages to catching misconfigurations early, the key theme is consistency at scale. The discussion also emphasizes the cultural side: why shared ownership, retrospectives, and iterative postmortems matter just as much as tooling.
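As one illustration of the certificate-outage theme, here is a minimal sketch of an expiry check that could run as a scheduled job or pipeline gate. It is not taken from the episode: the function names and the 30-day threshold are hypothetical, and it assumes the expiry date arrives in the `notAfter` string format that Python's `ssl` module reports for a peer certificate.

```python
import ssl
from datetime import datetime, timezone

# Hypothetical threshold: flag certificates expiring within 30 days.
WARN_DAYS = 30


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Return whole days between `now` and the certificate's expiry.

    `not_after` uses the format found in SSLSocket.getpeercert()["notAfter"],
    for example "Jan  5 09:34:43 2018 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def needs_renewal(not_after: str, now: datetime, warn_days: int = WARN_DAYS) -> bool:
    """True when the certificate expires within `warn_days` (or already has)."""
    return days_until_expiry(not_after, now) <= warn_days


if __name__ == "__main__":
    # Illustrative values only; a real check would read notAfter from a live endpoint.
    sample = "Jan  5 09:34:43 2018 GMT"
    today = datetime(2017, 12, 26, 9, 34, 43, tzinfo=timezone.utc)
    print(days_until_expiry(sample, today), needs_renewal(sample, today))
```

Keeping the date math in pure functions means the same check can be unit-tested and reused across environments, which fits the consistency-at-scale point above.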
Links
- Ansible: Ansible community documentation
- AWS Docs: Amazon CloudWatch documentation overview
- AWS Docs: Operational Excellence whitepaper
- AWS Docs: Prescriptive Guidance: Operational Excellence
- AWS Docs: Using CloudWatch dashboards and alarms
- AWS Docs: Well‑Architected Framework – Operational Excellence pillar
- AWS: Getting started with Amazon CloudWatch
- Google Cloud: Continuously improve and innovate
- Google Cloud: Manage incidents and problems
- Google Cloud: Operational Excellence pillar overview
- Google Cloud: Operational readiness & performance using CloudOps
- HashiCorp Docs: Terraform configuration language reference
- HashiCorp Docs: Terraform documentation
- Microsoft Docs: Automation of tasks with PowerShell in Power Platform
- Microsoft Learn: Azure Automation documentation
- Microsoft Learn: Azure Monitor documentation
- Microsoft Learn: Operational Excellence maturity model
- Microsoft Learn: Operational Excellence overview & quickstart
- Microsoft Learn: Operational Excellence principles (maturity model, practices)
- Microsoft Learn: PowerShell documentation
- PowerShell Universal Docs: PowerShell Universal platform guide
- Red Hat Docs: Ansible Automation Platform guide