#0024

Operating Excellently

Published on 2025-08-04

Summary

Operational excellence goes beyond uptime: it’s about building and operating cloud systems with discipline, automation, and continuous improvement. Carl and Brandon break down what operational excellence really means, drawing a distinction between striving for perfection and building resilient, adaptable systems. They discuss how principles from AWS, Azure, and GCP converge around key practices like repeatable automation, structured change management, and process validation.

The episode dives into real-world strategies for automation, incident readiness, and observability, including where and how to insert gates, use feature flags, and integrate infrastructure as code across cloud platforms. From avoiding certificate-induced outages to catching misconfigurations early, the key theme is consistency at scale. The discussion also emphasizes the cultural side: why shared ownership, retrospectives, and iterative postmortems matter just as much as tooling.

Links

Ansible: Ansible community documentation
AWS Docs: Amazon CloudWatch documentation overview
AWS Docs: Operational Excellence whitepaper
AWS Docs: Prescriptive Guidance: Operational Excellence
AWS Docs: Using CloudWatch dashboards and alarms
AWS Docs: Well‑Architected Framework – Operational Excellence pillar
AWS: Getting started with Amazon CloudWatch
Google Cloud: Continuously improve and innovate
Google Cloud: Manage incidents and problems
Google Cloud: Operational Excellence pillar overview
Google Cloud: Operational readiness & performance using CloudOps
HashiCorp Docs: Terraform configuration language reference
HashiCorp Docs: Terraform documentation
Microsoft Docs: Automation of tasks with PowerShell in Power Platform
Microsoft Learn: Azure Automation documentation
Microsoft Learn: Azure Monitor documentation
Microsoft Learn: Operational Excellence maturity model
Microsoft Learn: Operational Excellence overview & quickstart
Microsoft Learn: Operational Excellence principles (maturity model, practices)
Microsoft Learn: PowerShell documentation
PowerShell Universal Docs: PowerShell Universal platform guide
Red Hat Docs: Ansible Automation Platform guide

Recent Episodes

What is Cloud Resiliency, Really? (2025-06-02) : Carl and Brandon take a grounded look at what cloud resiliency really means — and how it compares to availability, reliability, and redundancy. They unpack strategies for designing systems that recover gracefully from failure, using real-world examples and architectural patterns that keep your cloud stack steady when it matters most.

The 9 Circles of Dependency Hell 🔥 (2025-05-05) : Carl and Brandon explore the "9 Circles of Dependency Hell," breaking down the most common pitfalls developers face when managing dependencies in cloud environments — and how to escape them. From version conflicts to licensing issues, it’s a survival guide for modern cloud teams.

The 3 M's of Going to the Cloud (2025-04-07) : Gain insights on cloud migration, modernization, and management as Carl and Brandon break down the essentials of planning, evaluating on-prem environments, choosing providers, and preparing for Day 2 Operations, backed by real-world experiences to guide your team's journey.

All Your Data Are Belong to Us (2025-03-03) : Carl and Brandon talk all things data storage in the cloud! With so many options to choose from, how do you know you picked the right one?

We Can Hardly Contain Ourselves! (2025-02-03) : Carl and Brandon deep dive into container technology, covering career paths, best practices, runtime, orchestration, optimization, and security.

The Source is with Us (2025-01-06) : A deep dive into the open-source journey of Brian Munzenmayer, discussing community engagement, project maintenance, and the future of open-source software.

Control All the Things! 🛩️ (2024-12-02) : Carl and Brandon dive into the world of planes… control planes, that is! What is a control plane, why would you want to build one, and what are common examples that you've already used? Learn in this episode of CloudChat!

Dude, Where's My Server? (2024-11-04) : Carl and Brandon talk all things serverless computing and what the Big Three Clouds offer, real-world use cases, and future trends.