#0029

New Year's ☁️ Resolutions

Published on

Summary

“In 2026, your cloud is not allowed to have the same incidents for the same reasons as last year.” Carl and Brandon treat this episode like a retrospective (the kind any good agile team would run), but instead of talking about sprint tickets, they write a New Year’s resolution list on behalf of your cloud team. The format is simple: Stop, Start, Keep. Small, opinionated constraints that change day-to-day habits, not vague wishes about “better reliability, security, and cost.”


The Stop list hits the repeat-incident patterns: single-region “global” apps, treating infrastructure-as-code as optional (and living in the portal), mystery ownership with no clear tags or escalation path, one-off production fix scripts that never get documented, dashboards that are always green while users are hurting, and “temporary” exceptions that turn into permanent risk.

The Start list is the muscle-building: run realistic failover/incident drills, measure change and recovery (DORA-style signals and MTTR, not just uptime), budget reliability and cost together, treat internal platforms like products with golden paths, standardize secrets and identity, and add a regular “delete day” so old environments and artifacts do not drag into the new year.
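To make "measure change and recovery" concrete, here is a minimal sketch of two of those signals: mean time to recovery (MTTR) and change failure rate. The incident timestamps, the deployment outcomes, and the function names are all hypothetical, made up for illustration; in practice this data would come from your own incident tracker and deployment pipeline.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (detected_at, resolved_at).
incidents = [
    (datetime(2026, 1, 5, 9, 0), datetime(2026, 1, 5, 9, 45)),      # 45 min
    (datetime(2026, 1, 12, 14, 0), datetime(2026, 1, 12, 16, 15)),  # 135 min
    (datetime(2026, 1, 20, 2, 30), datetime(2026, 1, 20, 3, 0)),    # 30 min
]

# Hypothetical deployment log: True means the change caused a failure in production.
deployment_failed = [False, False, True, False, False, False, True, False, False, False]


def mean_time_to_recovery(records):
    """Average time between detection and resolution across incidents."""
    if not records:
        return timedelta(0)
    total = sum((resolved - detected for detected, resolved in records), timedelta(0))
    return total / len(records)


def change_failure_rate(outcomes):
    """Fraction of deployments that led to a failure."""
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)


mttr = mean_time_to_recovery(incidents)
cfr = change_failure_rate(deployment_failed)
print(f"MTTR: {mttr}")                    # MTTR: 1:10:00
print(f"Change failure rate: {cfr:.0%}")  # Change failure rate: 20%
```

Tracked month over month, numbers like these say more about how a team handles change than an uptime percentage alone.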

The Keep list is what compounds: automate repetitive toil, invest in observability tied to real user flows, keep blameless postmortems with concrete follow-ups, and keep platform/SRE work visible so it does not get squeezed out by features.

We hope you and your team can embrace some of these resolutions in the coming year, and that listening to more CloudChat is at the top of your list. Happy New Year, everybody!


Recent Episodes

Published on

DNS still runs the internet, but we keep asking it to do things it was never built for. In this episode, we talk about why DNS becomes a single point of failure in modern cloud apps, how real outages play out, and what you can do to actually steer traffic when the edge breaks.

Published on

Capacity and quota aren’t the same. We break down the difference, why it matters when you scale or fail over, and practical ways to plan and mitigate when capacity runs tight across Azure, AWS, and Google Cloud.

Published on

Cloud cost optimization is a continuous process that balances performance, scalability, and financial efficiency. Each provider offers mature tooling for rightsizing, automation, governance, and cultural alignment with FinOps practices, but the real challenge is turning insight into sustained action.