October 27, 2025 - Abhishek Kothari - Cloud & System Solutions Architect

Incident Management: Post-Mortem Culture That Works

Site Reliability Engineering - By Abhishek Kothari - October 27, 2025

The world of complex distributed systems is inherently unpredictable. Despite our best efforts in design, testing, and deployment, incidents are not a question of “if,” but “when.” For Site Reliability Engineers, Software Engineers, and Architects, the true measure of an organization’s maturity isn’t the absence of incidents, but rather its response to them. This response,…

Observability vs Monitoring: Understanding the Difference

Site Reliability Engineering - By Abhishek Kothari - October 27, 2025

In the rapidly evolving landscape of distributed systems, microservices, and cloud-native architectures, the terms “observability” and “monitoring” are often used interchangeably, leading to confusion and, more critically, to systems that are difficult to understand and troubleshoot. For Site Reliability Engineers, Software Engineers, and Architects, understanding the nuanced yet fundamental differences between these concepts is not…

Chaos Engineering in Production: A Practical Guide

Site Reliability Engineering - By Abhishek Kothari - October 27, 2025

We’ve all been there. It’s 3 AM, and the pagers are screaming. A critical service is down, customers are impacted, and the on-call team is scrambling through logs and dashboards, trying to piece together a puzzle in the dark. The postmortem later reveals the cause: a rare, cascading failure triggered by a minor network blip—a…