I built an autonomous SRE that lets an LLM diagnose incidents — but never touch a shell unsupervised

I built an autonomous SRE system where a local LLM diagnoses production incidents and proposes a fix, and a deterministic engine decides whether that fix is ever allowed to run. The whole thing runs inside your own network, with zero data egress.
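To make that split concrete, here is a minimal sketch of the idea in Python. It is not Sentinel's actual code: the allowlist contents, the `ProposedFix` type, and the function names are all hypothetical, chosen only to illustrate an LLM that proposes and a deterministic check that decides.

```python
from dataclasses import dataclass

# Hypothetical allowlist (an assumption for illustration, not from the repo):
# maps an action name to the set of targets it is permitted to touch.
ALLOWED_ACTIONS = {
    "restart_service": {"checkout", "cart"},
    "scale_up": {"checkout"},
}


@dataclass(frozen=True)
class ProposedFix:
    """What the LLM hands over: structured data, never a raw shell string."""
    action: str
    target: str


def is_allowed(fix: ProposedFix) -> bool:
    # Pure lookup with no model call and no randomness:
    # the same proposal always gets the same answer.
    return fix.target in ALLOWED_ACTIONS.get(fix.action, set())


def handle(fix: ProposedFix) -> str:
    # The proposal is treated as untrusted input. Anything not explicitly
    # allowlisted is rejected by default.
    if is_allowed(fix):
        return f"APPROVED: {fix.action} on {fix.target}"
    return f"REJECTED: {fix.action} on {fix.target}"
```

The point of the sketch is the default: a proposal runs only if a rule explicitly permits it, so a hallucinated or malicious action falls through to rejection without the gate needing to understand it.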

It's called Sentinel. It's a prototype and a learning project: I started from scratch on a Mac and used it to go deep on distributed systems, gRPC, local LLM inference, and safe automation. This post walks through why it's built the way it is — the design decisions are the interesting part.

Repo: https://github.com/Blazi2002/sentinel

The problem

Modern observability is passive. Prometheus and Grafana tell you that something is wrong, but a human still has to diagnose the cause and type the fix — often at 3 a.m.
