Open-source SRE methodology skills an AI agent can load. Apache-2.0, runnable offline against fixtures, no credentials.

TL;DR: sre-skills is an open-source (Apache-2.0) library of SRE methodology skills an AI agent can load: the decision procedure for working an incident, not just the commands. It runs offline against fixtures checked into the repo, with no credentials, so you can read how an agent reasons before trusting it on prod. One of the five skills has shipped so far.

An AI agent in the incident channel can already do the mechanical parts of a page. It can grep the logs, run kubectl get pods, and pull up the Grafana panel. What it can't do is the part an on-call engineer learns the hard way, paged at 3am: which signal to read first, whether the deploy from twenty minutes ago is the cause or a coincidence, and when to stop digging and wake a human.

That judgment is the actual job, and it doesn't fit in a system prompt.

So we started writing it down as skills an agent can load. sre-skills is an open library of methodology-shaped SRE skills, Apache-2.0 and vendor-neutral. There are five: investigate a live incident, analyze change impact before a risky apply, hand over on-call, write a postmortem, and audit production readiness. None of them need a vendor account, credentials, or even our product. Each one runs end to end against fixtures checked into the repo.
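To make "a decision procedure that runs against fixtures" concrete, here is a minimal sketch of two of the judgment calls named above: is the recent deploy a suspect or a coincidence, and is it time to wake a human. Everything in it is hypothetical and for illustration only: the Deploy and Incident fixture shapes, the 30-minute suspect window, the 15-minute escalation budget, and the function names. It is not the sre-skills file format or API, and a real skill would weigh far more evidence than timing and service overlap.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical fixture shapes, invented for this sketch (not the sre-skills format).
@dataclass
class Deploy:
    service: str
    at: datetime

@dataclass
class Incident:
    service: str
    started_at: datetime
    upstream_services: tuple[str, ...] = ()

# Assumed thresholds, chosen for illustration only.
SUSPECT_WINDOW = timedelta(minutes=30)
ESCALATE_AFTER = timedelta(minutes=15)

def classify_deploy(deploy: Deploy, incident: Incident) -> str:
    """Crude heuristic: label a recent deploy 'suspect' or 'likely coincidence'."""
    lead = incident.started_at - deploy.at
    if lead < timedelta(0):
        # The deploy landed after symptoms began, so it cannot have triggered them.
        return "likely coincidence"
    # Only deploys to the failing service or something it depends on count.
    in_blast_radius = (
        deploy.service == incident.service
        or deploy.service in incident.upstream_services
    )
    if lead <= SUSPECT_WINDOW and in_blast_radius:
        return "suspect"
    return "likely coincidence"

def should_page_human(elapsed: timedelta, has_hypothesis: bool) -> bool:
    """Stop digging once the time budget is spent without a working hypothesis."""
    return elapsed >= ESCALATE_AFTER and not has_hypothesis

# Offline run against an inline fixture: no cluster, no credentials.
incident = Incident("checkout", datetime(2024, 1, 1, 3, 0), upstream_services=("payments",))
deploy = Deploy("payments", datetime(2024, 1, 1, 2, 40))
print(classify_deploy(deploy, incident))  # suspect
print(should_page_human(timedelta(minutes=20), has_hypothesis=False))  # True
```

Because the inputs are plain data, the same fixture can be replayed on every change to the procedure, which is the point of checking fixtures into the repo: you can read and rerun the reasoning without touching prod.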

Skills lie. So we run them.