Back to Articles
Key findings: ITBench-AA SRE overview: Highlights ITBench-AA is built in partnership with @IBMResearch based on their ITBench benchmark. Artificial Analysis and IBM Research are launching ITBench-AA, the first in a new series of benchmarks evaluating models on agentic enterprise IT tasks, starting with Site Reliability Engineering tasks where frontier models score below 50%
ITBench-AA’s SRE tasks benchmark model performance on Kubernetes incident response, where models and agents must diagnose live systems by reading logs, tracing dependencies, and identifying root-cause entities across complex infrastructure. The underlying ITBench dataset has been developed by IBM Research, leveraging IBM’s deep expertise in enterprise IT operations.
Artificial Analysis has worked closely with IBM over the last 6 months to develop an implementation of the dataset for frontier AI evaluation, beginning with Site Reliability Engineering (SRE) and expanding to Financial Operations (FinOps) and Chief Information Security Officer (CISO) tasks over time.
Key findings:










