The Argument
The AI safety conversation is dominated by two camps: the alignment researchers thinking about existential risk, and the product engineers shipping features. Neither group talks enough about the middle layer — the actual enforcement mechanism that determines whether an agent behaves as intended in production.
Here's my thesis: evaluation infrastructure is alignment enforcement. Not alignment research. Not safety theater. The actual enforcement layer that determines whether your agent does what you intended, at runtime, under adversarial conditions.
If your agent can be jailbroken, produce harmful outputs, or violate its constraints — and you only find out from user reports — your evals aren't a testing tool. They're a missing safety system.
Safety Isn't a Property, It's a Runtime Guarantee







