oh-my-agent: skills now measure and optimize their own utility

Most skill libraries grow by accretion. You add a SKILL.md, it sounds useful, and it lives forever because nobody can prove it helps or hurts. This week oh-my-agent closed that gap: oma skills eval measures whether loading a skill actually improves held-out task outcomes, and oma skills opt rewrites the skill to push that number up. 194 commits landed, CLI is at 8.41.0, but the eval-to-opt loop is the part worth your attention.

What's new

oma skills eval: measures utilityLift on held-out tasks by comparing a treatment arm (skill loaded) with a baseline arm (skill not loaded). --mock replays recorded rollouts deterministically, --live spawns two read-only agentic arms per task, and --record captures the rollouts. The default checker is judge (an LLM grades output against a rubric); assert and regex are opt-in deterministic checks.
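To make the treatment-vs-baseline comparison concrete, here is a minimal Python sketch. It assumes utilityLift is the difference in pass rate between the two arms; the post does not spell out the exact formula, and TaskResult and utility_lift are hypothetical names, not part of the oma CLI.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class TaskResult:
    """Outcome of one held-out task (hypothetical record shape)."""
    task_id: str
    baseline_pass: bool   # arm run without the skill loaded
    treatment_pass: bool  # arm run with the skill loaded


def utility_lift(results: list[TaskResult]) -> float:
    """Treatment pass rate minus baseline pass rate.

    ASSUMPTION: lift is a difference of pass rates. The real metric
    may be defined differently (for example, on graded judge scores).
    """
    if not results:
        raise ValueError("need at least one held-out task")
    baseline = mean(1.0 if r.baseline_pass else 0.0 for r in results)
    treatment = mean(1.0 if r.treatment_pass else 0.0 for r in results)
    return treatment - baseline
```

A positive value means the arm with the skill loaded passed more held-out tasks than the arm without it; zero or negative means the skill did not help on this task set.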

oma skills opt: an optimizer LLM proposes bounded add/delete/replace edits to a SKILL.md, re-scores each candidate through eval, and accepts only when held-out validation lift strictly improves with no negative-transfer regression (SkillOpt, arXiv:2605.23904). --dry-run is the default; --apply writes through atomic temp+rename with a .bak backup.
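The acceptance rule and the --apply write path can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's actual code: should_accept and apply_atomically are hypothetical names, and treating "no negative-transfer regression" as a non-negative lift delta on the sampled tasks is a guess at how that condition is encoded.

```python
import os
import shutil
import tempfile
from pathlib import Path


def should_accept(current_lift: float, candidate_lift: float,
                  neg_transfer_delta: float) -> bool:
    """Accept a candidate edit only on strict validation improvement.

    ASSUMPTION: neg_transfer_delta is the change in lift on sampled
    unrelated tasks; any value below zero counts as a regression.
    """
    return candidate_lift > current_lift and neg_transfer_delta >= 0.0


def apply_atomically(path: Path, new_text: str) -> Path:
    """Back up `path` to `<name>.bak`, then swap in new_text via temp+rename."""
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # keep the previous version recoverable

    # The temp file lives in the same directory so the rename stays on
    # one filesystem, which is what makes os.replace atomic.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent),
                                    prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(new_text)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the swap
        os.replace(tmp_name, path)
    except BaseException:
        # Leave no stray temp file behind if the write or rename fails.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return backup
```

Writing to a temp file and renaming means a reader never sees a half-written SKILL.md: the file holds either the old contents or the new ones, and the .bak copy gives a one-step rollback.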

Negative-transfer sampling: --neg-transfer checks whether loading one skill regresses unrelated same-domain tasks from other skills' eval sets.
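One way to picture the negative-transfer check is sketched below in Python. Everything here is an assumption for illustration: EvalTask, sample_neg_transfer_tasks, and regressed_tasks are hypothetical names, and the seeded random sample and the "passed before, fails now" definition of a regression are guesses at what --neg-transfer does internally.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalTask:
    """One eval task, tagged with its owning skill and domain (hypothetical)."""
    task_id: str
    owner_skill: str
    domain: str


def sample_neg_transfer_tasks(pool: list[EvalTask], skill: str, domain: str,
                              k: int, seed: int = 0) -> list[EvalTask]:
    """Pick up to k same-domain tasks that belong to other skills."""
    candidates = sorted(
        (t for t in pool if t.domain == domain and t.owner_skill != skill),
        key=lambda t: t.task_id,  # stable order so a seed reproduces the sample
    )
    rng = random.Random(seed)
    return rng.sample(candidates, min(k, len(candidates)))


def regressed_tasks(baseline_pass: dict[str, bool],
                    with_skill_pass: dict[str, bool]) -> list[str]:
    """Task IDs that passed without the skill but fail with it loaded.

    A task missing from with_skill_pass is treated as a failure.
    """
    return sorted(
        task_id
        for task_id, passed in baseline_pass.items()
        if passed and not with_skill_pass.get(task_id, False)
    )
```

An empty list from regressed_tasks would mean that loading the skill did not break any of the sampled unrelated tasks.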
