What failing at building an AI agent taught me about building AI agents.

I scored 3/50 on a take-home benchmark for a job application. And I still got the job.

At the time, I hadn't built a fully agentic system before. I had worked with LLM pipelines and small AI tools, but an entirely autonomous architecture was completely new to me. The experience taught me a few important lessons, not just about AI agents but about how to approach unfamiliar engineering problems.

After I showed my results in the job interview, my (now) CTO mentioned that he had noticed I was using a cheap mini LLM (endless testing had racked up quite a bill!) and that he had tried out my agent with the frontier Opus model. Funnily enough, the agent actually performed worse, scoring only 2/50!

The problem was not the model. I had architected a clean system with plugin-based tooling, consistent interfaces, and a semi-autonomous pipeline that enforced structure around the agent. My plan was to start constrained: give the agent specific tools for answering a subset of the questions, then slowly expand it with new tools built around my well-architected abstractions until it could solve everything.
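To make that description concrete, here is a minimal sketch of what plugin-based tooling behind a consistent interface can look like. It is illustrative only and is not the code from the take-home: `Tool`, `ToolRegistry`, `run_step`, and the `choose_tool` callback are all hypothetical names, and the model call is stubbed out as a plain function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Tool:
    """One plugin: a name, a description for the model, and a callable."""
    name: str
    description: str
    run: Callable[[str], str]


@dataclass
class ToolRegistry:
    """Holds the only tools the agent is allowed to call."""
    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        if tool.name in self.tools:
            raise ValueError(f"duplicate tool: {tool.name}")
        self.tools[tool.name] = tool

    def call(self, name: str, argument: str) -> Optional[str]:
        # Returns None when the requested tool is not registered.
        tool = self.tools.get(name)
        return tool.run(argument) if tool else None


# Hypothetical stand-in for the LLM: given a question and the available
# tool names, it returns (tool_name, argument).
ChooseTool = Callable[[str, List[str]], Tuple[str, str]]


def run_step(registry: ToolRegistry, choose_tool: ChooseTool, question: str) -> str:
    """One constrained step: the model picks a tool, the pipeline runs it."""
    name, argument = choose_tool(question, sorted(registry.tools))
    result = registry.call(name, argument)
    if result is None:
        # The pipeline, not the model, decides what happens off the happy path.
        return f"unsupported: no tool named {name!r}"
    return result
```

In a design like this, any question that falls outside the registered tools simply comes back as unsupported, which is the constraint the "start small and expand" plan relies on.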

On paper, the code looked solid. In practice, it couldn't solve the problems.

