Six months ago I started building an AI agent to automate job searching. Not a CV optimiser, not a job board aggregator — an agent that opens a browser, searches multiple job boards, reads each listing, scores it against a candidate's CV, and manages the whole pipeline without human input. This is the story of what broke, what surprised me, and what the architecture looks like now.

If you're working on AI job search automation or agentic systems in general, some of this will be familiar. Some of it will save you a week.

Why the First Version Was Useless

The initial prototype was embarrassingly naive. I fed a job description and a CV into a prompt and asked GPT for a match score. It confidently returned 87% for a senior staff engineer role when the candidate had two years of experience. The problem was obvious in retrospect: asked for a single holistic score, LLMs are optimistic. Without hard constraints, they find reasons to match rather than reasons to reject.

Version two introduced structured scoring. I broke the evaluation into five weighted dimensions: skills overlap, seniority fit, location, salary band, and role type. Each dimension was scored independently before an aggregate was calculated. This alone dropped false positives by around 60%.
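
To make the idea concrete, here is a minimal sketch of that kind of weighted aggregation. It is not the code from my agent: the weight values, the 0-to-1 score range, and the names WEIGHTS and aggregate_score are assumptions chosen for illustration. Only the five dimension names come from the design described above.

```python
# Hypothetical weights: the five dimensions are from the post,
# but these specific values are illustrative assumptions.
WEIGHTS = {
    "skills_overlap": 0.35,
    "seniority_fit": 0.25,
    "location": 0.15,
    "salary_band": 0.15,
    "role_type": 0.10,
}


def aggregate_score(dimension_scores: dict[str, float]) -> float:
    """Combine independently produced per-dimension scores (each in [0, 1])
    into one weighted aggregate in [0, 1]."""
    missing = WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        # Refuse to guess: a dimension that was never scored should not
        # silently count as a match or a mismatch.
        raise ValueError(f"missing dimension scores: {sorted(missing)}")

    for name in WEIGHTS:
        score = dimension_scores[name]
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {score}")

    weighted_sum = sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)
    # Normalise so the result stays in [0, 1] even if the weights are retuned
    # and no longer sum to exactly 1.
    return weighted_sum / sum(WEIGHTS.values())


# Strong skills match, but a two-years-experience candidate
# against a senior staff role scores low on seniority.
example = {
    "skills_overlap": 0.8,
    "seniority_fit": 0.1,
    "location": 1.0,
    "salary_band": 0.5,
    "role_type": 1.0,
}
score = aggregate_score(example)
```

The point of the structure is that a weak seniority score now has to show up as its own number instead of being argued away inside one holistic judgement. How each dimension gets its score (a separate prompt, a rule, or a lookup) is left open here.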