I do a fair amount of competitor backlink research, and the workflow always annoyed me: open a dashboard, run a query, export a CSV, eyeball it, copy domains into a doc, switch to email. Lots of tab-hopping for what is fundamentally a data-filtering problem an agent should handle.
So I wrapped the backlink API I'd been using into an MCP server. Now I stay in Claude Code (or Cursor, Cline, Zed, Windsurf) and just describe the goal. This is the build: the architecture, the four tools, and the one design decision I'm still not sure about.
The data source
The server runs on the Common Crawl hyperlink webgraph — about 4.4 billion edges across 120 million domains, published quarterly as Parquet. That matters for an MCP tool specifically: the data is open, so there's no scraped-proprietary-index liability in handing it to an agent, and the same query is reproducible by anyone.
The HTTP API in front of it (CrawlGraph) does the heavy DuckDB work; the MCP server is a thin TypeScript stdio client over it. Keeping the server thin was deliberate — all the query cost, caching, and quota logic lives server-side, so the MCP package stays a ~300-line wrapper that's easy to audit before you hand it your API key.






