I built a small web scraping framework in Rust, mostly with an AI doing the typing. It's called ferrous — a Colly-style collector: register CSS selector callbacks, queue URLs, write JSONL. About 700 lines. The pitch I kept hearing, and half-believed, was that Rust and LLMs are a good match now: the borrow checker is a correctness oracle the model can lean on, so the class of bugs that plagues AI-written Python just won't compile.
That's true. It's also where the story gets uncomfortable, because the build was green and the code was still wrong.
How I worked
I'm not a Rust native. ferrous was partly an excuse to get fluent (build something real instead of reading about lifetimes) and partly a test of how far an LLM could carry the typing while I drove. The loop was plain: describe the next change in English, let the model write the Rust, read what came back, run cargo, move on. The model also kept observations.md as a running design journal, one entry per change, each with a short rationale for the decision it had made.
That setup has a soft spot, and it's the whole point of this post. When you drive a language you don't fully know, the only reviewer you've got with real authority is the compiler. Everything past that — is this idiomatic, is it the right abstraction, does it actually do what the journal claims — depends on already knowing what correct looks like. Which is exactly the knowledge a learner doesn't have yet. Keep that in mind through the next part, because it's the difference between the one bug I could have caught and the six I couldn't.
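To make "green build, wrong code" concrete, here is a minimal sketch of the kind of thing the compiler has no opinion about. It is a hypothetical illustration, not code from ferrous: the Frontier type and both method names are invented for this example, and the fragment-stripping rule is an assumed requirement. Both methods type-check and borrow-check cleanly; only one of them treats a URL and the same URL with a #fragment as the same page.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical URL frontier for illustration only; not ferrous's real type.
struct Frontier {
    queue: VecDeque<String>,
    seen: HashSet<String>,
}

impl Frontier {
    fn new() -> Self {
        Frontier {
            queue: VecDeque::new(),
            seen: HashSet::new(),
        }
    }

    /// Compiles without warnings, but dedups on the raw string,
    /// so "page" and "page#top" are queued as two different pages.
    fn push_naive(&mut self, url: &str) -> bool {
        if self.seen.insert(url.to_string()) {
            self.queue.push_back(url.to_string());
            true
        } else {
            false
        }
    }

    /// Same shape, but drops the fragment before the dedup check.
    /// (Assumed requirement: fragments never identify a distinct page.)
    fn push_normalized(&mut self, url: &str) -> bool {
        // `split` always yields at least one item, so the fallback is never hit.
        let key = url.split('#').next().unwrap_or(url);
        if self.seen.insert(key.to_string()) {
            self.queue.push_back(key.to_string());
            true
        } else {
            false
        }
    }
}

fn main() {
    let urls = ["https://example.com/a", "https://example.com/a#top"];

    let mut naive = Frontier::new();
    for u in urls {
        naive.push_naive(u);
    }

    let mut fixed = Frontier::new();
    for u in urls {
        fixed.push_normalized(u);
    }

    println!(
        "naive queued {}, normalized queued {}",
        naive.queue.len(),
        fixed.queue.len()
    );
}
```

Nothing in the type system separates the two methods. Telling them apart takes a test, or a reviewer who already knows what the crawler is supposed to do.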







