In a single session, I built a pipeline that consolidates the 58 tech-blog articles of my service Kotonia (ja/en/zh) into a semantic index, then uses that index to detect duplicates when mining new articles. The full path (raw articles → semantic index → TF-IDF dedup → chunked draft generation) runs on a local Gemma 4 26B driven by Codex CLI. Design and implementation notes follow.

The motivation and the "how a solo developer's accumulated assets compound" framing are in the companion piece: The Day a Solo Developer's Accumulated Assets Finally Started to Compound

This piece covers the technical notes.

1. The Problem — When Title-Only Dedup Broke

Mining v1 produced a draft, and I (the user) noticed: "this overlaps with an existing article." The overlap target was voice-first-local-llm (importance=9, a flagship article).
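To make the failure mode concrete, here is a minimal sketch of how a title-only comparison can score zero while a comparison over summaries flags the overlap. Everything in it is an assumption for illustration: the sample titles and summaries are invented (they are not the real Kotonia articles), and the helpers `tokenize`, `tfidf_vectors`, and `cosine`, as well as the smoothed IDF formula, are hypothetical stand-ins rather than the pipeline's actual code.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Naive ASCII tokenizer (assumption); ja/zh text would need a real segmenter.
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (dict of term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: count each term once per doc
    # Smoothed IDF (assumed formula): ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: f * idf[t] for t, f in Counter(toks).items()} for toks in tokenized]

def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented sample data: an existing article, a new draft, an unrelated article.
articles = [
    {"title": "Voice-First Local LLM",
     "summary": "Running speech recognition and a local LLM on device "
                "for a voice-first assistant with low latency"},
    {"title": "Talking to Your Laptop Offline",
     "summary": "A voice-first assistant that runs speech recognition "
                "and a local LLM on device with low latency"},
    {"title": "Billing Webhooks",
     "summary": "Handling payment webhooks retries and idempotency keys "
                "in the billing service"},
]

title_vecs = tfidf_vectors([a["title"] for a in articles])
summary_vecs = tfidf_vectors([a["summary"] for a in articles])

# Compare the draft (index 1) against the existing article (index 0).
title_sim = cosine(title_vecs[0], title_vecs[1])
summary_sim = cosine(summary_vecs[0], summary_vecs[1])
unrelated_sim = cosine(summary_vecs[1], summary_vecs[2])

print(f"title similarity:     {title_sim:.3f}")
print(f"summary similarity:   {summary_sim:.3f}")
print(f"unrelated similarity: {unrelated_sim:.3f}")
```

In this toy setup the two titles share no tokens, so a title-only check sees nothing, while the summaries share most of their vocabulary. Any threshold for "duplicate" would have to be tuned on real data.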