How we handle complex EPUB structures for AI translation without breaking navigation and metadata

At LectuLibre, we built an AI‑powered book translation service. Users upload an EPUB, and our pipeline translates the text using LLMs like Claude and DeepSeek. That sounds straightforward until you have to parse and rebuild a valid EPUB without mangling the table of contents, internal links, or styles.

I’m sharing the challenge we faced, how we chose our tooling, and the ugly corners we discovered in real‑world EPUB files.

The Problem: EPUB is a Messy ZIP File

An EPUB is essentially a ZIP archive containing XHTML, CSS, images, and an OPF manifest. The format is well specified (EPUB 3.2), but in practice publishers produce files that bend the rules: a missing META-INF/container.xml, inline styles that break after translation, and structural quirks that make parsing fragile.
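
To make the container.xml problem concrete, here is a minimal sketch of a tolerant lookup for the OPF file, using only Python's standard library. The function name find_opf_path and the fallback strategy (scan the archive for any .opf entry) are assumptions for illustration, not a description of the actual LectuLibre pipeline.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_PATH = "META-INF/container.xml"
# Namespace used by the OCF container file.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_opf_path(epub: zipfile.ZipFile) -> str:
    """Return the archive path of the OPF file (hypothetical helper).

    Prefers the path declared in META-INF/container.xml; if that file is
    missing, malformed, or incomplete, falls back to the first *.opf entry.
    """
    names = epub.namelist()

    if CONTAINER_PATH in names:
        try:
            root = ET.fromstring(epub.read(CONTAINER_PATH))
        except ET.ParseError:
            root = None  # malformed container.xml: use the fallback below
        if root is not None:
            rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
            if rootfile is not None and rootfile.get("full-path"):
                return rootfile.get("full-path")

    # Assumed fallback: pick the first entry that looks like an OPF file.
    candidates = [n for n in names if n.lower().endswith(".opf")]
    if not candidates:
        raise ValueError("no OPF package document found in archive")
    return candidates[0]
```

Calling find_opf_path(zipfile.ZipFile("book.epub")) on a well-formed file returns whatever path container.xml declares. The fallback is a guess, so a real pipeline would probably want to log when it is used.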