Parsing and Rebuilding EPUB Files in Python: Lessons Learned

How we handle complex EPUB structures for AI translation without breaking navigation and metadata

At LectuLibre, we built an AI‑powered book translation service. Users upload an EPUB, and our pipeline translates the text using LLMs like Claude and DeepSeek. That sounds straightforward until you have to parse and rebuild a valid EPUB without mangling the table of contents, internal links, or styles.

Below I walk through the challenges we faced, how we chose our tooling, and the ugly corners we discovered in real‑world EPUB files.

The Problem: EPUB Is a Messy ZIP File

An EPUB is essentially a ZIP archive containing XHTML, CSS, images, and an OPF manifest. It’s a well‑defined standard (EPUB 3.2), but in practice publishers produce files that bend the rules: missing container.xml, inline styles that break after translation, and structural quirks that make parsing fragile.
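The sketch below illustrates that structure. It is not LectuLibre's actual pipeline code and uses only the standard library. It does three things:

- Locates the OPF package document through META-INF/container.xml. When that file is missing, it falls back to scanning for any .opf entry.
- Resolves manifest hrefs relative to the OPF's directory.
- Rebuilds an archive with the mimetype entry first and stored uncompressed, as the EPUB Open Container Format requires.

The function names find_opf_path, manifest_items, and rebuild_epub are hypothetical. The fallback heuristic (take the first .opf entry in sorted order) is an assumption, not a rule from the spec.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# XML namespaces defined by the OCF container and OPF package specs.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def find_opf_path(epub: zipfile.ZipFile) -> str:
    """Return the archive path of the OPF package document.

    Reads META-INF/container.xml when it exists; otherwise (hypothetical
    fallback for broken files) uses the first *.opf entry in sorted order.
    """
    names = epub.namelist()
    if "META-INF/container.xml" in names:
        root = ET.fromstring(epub.read("META-INF/container.xml"))
        rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        if rootfile is not None and rootfile.get("full-path"):
            return rootfile.get("full-path")
    candidates = sorted(n for n in names if n.lower().endswith(".opf"))
    if not candidates:
        raise ValueError("no OPF package document found")
    return candidates[0]


def manifest_items(epub: zipfile.ZipFile, opf_path: str) -> dict[str, str]:
    """Map manifest item ids to archive paths, resolved relative to the OPF."""
    root = ET.fromstring(epub.read(opf_path))
    base = posixpath.dirname(opf_path)
    items = {}
    for item in root.findall("opf:manifest/opf:item", OPF_NS):
        # hrefs are URLs, so percent-escapes like %20 must be decoded.
        href = unquote(item.get("href", ""))
        items[item.get("id")] = posixpath.normpath(posixpath.join(base, href))
    return items


def rebuild_epub(entries: dict[str, bytes]) -> bytes:
    """Write entries into a new EPUB archive and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as out:
        # OCF requires 'mimetype' to be the first entry, stored uncompressed.
        out.writestr("mimetype", b"application/epub+zip",
                     compress_type=zipfile.ZIP_STORED)
        for name, data in entries.items():
            if name == "mimetype":
                continue  # already written above
            out.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

Keeping the rebuild step separate is one way to protect the parts the translation should not touch. Translated XHTML can be swapped into the entries dict while every other file is copied byte for byte, so the navigation document, OPF metadata, and CSS come through unchanged.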
