Chunk clean article content for embeddings, summarization, and full-text search, skipping nav, clap bars, and scripts.

Extract Plain Text from Medium Posts for RAG and Search Indexes

HTML embeds are for humans; plain text is for chunking, embeddings, and summarization. One call should return body text without nav, clap bars, or script tags.

Tool outcome: ingest-medium-article.ts → chunked documents in your vector DB.
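
To make "chunked documents" concrete, here is a minimal sketch of the chunking step, assuming the article body has already been extracted as plain text with paragraphs separated by blank lines. The names `TextChunk` and `chunkPlainText` and the 1000-character default are illustrative assumptions, not part of any particular API or of ingest-medium-article.ts itself.

```typescript
// Hypothetical chunk shape; field names are illustrative only.
interface TextChunk {
  index: number;
  text: string;
}

// Split plain text into chunks of at most `maxChars` characters,
// preferring paragraph boundaries (blank lines) as split points.
function chunkPlainText(text: string, maxChars = 1000): TextChunk[] {
  if (maxChars <= 0) throw new RangeError("maxChars must be positive");

  // Normalize each paragraph's internal whitespace and drop empty ones.
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.replace(/\s+/g, " ").trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";

  for (const para of paragraphs) {
    // Hard-split any paragraph that is longer than the limit on its own.
    const pieces: string[] = [];
    for (let i = 0; i < para.length; i += maxChars) {
      pieces.push(para.slice(i, i + maxChars));
    }

    for (const piece of pieces) {
      const candidate = current ? current + "\n\n" + piece : piece;
      if (candidate.length <= maxChars) {
        // Still fits: keep packing paragraphs into the current chunk.
        current = candidate;
      } else {
        // Does not fit: flush the current chunk and start a new one.
        if (current) chunks.push(current);
        current = piece;
      }
    }
  }
  if (current) chunks.push(current);

  return chunks.map((chunkText, index) => ({ index, text: chunkText }));
}
```

Character counts are only a rough proxy for tokens, so in practice `maxChars` would be tuned to the limit of whichever embedding model you use. Each returned chunk can then be embedded and written to the vector DB alongside the article URL.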

Pipeline