Step 1 in depth: pdfjs-dist

pdfjs-dist is the npm distribution of PDF.js, Mozilla's PDF rendering library and the same engine that powers Firefox's built-in PDF viewer. In jaklens.ai, it runs in Electron's main process (a Node.js process) to extract the text content of each page of the invoice.
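As a rough illustration of per-page extraction, here is a minimal sketch, not the jaklens.ai implementation. It assumes pdfjs-dist v4 or later, where the Node-friendly build is published at pdfjs-dist/legacy/build/pdf.mjs (older releases use a different path). The helper names itemsToText and extractPageTexts are hypothetical. The library calls getDocument, getPage, and getTextContent are part of the PDF.js API.

```typescript
import { readFile } from "node:fs/promises";

// Minimal shape of a PDF.js text item; only the fields this sketch reads.
interface TextItemLike {
  str?: string;     // the text run (absent on marked-content items)
  hasEOL?: boolean; // true when PDF.js considers the run to end a line
}

// Hypothetical helper: join text runs into one string per page,
// inserting a newline wherever PDF.js flags an end of line.
function itemsToText(items: TextItemLike[]): string {
  let text = "";
  for (const item of items) {
    if (typeof item.str !== "string") continue; // skip non-text items
    text += item.str;
    if (item.hasEOL) text += "\n";
  }
  return text.trim();
}

// Hypothetical helper: return the extracted text of every page in a PDF file.
async function extractPageTexts(pdfPath: string): Promise<string[]> {
  // Assumption: pdfjs-dist v4+ module path. Loaded lazily so the library
  // is only required when a PDF is actually parsed.
  const specifier = "pdfjs-dist/legacy/build/pdf.mjs";
  const pdfjs = await import(specifier);

  // Recent PDF.js versions expect a plain Uint8Array rather than a Node Buffer.
  const data = new Uint8Array(await readFile(pdfPath));
  const pdf = await pdfjs.getDocument({ data }).promise;

  const pages: string[] = [];
  for (let n = 1; n <= pdf.numPages; n++) { // page numbers are 1-based
    const page = await pdf.getPage(n);
    const content = await page.getTextContent();
    pages.push(itemsToText(content.items));
  }
  return pages;
}
```

Calling extractPageTexts("invoice.pdf") (an example path) would resolve to an array with one string per page. Keeping the line-joining logic in a separate pure function makes it easy to test without loading a real PDF.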

For a typical digitally generated invoice PDF (produced by Stripe, PayPal, a CRM, or invoicing software), pdfjs-dist produces clean Unicode text that preserves line structure. The output looks something like:

Invoice #: INV-2024-0891

Date: 15 March 2025