Originally published at htpbe.tech. The version on htpbe.tech stays in sync with the latest detection algorithm — refer to it for the canonical text.

Every PDF file carries two layers of information. The first is the visible content: the text, images, and layout a reader sees. The second is metadata: structured data describing the document itself. This second layer records when the document was created, which application produced it, and whether it has been modified and, if so, with which tools.
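To make the second layer concrete, here is a minimal Python sketch that pulls a few fields out of a hand-written fragment shaped like a PDF document information dictionary. The sample bytes, the field values, and the helper name `read_info_fields` are assumptions made for illustration. Real files often store these objects compressed, encrypted, or with escaped strings, none of which this sketch handles.

```python
import re

# Hand-written fragment shaped like a PDF document information dictionary.
# Every value here is invented for illustration only.
SAMPLE_INFO = (
    b"<< /Title (Invoice 1042) /Creator (Example Word Processor) "
    b"/Producer (Example PDF Library 1.0) "
    b"/CreationDate (D:20240115093000Z) /ModDate (D:20240220171500Z) >>"
)

# Keys that describe origin and modification history.
FIELDS = ("Creator", "Producer", "CreationDate", "ModDate")


def read_info_fields(raw: bytes) -> dict:
    """Return literal-string values for a few Info dictionary keys.

    Assumption: values are plain literal strings with no nested or
    escaped parentheses, and the dictionary is stored uncompressed.
    """
    found = {}
    for name in FIELDS:
        # Match "/Key (value)" where the value has no parentheses or backslashes.
        pattern = rb"/" + name.encode("ascii") + rb"\s*\(([^()\\]*)\)"
        match = re.search(pattern, raw)
        if match:
            found[name] = match.group(1).decode("latin-1")
    return found


info = read_info_fields(SAMPLE_INFO)
for key, value in info.items():
    print(f"{key}: {value}")
```

In this invented sample the modification date differs from the creation date, and the creating and producing applications are different names. Those are the kinds of observations the fields make possible; on their own they indicate nothing about whether a document is authentic.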

Forensic analysis of these fields can reconstruct a document’s history without examining its visible content at all. For document fraud detection professionals, understanding each field — what it stores, what it reveals, and what makes a value suspicious — is the foundation of PDF authenticity assessment.

This reference covers every major metadata field used in PDF forensics, the structural element that cannot be cleared (the cross-reference table), and how these signals combine into an overall authenticity verdict.

Two Metadata Systems in One File