Originally published at htpbe.tech. The version on htpbe.tech stays in sync with the latest detection algorithm — refer to it for the canonical text.
Most PDF fraud detection focuses on what you can read: timestamps, producer strings, creator fields. That metadata is useful, but it is also the easiest thing to forge. A fraudster can overwrite every metadata field in under a minute.
The xref table is different. It is not a metadata field you can find and overwrite with a text editor. It is the PDF’s internal index: the structure a reader uses to locate every object in the file. Because an incremental save, as defined by the PDF specification, appends a new xref section rather than rewriting the existing one, each such save leaves a structural mark that cannot be cleanly removed without rebuilding the file from scratch.
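To make the idea of a structural mark concrete, here is a minimal Python sketch that lists the `startxref` markers in a file's raw bytes. Each appended xref section normally ends with its own `startxref` offset and `%%EOF` line, so more than one marker hints at more than one save. The function names `xref_offsets` and `count_revisions` and the sample bytes are illustrative assumptions, not part of HTPBE, and the count is only a hint: linearized files, for example, can carry more than one marker from their first save.

```python
import re

# Assumption: each revision ends with "startxref", a byte offset, then "%%EOF".
STARTXREF = re.compile(rb"startxref\s+(\d+)\s+%%EOF")


def xref_offsets(pdf_bytes: bytes) -> list[int]:
    """Return the offset announced by each startxref marker, in file order."""
    return [int(m.group(1)) for m in STARTXREF.finditer(pdf_bytes)]


def count_revisions(pdf_bytes: bytes) -> int:
    """Count startxref markers; a value above 1 suggests appended revisions."""
    return len(xref_offsets(pdf_bytes))


# Synthetic stand-in for a file saved once, then saved again incrementally.
sample = (
    b"%PDF-1.7\n(original objects)\n"
    b"startxref\n120\n%%EOF\n"
    b"(appended objects)\n"
    b"startxref\n540\n%%EOF\n"
)

print(xref_offsets(sample))     # [120, 540]
print(count_revisions(sample))  # 2
```

For a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to `count_revisions`. A byte-level regex is used deliberately here: it inspects the file as written, without a PDF library that might normalize or repair the structure while loading it.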
This article covers what the xref table is at the byte level, why it is the primary forensic signal for incremental modifications, what the edit trail looks like in practice, and how HTPBE reads the xref chain to produce intact, modified, and inconclusive verdicts. If you want a broader overview of how PDF tamper detection works across all five analysis layers, that is covered separately.







