Joint work with Sebastian Angel (University of Pennsylvania), Sofía Celi (Brave), Elizabeth Margolin (University of Pennsylvania), Pratyush Mishra (University of Pennsylvania), Martin Sander (University of Pennsylvania), Jess Woods (University of Pennsylvania).

Parsing is one of those fundamental operations in computing that usually goes unnoticed. Whenever a browser renders a web page, a firewall inspects traffic, or a compiler transforms code, some parser is silently turning a raw stream of bytes into a structured object. We tend to take this step for granted, assuming that once the input is parsed, the rest of the system can reason safely about it. We also tend to expect browsers and compilers to do this out of the box and to handle any parsing errors on the fly.

This expectation is not unreasonable in day-to-day computing. Browsers recover gracefully from malformed HTML, and compilers flag syntax errors so that developers can fix them. But in privacy-preserving verification settings, particularly in zero-knowledge (ZK) proof systems, both the assumption and the leniency are problematic. A prover might commit to some byte stream and then claim that it represents a valid JSON document, a token, or a source file, without ever proving that parsing was carried out correctly. The verifier, lacking access to the underlying data, has no way to tell whether the prover is being honest.

This missing link between raw bytes and structured data has quietly limited the scope of many proposed ZK applications. A proof system that assumes “the input is already a valid JSON object” or “the transcript must be well-formed” leaves itself open to subtle but impactful attacks. If a prover can slip in malformed input because the parsing stage is never verified, they may be able to prove false claims with valid-looking proofs.
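To make the gap concrete, here is a minimal sketch in plain Python (not a ZK circuit, and not the construction from this work). It assumes a SHA-256 hash as a stand-in for a commitment, and the names `commit`, `naive_claim_holds`, and `parsed_claim_holds` are hypothetical. The point is only that a claim checked against raw bytes can hold for inputs that a real parser would reject or interpret differently.

```python
import hashlib
import json


def commit(data: bytes) -> str:
    # Stand-in for a commitment: a plain SHA-256 digest (assumption, not hiding).
    return hashlib.sha256(data).hexdigest()


def naive_claim_holds(data: bytes, key: str, value: int) -> bool:
    # Hypothetical unsound check: pattern-match on the raw bytes, never parse.
    return f'"{key}": {value}'.encode() in data


def parsed_claim_holds(data: bytes, key: str, value: int) -> bool:
    # Sound(er) check: the claim only counts if the bytes parse as a JSON object.
    try:
        doc = json.loads(data)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False
    return isinstance(doc, dict) and doc.get(key) == value


honest = b'{"amount": 100}'
malformed = b'{"note": "x", "amount": 100'      # missing closing brace
duplicate = b'{"amount": 100, "amount": 5}'     # Python's parser keeps the last key

for name, data in [("honest", honest), ("malformed", malformed), ("duplicate", duplicate)]:
    print(name, commit(data)[:16],
          naive_claim_holds(data, "amount", 100),
          parsed_claim_holds(data, "amount", 100))
```

In this sketch the naive check accepts all three inputs, while the parsing-based check accepts only the honest one. A verifier who sees just the commitment cannot tell these cases apart unless the proof itself covers the parsing step.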