This article covers the first layer of the full-stack architecture: the Ingestion Pipeline. If this layer fails, the other five layers fail with it. The core engineering challenge: how do you standardize the ingestion of heterogeneous documents from multiple sources (PDF tables, structured rules, HTML) without losing their semantic structure?

0. The Pain Point

Before the system went live, the compliance team's workflow looked like this:

Download corporate ESG reports (PDF, averaging 200–300 pages)

Open the GRI standards document and cross-check the report against it, rule by rule