This article covers the first layer of the full-stack architecture: the Ingestion Pipeline. If this layer fails, the other five layers fail with it. The core engineering challenge: how do you standardize the ingestion of heterogeneous documents from multiple sources (PDF tables, structured rules, HTML) without losing their semantic structure?
0. The Pain Point
Before the system went live, the compliance team's workflow looked like this:
Download corporate ESG reports (PDF, averaging 200–300 pages)
Open the GRI standards document and cross-check the report against it, rule by rule
