Hi, I'm Ryan, CTO at airCloset.

In Part 1, I wrote about unifying 46 repositories of production code into a single knowledge graph via static analysis. The graph itself got built, but I closed the post with four open issues: no semantic search, node explosion, having to open the file to actually know what a function does, and the cost of writing a new parser every time a new boundary pattern showed up.

This Part 2 is about how I solved the first one — the entry-point problem (no semantic search). The other three are left exactly as Part 1 described them — I'll come back to them at the end, together with the new issues that surfaced once the entry-point problem was out of the way.

The reason to start with the entry-point problem is simple: if the graph exists but the only way to reach it is grep, the model ends up inferring anyway. The whole point — "give the model verified facts, not inference" — falls apart. So the entry-point problem had to be solved before the others.

The Hint Was in db-graph