By: Emmanuel Alawode

In 2022, during a Journalism AI Fellowship at Polis, I partnered with Joshua Olufemi to begin building Nubia, an AI platform for African data journalism. What began as conditional algorithms and spaCy NLP routines wired into a templating engine has grown, over four years, into a production system. It now runs in partnership with Archivi.ng, Daily Trust, and Business Day, and it trains on corpora that the dominant AI tooling has barely seen.

I designed and built its technical foundation: a Next.js frontend on a microservice backend running an agentic architecture, a hosted LLM service, and a vector store designed around how journalists actually use sources. The first dataset we tried to load was a procurement filing from a Nigerian state government. It was a 200-page scanned PDF, and the numbers we needed sat inside a table that had been printed, signed, and rescanned so many times that the digits had begun to bleed into each other.

The dominant AI-for-journalism conversation, then and now, is largely about generation. The implicit assumption is that you have structured data or, at a minimum, clean text. For the newsrooms we were building for, that assumption almost never held. The hard problems were upstream of generation, and they stayed upstream no matter which generative model we used.