Teaching an AI something new after it’s already been trained is one of the most expensive problems in the industry. The typical solutions involve either retraining the entire model (slow, costly) or cramming new information into the context window (limited, unreliable). A team of researchers from MIT CSAIL, the National University of Singapore, A*STAR, and the Singapore-MIT Alliance for Research and Technology just proposed a third option that sidesteps both.

Their framework, called MeMo (Memory as a Model), encodes new knowledge into a separate, smaller “Memory” model that works alongside the primary LLM without touching its core parameters. On relevant benchmarks, the approach delivered performance gains of up to 26%.

How MeMo actually works

The framework uses what the researchers call a five-step reflection QA pipeline to integrate new domain-specific information into the Memory model. The main LLM, which the paper refers to as the “Executive” model, maintains its reasoning capabilities while the Memory model handles structured interactions across multiple conversational turns.

Multiple Memory models can be merged together in parameter space. That means you can have one Memory model trained on one knowledge domain, another trained on a different domain, and combine them without exponentially increasing compute costs.