Teaching a large language model something new after it’s been trained is, to put it charitably, a pain. You either retrain the whole thing (expensive), stuff documents into its context window (limited), or bolt on retrieval systems that often choke on complex queries. Researchers from MIT CSAIL, the National University of Singapore, and A*STAR just published a framework that sidesteps all three problems.

The framework is called MeMo, short for Memory as a Model. It was detailed in a paper released on May 20, 2026 (arXiv:2605.15156), and the core idea is elegantly simple: instead of forcing new knowledge into an existing LLM, train a separate, smaller model whose only job is to remember things. The main LLM stays frozen. It just asks the memory model questions when it needs answers.

How MeMo actually works

In technical terms, MeMo uses a five-step reflection QA synthesis pipeline to train the Memory model on new domain knowledge. At inference time, the frozen Executive LLM, such as Qwen2.5 or Gemini-3-Flash, queries the Memory model through a structured multi-turn protocol. The Memory model internalizes the information rather than merely retrieving text chunks, which is what distinguishes it from traditional retrieval-augmented generation (RAG) setups.