llama.cpp server now ships with router mode, which lets you dynamically load, unload, and switch between multiple models without restarting.

Reminder: llama.cpp server is a lightweight, OpenAI-compatible HTTP server for running LLMs locally.

Router mode was a frequently requested feature: it brings Ollama-style model management to llama.cpp. It uses a multi-process architecture in which each model runs in its own process, so if one model crashes, the others keep running.
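To make the isolation property concrete, here is a minimal Python sketch of the general idea of one process per model. It is not llama.cpp's implementation: the worker script, the `run_workers` helper, and the model names are all hypothetical, and the "crash" is only simulated with a non-zero exit code.

```python
import subprocess
import sys

# Hypothetical worker standing in for one model's process (not llama.cpp code).
# It simulates a crash for "model-b" by exiting with a non-zero status.
WORKER = "import sys; sys.exit(1 if sys.argv[1] == 'model-b' else 0)"


def run_workers(names):
    """Start one child process per name and return each one's exit code."""
    procs = {
        name: subprocess.Popen([sys.executable, "-c", WORKER, name])
        for name in names
    }
    # Each child has its own address space, so one failing does not
    # take down its siblings or the parent.
    return {name: proc.wait() for name, proc in procs.items()}


if __name__ == "__main__":
    codes = run_workers(["model-a", "model-b", "model-c"])
    for name, code in codes.items():
        print(name, "ok" if code == 0 else "crashed")
```

The parent survives the failed child and can still report on (or, in a real router, restart) each worker independently, which is the benefit the multi-process design is aiming for.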

Quick Start