OpenAI's Codex CLI ships with a great editor-agent UX: shell tool, apply_patch, plan tracking, the lot. The catch: as of February 2026 it only speaks the OpenAI Responses API. Chat Completions support was dropped (in codex-rs/model-provider-info/src/lib.rs, the WireApi enum has a single variant, Responses). So if you want to point it at a Chat-Completions-only endpoint (Ollama, LM Studio, your favorite Llama runner), you're out of luck.
But Codex CLI is happy to talk to any server that speaks Responses, and it has a model_provider config block for exactly that. So if you can stand up a Responses-shaped HTTP endpoint backed by the model of your choice, Codex becomes a generic front-end and you choose the brain.
Here's the trick I've been using: a 50-line C# script that runs as both an OpenAI Chat Completions server and a Responses API server, built on Microsoft.Extensions.AI's vendor-neutral IChatClient abstraction. I point it at OpenRouter (one API key, hundreds of models, including Claude, Gemini, Llama, and GPT) and tell Codex to talk to my local script instead of OpenAI.
End result: OpenAI Codex CLI running on Anthropic's Claude 3.5 Sonnet (or whichever model I feel like that day).
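To make the Codex side concrete, here is a rough sketch of what the provider configuration might look like. The provider id (local-proxy), the port, and the model slug are placeholders I made up for illustration, and the file location assumes the default Codex config path; adjust all of them to match your own setup.

```toml
# ~/.codex/config.toml (sketch; the values below are illustrative assumptions)

# Model name passed through to the local server. This is a hypothetical
# choice of OpenRouter slug; use whatever your endpoint forwards.
model = "anthropic/claude-3.5-sonnet"

# Select the provider entry defined below instead of the built-in OpenAI one.
model_provider = "local-proxy"

[model_providers.local-proxy]
name = "Local Responses proxy"
# Assumed address of the local script; the port is arbitrary.
base_url = "http://localhost:5000/v1"
# Codex only speaks the Responses wire format.
wire_api = "responses"
```

With something like this in place, Codex sends its Responses requests to the local endpoint, and the upstream API key can stay with the local script rather than with Codex.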