Researchers at Stanford University and Lambda Labs have published the research paper for OpenJarvis, an open-source framework that runs inference, agents, memory, and learning entirely on-device.
On average, the open-weight models configured through OpenJarvis land within 3.2 percentage points of the best cloud model, at roughly 800× lower marginal API cost per query and roughly 4× lower latency under the paper's benchmark protocol. The work builds on the team's earlier Intelligence Per Watt study, which reported that local models already handle 88.7% of single-turn chat and reasoning queries at interactive latency, with intelligence efficiency improving 5.3× from 2023 to 2025.
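To put the Intelligence Per Watt figures in per-year terms, the short calculation below converts the reported 5.3× gain over 2023 to 2025 into an annualized factor and derives the share of queries that local models do not yet handle. It is a back-of-the-envelope sketch: treating the gain as compounding evenly over two yearly steps is an assumption made here, not a claim from the study, and the function name is illustrative.

```python
# Figures quoted from the Intelligence Per Watt study.
TOTAL_GAIN = 5.3         # intelligence-efficiency improvement, 2023 to 2025
YEARS = 2025 - 2023      # two yearly steps
LOCAL_SHARE = 88.7       # percent of single-turn queries handled locally


def annualized_factor(total_gain: float, years: int) -> float:
    """Per-year multiplier, ASSUMING the gain compounds evenly each year."""
    return total_gain ** (1 / years)


annual = annualized_factor(TOTAL_GAIN, YEARS)
remaining = 100.0 - LOCAL_SHARE  # queries not yet handled locally

print(f"Annualized improvement: about {annual:.2f}x per year")
print(f"Queries not yet handled locally: {remaining:.1f}%")
```

Under that even-compounding assumption, the 5.3× gain works out to about 2.3× per year, and the remaining share is 11.3% of single-turn chat and reasoning queries.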
Model Overview & Access
OpenJarvis is not a single model. It is a framework that composes any supported model with a configurable agent stack. The paper evaluates it across 11 local models from four families.
| Property | Value |
| --- | --- |
| License | Apache 2.0 |
| Framework release | March 12, 2026 |
| Paper | arXiv:2605.17172 (posted May 16, 2026) |
| Repository | github.com/open-jarvis/OpenJarvis |
| Stars / forks | ~5.4k / ~1.2k (June 2026) |
| Languages | Python (~83%), Rust (~9%), TypeScript (~7%) |
| Evaluated models | 11 local models across 4 families: Qwen3.5, Gemma4, Nemotron, Granite |
| Cloud baselines | Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro |
| Supported engines | Ollama, vLLM, SGLang, llama.cpp, Apple Foundation Models, Exo (among others) |
| Context window | Model-dependent |
| Installation | Single command; ~3 minutes on broadband |
| Hardware | Tested on 7 platforms, from Mac Mini M4 to NVIDIA DGX Spark |
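The idea of composing any supported model with a configurable agent stack can be pictured as a small grid of model and stack combinations. The sketch below is purely illustrative: `ModelSpec`, `AgentStack`, `Pipeline`, `compose`, and the `max_steps` knob are hypothetical names invented for this example and are not OpenJarvis's API. Only the family and engine names come from the overview above.

```python
from dataclasses import dataclass
from itertools import product


# Hypothetical types for illustration only; these are NOT OpenJarvis classes.
@dataclass(frozen=True)
class ModelSpec:
    family: str  # model family, e.g. "Qwen3.5"
    engine: str  # inference engine, e.g. "Ollama"


@dataclass(frozen=True)
class AgentStack:
    use_memory: bool = True
    use_tools: bool = True
    max_steps: int = 8  # assumed knob, not taken from the paper


@dataclass(frozen=True)
class Pipeline:
    model: ModelSpec
    stack: AgentStack


def compose(models, stacks):
    """Pair every model with every agent-stack configuration."""
    return [Pipeline(m, s) for m, s in product(models, stacks)]


# The four evaluated families, all served here through one assumed engine.
FAMILIES = ["Qwen3.5", "Gemma4", "Nemotron", "Granite"]
models = [ModelSpec(family=f, engine="Ollama") for f in FAMILIES]
stacks = [AgentStack(), AgentStack(use_memory=False)]

pipelines = compose(models, stacks)
print(len(pipelines))  # 4 families x 2 stack configurations = 8
```

Keeping the model and the agent stack as separate, immutable values is one way to make the same stack reusable across models and engines, which matches the framework's stated goal of not being tied to a single model.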
















