Subquadratic — Efficiency is Intelligence

The first model built for long-context tasks

SubQ is a sub-quadratic LLM built for 12M-token reasoning, allowing agents to work across full repositories, long histories, and persistent state without quality loss.

Context: 12M-token reasoning
Speed: 150 tokens per second
Cost: 1/5 of other leading LLMs

Use Cases

All your context. Always available.

Reason across 12M tokens in one prompt: entire repos, months of PRs, and long-running agent state, with room to spare at one-fifth the cost.

[Figure: token scale from 0 to 12M with markers at ~5.1M and ~7.5M. "~" marks approximate token counts.]

Architecture

Not just another model. An architectural breakthrough.

SubQ is the first model built on a fully sub-quadratic sparse-attention architecture. LLMs today waste compute by processing every possible relationship between words, but only a small fraction of these relationships matter.

SubQ finds and focuses only on those, ensuring compute is used where it matters most. At 12M tokens, this reduces attention compute almost 1,000×, changing the way LLMs scale.

Technical report (coming soon)

Benchmarks

A leader in long-context retrieval and coding tasks.

Benchmark                 Gemini 3.1 Pro  Opus 4.6  Opus 4.7  GPT-5.4  GPT-5.5  SubQ 1M-Preview
SWE-Bench Verified        80.6%           80.8%     87.6%     n/r      n/r      81.8%
RULER @ 128K              n/r             94.8%*    n/r       n/r      n/r      95.6%
MRCR v2 (8-needle, 1M)    26.3%           78.3%     32.2%     36.6%    74.0%    86.2%

SWE-Bench Verified: real-world software engineering ability
RULER @ 128K: long-context accuracy across 13 tests
MRCR v2 (8-needle, 1M): multi-round coreference resolution in long contexts

n/r = result was not reported by the model provider
* = internally evaluated
SubQ results are third-party validated

Products

Two ways to use SubQ.

API: For developers and teams

The full-context API for developers and enterprise teams. Process full repositories and pipeline states in a single API call at linear cost.

→ 12M token context window
→ Streaming + tool use
→ OpenAI-compatible endpoints

Code: For coding agents

The long-context layer for coding agents. Plug into Claude Code, Codex, and Cursor to map codebases, gather context, and answer token-heavy questions faster.

→ ~25% lower bill, 10× faster exploration
→ Auto-redirects expensive model turns
→ One-line install

Research

From the lab.

About

We built the architecture the industry said wasn't possible.

Subquadratic is a frontier AI research and infrastructure company building a new class of LLMs. While other major labs focus on incremental improvements to Transformer models, we're pushing foundational change at the model architecture level, enabling large-context, multi-modal inference that scales efficiently where transformers can't.

Built by researchers from Meta, Google, Oxford, Cambridge, and BYU.

Early Access

Is your business ready? Build with us. Join the private preview.
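
A note on the Architecture claim above: the page states that sparse attention cuts attention compute almost 1,000× at 12M tokens, but it does not describe the mechanism. The sketch below is only a back-of-the-envelope illustration under a simplifying assumption that is not taken from the source: dense attention scores every token pair (n × n), while a sparse scheme scores a fixed budget of k pairs per token (n × k). The function names and the value of k are hypothetical, and a real sub-quadratic design may scale differently (for example n log n).

```python
# Back-of-the-envelope comparison of attention cost, counted as the number
# of token pairs scored. This is an illustrative model, not SubQ's method.

def dense_pairs(n_tokens: int) -> int:
    """Pairs scored by dense attention: every token against every token."""
    return n_tokens * n_tokens


def sparse_pairs(n_tokens: int, k: int) -> int:
    """Pairs scored if each token attends to at most k others (assumption)."""
    return n_tokens * min(k, n_tokens)


def reduction_factor(n_tokens: int, k: int) -> float:
    """How many times fewer pairs the sparse scheme scores than the dense one."""
    return dense_pairs(n_tokens) / sparse_pairs(n_tokens, k)


if __name__ == "__main__":
    n = 12_000_000   # context length quoted on the page
    k = 12_000       # hypothetical per-token budget, chosen for illustration
    print(f"dense pairs:  {dense_pairs(n):.3e}")
    print(f"sparse pairs: {sparse_pairs(n, k):.3e}")
    print(f"reduction:    {reduction_factor(n, k):.0f}x")
```

Under this simplified model the reduction factor is n / k, so a figure near 1,000× at 12M tokens would correspond to each token attending to roughly 12,000 others. That number is an inference from the arithmetic, not a published detail of SubQ.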


