Poetiq's Meta-System Automatically Builds a Model-Agnostic Harness That Improved Every LLM Tested on LiveCodeBench Pro Without Fine-Tuning

Poetiq has published results showing that its Meta-System reached a new state of the art on LiveCodeBench Pro (LCB Pro), a competitive coding benchmark, by automatically building and optimizing its own inference harness, without fine-tuning any underlying model or accessing model internals.

The result: GPT 5.5 High with Poetiq’s harness scores 93.9% on LCB Pro (25Q2), up 4.3 percentage points from its baseline of 89.6%. Gemini 3.1 Pro, the model the harness was specifically optimized on, jumps 12.3 points, from 78.6% to 90.9%. That puts it ahead of Google’s own Gemini 3 Deep Think (88.8%), a model that isn’t even accessible via API for external verification.
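To put those numbers side by side, here is a small sketch that recomputes the gains from the scores quoted above. The dictionary layout and function names are illustrative assumptions, not anything Poetiq published; only the five percentages come from the article. The "error-rate reduction" figure is one possible way to read the gains (the share of the baseline failure rate that disappears), not a metric Poetiq reports.

```python
# Scores in percent, as quoted in the article for LCB Pro (25Q2).
# The structure and names below are illustrative, not from Poetiq.
SCORES = {
    "GPT 5.5 High": {"baseline": 89.6, "with_harness": 93.9},
    "Gemini 3.1 Pro": {"baseline": 78.6, "with_harness": 90.9},
}
DEEP_THINK_SCORE = 88.8  # Gemini 3 Deep Think, as quoted in the article


def point_gain(model: str) -> float:
    """Absolute gain in percentage points, rounded to one decimal."""
    s = SCORES[model]
    return round(s["with_harness"] - s["baseline"], 1)


def error_rate_reduction(model: str) -> float:
    """Fraction of the baseline error rate (100 - score) removed by the harness."""
    s = SCORES[model]
    baseline_error = 100.0 - s["baseline"]
    return (s["with_harness"] - s["baseline"]) / baseline_error


def margin_over_deep_think(model: str) -> float:
    """Points by which the harnessed model leads the Deep Think score."""
    return round(SCORES[model]["with_harness"] - DEEP_THINK_SCORE, 1)


if __name__ == "__main__":
    for model in SCORES:
        print(
            f"{model}: +{point_gain(model)} points, "
            f"{error_rate_reduction(model):.1%} of baseline errors removed"
        )
```

On these figures the harness adds 4.3 points for GPT 5.5 High and 12.3 points for Gemini 3.1 Pro, which corresponds to roughly 41% and 57% of each model's baseline error rate. The smaller absolute gain for GPT 5.5 High partly reflects how little headroom its 89.6% baseline leaves.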

Before getting into the mechanics, it helps to understand why the benchmark matters. LCB Pro is designed to test AI coding ability in a way that resists two common failure modes in benchmarks: data contamination and overfitting.

LCB Pro pulls problems from major competitive programming competitions and withholds public ground-truth code. Instead, solutions are validated against a comprehensive testing framework. Correct output alone isn’t enough: solutions must also satisfy specific memory and runtime constraints. The benchmark is also continuously updated, which distinguishes it from many standard benchmarks that become stale.
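The general shape of that kind of validation can be sketched in a few lines. This is not LCB Pro's actual testing framework, whose internals the article does not describe; the `judge` function, the verdict strings, and the default time limit are all hypothetical. The sketch runs a submitted Python program against hidden input/output pairs and enforces a wall-clock limit. Memory limits are deliberately left out because enforcing them is platform-specific.

```python
import subprocess
import sys

# Hypothetical verdict labels in the style of competitive programming judges.
ACCEPTED = "ACCEPTED"
WRONG_ANSWER = "WRONG_ANSWER"
TIME_LIMIT_EXCEEDED = "TIME_LIMIT_EXCEEDED"
RUNTIME_ERROR = "RUNTIME_ERROR"


def judge(source: str, tests: list[tuple[str, str]], time_limit_s: float = 2.0) -> str:
    """Run `source` (a Python program) on each (stdin, expected stdout) pair.

    Returns the first failing verdict, or ACCEPTED if every test passes
    within the time limit. Memory limits are NOT enforced in this sketch.
    """
    for stdin_text, expected in tests:
        try:
            proc = subprocess.run(
                [sys.executable, "-c", source],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=time_limit_s,  # child is killed if it runs too long
            )
        except subprocess.TimeoutExpired:
            return TIME_LIMIT_EXCEEDED
        if proc.returncode != 0:
            return RUNTIME_ERROR
        # Compare token by token so trailing whitespace does not matter.
        if proc.stdout.split() != expected.split():
            return WRONG_ANSWER
    return ACCEPTED


if __name__ == "__main__":
    hidden_tests = [("1 2\n", "3\n"), ("10 -4\n", "6\n")]
    solution = "print(sum(map(int, input().split())))"
    print(judge(solution, hidden_tests))
```

The point of the design is that a submission never sees the expected outputs and can fail for being too slow even when its answers are right, which is why a model cannot pass by recalling a memorized reference solution alone.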
