Beyond raw intelligence: How Poetiq cracked the ARC-AGI-2 benchmark - TechTalks

AI lab Poetiq has officially topped the ARC-AGI-2 leaderboard with an approach that hints at a significant shift in how AI systems solve complex reasoning tasks. On November 20, 2025, the company announced preliminary results that have now been verified by the ARC Prize team. The Poetiq system scored 54% on the Semi-Private Test Set, nine percentage points above the previous state of the art, Gemini 3 Deep Think, which scored 45%.

Beyond the accuracy gains, Poetiq’s system reached this milestone at a cost of $30.57 per problem, compared with $77.16 per problem for Gemini 3 Deep Think, a reduction of roughly 60%. This result suggests that progress in AI reasoning is moving away from purely scaling model size and reasoning tokens and toward well-engineered systems that optimize performance at the application layer.

The challenge of ARC-AGI

To understand the significance of this achievement, one must look at the benchmark itself. ARC-AGI-1 (previously called ARC Challenge) is based on the Abstract Reasoning Corpus (ARC) introduced by François Chollet in 2019 to measure intelligence defined as efficient skill acquisition rather than the mastery of fixed tasks.

The benchmark consists of grid-based visual puzzles where the solver must infer an underlying rule from a few example input-output pairs and apply it to a new test grid. This format aims to test “core knowledge priors” and generalization, avoiding the pitfalls of benchmarks that can be solved through the memorization of vast training datasets.
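To make the task format concrete, here is a minimal Python sketch of a toy ARC-style puzzle. It is an illustration only: the grids, the `CANDIDATE_RULES` table, and the `infer_rule` helper are all hypothetical, and a brute-force search over four fixed transformations is not how Poetiq or any leaderboard system works. Real ARC tasks involve far more varied rules than a handful of flips.

```python
from typing import Callable, Optional

# A grid is a list of rows; each cell is an integer colour code.
Grid = list[list[int]]


def identity(grid: Grid) -> Grid:
    return [row[:] for row in grid]


def flip_horizontal(grid: Grid) -> Grid:
    # Mirror each row left-to-right.
    return [row[::-1] for row in grid]


def flip_vertical(grid: Grid) -> Grid:
    # Reverse the order of the rows.
    return [row[:] for row in grid[::-1]]


def transpose(grid: Grid) -> Grid:
    # Swap rows and columns.
    return [list(col) for col in zip(*grid)]


# Hypothetical, deliberately tiny hypothesis space (assumption for this sketch).
CANDIDATE_RULES: dict[str, Callable[[Grid], Grid]] = {
    "identity": identity,
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
}


def infer_rule(train_pairs: list[tuple[Grid, Grid]]) -> Optional[str]:
    """Return the first candidate rule consistent with every example pair."""
    for name, rule in CANDIDATE_RULES.items():
        if all(rule(inp) == out for inp, out in train_pairs):
            return name
    return None


# A few demonstration pairs, as in the benchmark's "few examples" setup.
train_pairs = [
    ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
    ([[3, 3, 0], [0, 4, 0]], [[0, 3, 3], [0, 4, 0]]),
]
test_input = [[5, 0, 0], [0, 6, 0]]

rule_name = infer_rule(train_pairs)
prediction = CANDIDATE_RULES[rule_name](test_input) if rule_name else None
print(rule_name, prediction)
```

The solver sees only the two demonstration pairs, picks the one rule that explains both, and applies it to the unseen test grid. The difficulty of the actual benchmark comes from the fact that the space of possible rules is open-ended, so it cannot be enumerated in advance the way this sketch does.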
