CoderForge-Preview: SOTA open dataset for training efficient coding agents

As coding agents become increasingly capable, the research community faces a critical bottleneck: the lack of large-scale, high-quality open training data. While proprietary models continue to advance, open-weight alternatives have been held back by limited access to the long-context, test-verified trajectories needed for effective agent training.We're releasing CoderForge-Preview, the largest open dataset of coding agent trajectories to date - 258k test-verified trajectories (155k pass | 103k fail) spanning 51K tasks across 1,655 repositories, and share our results of using it to train 32B and 4B models on it. By releasing CoderForge openly, we aim to accelerate progress across the entire open-source AI community and enable researchers everywhere to build, study, and improve upon our work. Fine-tuning Qwen-3 32B achieves 59.4% pass@1 on SWE-Bench Verified [8], ranking #1 among open-data models in the ≤32B parameter range.We release the full trajectory dataset, as well as the evaluation trajectories for 32B.Dataset‍32B Evaluation TrajectoriesCoderForge-Preview DataWe generate agent trajectories from three different task sources using Qwen3-Coder-480B and apply rejection sampling to filter out solutions that fail to pass the tests. This process yields 258K long-context trajectories (up to 128K tokens) across 51K tasks, from which we retain 155K high-quality, test-verified trajectories for SFT training.Task sourcesWe draw tasks from three sources: R2E-Gym [5], SWE-Smith [6], and SWE-Rebench [7]:

domenica 17 maggio 2026 New tab

CoderForge-Preview: SOTA open dataset for training efficient coding agents

CoderForge-Preview: SOTA open dataset for training efficient coding agents

Other newsrooms on this story

Related reading

Building Supervised Fine-Tuning Data from NVIDIA Open-SWE-Traces: Trajectory…

Open-Source Coding Agents 2026: Which One to Run

Towards self-driving codebases · Cursor

AI Coding Agents Need Runtime Telemetry Before Commit Telemetry

Anthropic, OpenAI, or Cursor model for your agent skills? 7 learnings from…

MirrorCode evaluates AI's long-horizon coding capabilities with 22 open-source…

Other newsrooms on this story

Related reading

Building Supervised Fine-Tuning Data from NVIDIA Open-SWE-Traces: Trajectory…

Open-Source Coding Agents 2026: Which One to Run

Towards self-driving codebases · Cursor

AI Coding Agents Need Runtime Telemetry Before Commit Telemetry

Anthropic, OpenAI, or Cursor model for your agent skills? 7 learnings from…

MirrorCode evaluates AI's long-horizon coding capabilities with 22 open-source…