In this tutorial, we explore the Open-SWE-Traces dataset as a practical resource for studying and preparing agentic software-engineering trajectories for fine-tuning. We stream the dataset directly from Hugging Face, so we can work with a large dataset efficiently in Google Colab without downloading everything locally. We inspect individual records, normalize multi-turn agent conversations, parse final code patches, extract useful metadata, and build an analysis DataFrame to understand trajectory length, tool usage, patch size, language distribution, and resolution outcomes. We then use these insights to create a curated supervised fine-tuning subset that keeps only high-quality trajectories based on success labels, token limits, language filters, and patch availability.
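As a preview of the final curation step, here is a minimal sketch of the four filters on plain Python dictionaries. The field names (`resolved`, `num_tokens`, `language`, `patch`), the `curate` helper, and the default thresholds are illustrative assumptions, not the dataset's confirmed schema; check them against the records you inspect before reusing this.

```python
from typing import Iterable, Iterator


def curate(
    records: Iterable[dict],
    *,
    max_tokens: int = 32_000,                      # assumed token budget
    languages: frozenset = frozenset({"python"}),  # assumed language filter
) -> Iterator[dict]:
    """Yield only records that pass all four filters (hypothetical field names)."""
    for rec in records:
        # 1. Success label: keep only trajectories marked as resolved.
        if not rec.get("resolved"):
            continue
        # 2. Token limit: drop records with a missing or oversized token count.
        tokens = rec.get("num_tokens")
        if tokens is None or tokens > max_tokens:
            continue
        # 3. Language filter: case-insensitive match against the allowed set.
        if (rec.get("language") or "").lower() not in languages:
            continue
        # 4. Patch availability: require a non-empty final patch.
        if not (rec.get("patch") or "").strip():
            continue
        yield rec


# Tiny in-memory sample standing in for streamed records.
sample = [
    {"id": "a", "resolved": True, "num_tokens": 1200, "language": "Python", "patch": "diff --git a/x b/x"},
    {"id": "b", "resolved": False, "num_tokens": 900, "language": "python", "patch": "diff --git a/y b/y"},
    {"id": "c", "resolved": True, "num_tokens": 50_000, "language": "python", "patch": "diff --git a/z b/z"},
    {"id": "d", "resolved": True, "num_tokens": 800, "language": "python", "patch": "   "},
]
kept = list(curate(sample))
```

Because `curate` is a generator, it can consume a streamed iterable lazily without first holding the whole dataset in memory.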

Installing Dependencies and Configuration

import subprocess, sys

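def _pip(*pkgs):
    # Install packages quietly using the same interpreter that runs this notebook.
    # check=False means a failed install does not raise an exception.
    subprocess.run([sys.executable, "-m", "pip", "install", "-q", *pkgs], check=False)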