How to Use AgentTrove: Streaming 1.7M Agentic Traces and Building a Clean ShareGPT SFT Dataset in Python

In this tutorial, we explore AgentTrove, one of the largest open-source collections of agentic interaction traces, and learn how to work with it efficiently. Instead of downloading the full dataset, we use streaming to inspect rows, detect the conversation schema, normalize agent turns, and understand how user, assistant, system, and tool messages are structured. We also build utilities to parse command-style assistant outputs, render complete trajectories in a readable format, and study how agents interact with tools across different tasks. Also, we create a lightweight analytical workflow that samples thousands of traces, converts them into a DataFrame, summarizes turn-level statistics, visualizes important dataset patterns, and exports successful traces into a clean ShareGPT-style JSONL format for supervised fine-tuning.

!pip -q install "datasets>=2.19" pandas matplotlib pyarrow huggingface_hub

import itertools, json, collections, textwrap, re, random, statistics

import pandas as pd

import matplotlib.pyplot as plt

!pip -q install "datasets>=2.19" pandas matplotlib pyarrow huggingface_hub

import itertools, json, collections, textwrap, re, random, statistics

import pandas as pd

import matplotlib.pyplot as plt

How to Use AgentTrove: Streaming 1.7M Agentic Traces and Building a Clean ShareGPT SFT Dataset in Python

How to Use AgentTrove: Streaming 1.7M Agentic Traces and Building a Clean ShareGPT SFT Dataset in Python

Other newsrooms on this story

Related reading

Building Supervised Fine-Tuning Data from NVIDIA Open-SWE-Traces: Trajectory…

Building a Stable Fable 5 Traces Workflow in Colab: Parsing Tool Calls,…

One Triage Pass, Every Trace Format: Stop Letting Fragmentation Shrink Your…

CoderForge-Preview: SOTA open dataset for training efficient coding agents

Distributed Tracing Shows You What Happened. It Cannot Prove It to a Regulator.

How to Monitor AI Agents in Production

Other newsrooms on this story

Related reading

Building Supervised Fine-Tuning Data from NVIDIA Open-SWE-Traces: Trajectory…

Building a Stable Fable 5 Traces Workflow in Colab: Parsing Tool Calls,…

One Triage Pass, Every Trace Format: Stop Letting Fragmentation Shrink Your…

CoderForge-Preview: SOTA open dataset for training efficient coding agents

Distributed Tracing Shows You What Happened. It Cannot Prove It to a Regulator.

How to Monitor AI Agents in Production