Data scientists spend much of their time cleaning and preparing large, unstructured datasets before analysis can begin, work that often demands strong programming and statistical expertise. Managing feature engineering, model tuning, and consistency across workflows is complex and error-prone. The slow, sequential nature of CPU-based ML workflows amplifies these challenges and makes experimentation and iteration painfully inefficient.
Accelerated data science ML agent
We prototyped a data science agent that can interpret user intent and orchestrate repetitive tasks in an ML workflow to simplify data science and ML experimentation. With GPU acceleration, the agent can process datasets with millions of samples using NVIDIA CUDA-X Data Science libraries. It showcases NVIDIA Nemotron Nano-9B-v2, a compact, powerful open-source language model designed to translate the intent of a data scientist into an optimized workflow.
With this setup, developers can explore large datasets, train models, and evaluate results just by chatting with the agent. It bridges the gap between natural language and high-performance computing, enabling users to go from raw data to business insights in minutes. We encourage you to use this as a starting point to build your own agent with different LLMs, tools, and storage solutions tailored to your specific needs. Explore the Python scripts for this agent on GitHub.
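To make the idea of turning a chat request into a workflow step more concrete, here is a minimal sketch of a tool-dispatch loop. It is not the code from the GitHub repository: the tool names (`describe_data`, `train_model`), the `run_agent` helper, and the `fake_llm` stub are all hypothetical. The stub stands in for the language model and uses simple keyword matching so that the example runs without a GPU or a model endpoint. In a real agent, the model would choose the tool and the tools would call GPU-accelerated libraries.

```python
import json
from typing import Callable, Dict, List

# Hypothetical tool registry. The names and functions are illustrative only
# and are not the tools used by the agent described in this post.
TOOLS: Dict[str, Callable[..., dict]] = {}


def tool(name: str):
    """Register a function under a tool name the model can refer to."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register


@tool("describe_data")
def describe_data(rows: List[dict], column: str) -> dict:
    # Summarize one numeric column; a real tool might use a GPU DataFrame here.
    values = [row[column] for row in rows]
    return {"column": column, "count": len(values), "mean": sum(values) / len(values)}


@tool("train_model")
def train_model(rows: List[dict], target: str) -> dict:
    # Placeholder: no model is trained, we only report what would be trained.
    return {"status": "trained", "target": target, "n_samples": len(rows)}


def fake_llm(user_message: str) -> str:
    """Stand-in for the language model (assumption: it replies with a JSON tool call)."""
    if "train" in user_message.lower():
        return json.dumps({"tool": "train_model", "args": {"target": "label"}})
    return json.dumps({"tool": "describe_data", "args": {"column": "x"}})


def run_agent(user_message: str, rows: List[dict], llm=fake_llm) -> dict:
    # 1. Ask the model which tool to call and with which arguments.
    call = json.loads(llm(user_message))
    # 2. Look up the tool and fail clearly if the model named an unknown one.
    fn = TOOLS.get(call["tool"])
    if fn is None:
        raise ValueError(f"unknown tool: {call['tool']}")
    # 3. Run the tool on the dataset and return its result.
    return fn(rows, **call["args"])


rows = [{"x": 1.0, "label": 0}, {"x": 3.0, "label": 1}]
summary = run_agent("Summarize column x", rows)
trained = run_agent("Train a classifier on this data", rows)
```

Keeping the registry separate from the model call is one possible design choice: it lets you swap in a different LLM or add tools without changing the dispatch logic.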






