Anthropic recently introduced Agent Skills [1]: folders containing a SKILL.md file with instructions, scripts, and resources that agents can discover and load dynamically. The model reads the skill when relevant, executes bundled code, and solves problems using complex control flow rather than chaining discrete tool calls.

If this sounds familiar, it should. In late 2023, when I was building the first version of AutoGen Studio, this is exactly what we did. Users wrote Python files as skills, we added them to context (full code or just signatures), and the model wrote programs that integrated and orchestrated those skills. At the time, most models were inconsistent at generating structured output (the building block for tool calling), but most of them could write code fairly well.

Fun fact: One of the first agentic systems projects I led at Microsoft, in August 2022 (before the ChatGPT release, before AutoGen), was LIDA, an agentic workflow for automated data visualization based on code execution. The visualization module worked by generating code (using the Davinci 2 and gpt-3.5-turbo models at the time), which was then executed to create visualizations.

Then everyone moved to structured tool calling, driven by libraries like Instructor (from Jason Liu) and Pydantic that promised “guarantees about output types” with automatic validation and retries. Now the field is moving back to code, citing efficiency gains. This post traces that arc and explains why the return may or may not make sense.

This post draws on themes from Designing Multi-Agent Systems, which covers tool design, code execution, and security considerations for agents in depth.

Around mid-2022, OpenAI's Davinci 2 models were getting really good at writing code, and it became clear that they could write code that was executed in a loop to solve problems. The setup was simple: prompt the model to write code in detectable blocks (triple backticks, XML tags), parse it out, and execute it.
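
To make that parse-and-execute setup concrete, here is a minimal Python sketch. It is an illustration under my own assumptions, not the LIDA or AutoGen Studio implementation: the names `extract_code_blocks` and `run_code` are hypothetical, only triple-backtick blocks are handled (not XML tags), and a plain subprocess stands in for whatever isolation a real system would use.

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path

# Built from single backticks so the fence marker never appears literally here.
FENCE = "`" * 3

# Captures (language tag, body) for every fenced block in a model reply.
CODE_BLOCK = re.compile(FENCE + r"([\w+-]*)[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(text: str) -> list[str]:
    """Return the bodies of fenced blocks that are untagged or tagged as Python."""
    blocks = []
    for lang, body in CODE_BLOCK.findall(text):
        if lang.lower() in ("", "python", "py"):
            blocks.append(body.strip())
    return blocks


def run_code(code: str, timeout: float = 10.0) -> tuple[int, str]:
    """Run code in a fresh interpreter and return (exit code, stdout + stderr).

    Raises subprocess.TimeoutExpired if the script runs longer than `timeout`.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(code, encoding="utf-8")
        result = subprocess.run(
            [sys.executable, str(script)],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    return result.returncode, result.stdout + result.stderr


if __name__ == "__main__":
    # A stand-in for a model reply; a real loop would call a model here.
    reply = "Here is the script:\n" + FENCE + "python\nprint(2 + 3)\n" + FENCE
    for snippet in extract_code_blocks(reply):
        exit_code, output = run_code(snippet)
        print(exit_code, output)
```

In a loop, the exit code and output would be fed back to the model so it can fix errors and try again. Note that a subprocess with a timeout is not a sandbox: generated code still runs with the caller's permissions, so a real deployment would need stronger isolation such as a container.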
This was attractive because it mostly worked across models of all sizes (for the longest time, most models did not support JSON mode or structured tool calling, especially the small 7B models that drove the bulk of experimentation at the time).

As an example, in the early days of AutoGen Studio, users could write a generate_image.py file with a generate_image() function. The model was prompted to write code that assumed those skills existed and could import them:

# AutoGen Studio skill file (2023)