After nearly four years and hundreds of billions of dollars burned building smarter and more capable models, folks understandably would like to see them do something more than run a chatbot. In this respect, OpenClaw served as blood in the water, demonstrating that, despite its seemingly endless supply of security flaws, LLMs really can be used to automate complex tasks. Since then, you've probably noticed the term "harness" coming up more frequently to describe agentic AI frameworks, and for good reason. You don't need a harness to interact with a chatbot – local tools like Ollama send API calls directly to the LLM – but for today's agentic workloads, harnesses are essential.
On their face, AI harnesses are just a bit of code that wraps around an LLM's API endpoint, orchestrates tool calls, and manages context. OpenClaw, Claude Code, Codex, and Pi Coding Agent are all examples of code-focused harnesses you may already be familiar with.
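To make that concrete, here's a minimal sketch of the loop a harness runs. The model here is a stub standing in for a real OpenAI-compatible chat endpoint, and the tool names and JSON tool-call convention are illustrative assumptions, not any particular product's API:

```python
import json

def stub_model(messages):
    """Pretend LLM: requests one tool call, then gives a final answer."""
    if not any(m["role"] == "tool" for m in messages):
        return json.dumps({"tool": "list_logs", "args": {"dir": "/var/log"}})
    return "Done: parsed the logs."

# Hypothetical tools the harness exposes to the model.
TOOLS = {
    "list_logs": lambda dir: ["app.log", "error.log"],
}

def run_harness(user_request, model=stub_model, max_steps=5):
    """Orchestrate model calls: run tools, feed results back, stop on text."""
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_steps):
        reply = model(messages)
        try:
            call = json.loads(reply)           # model asked for a tool
        except json.JSONDecodeError:
            return reply                       # plain text means final answer
        result = TOOLS[call["tool"]](**call["args"])
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "Stopped: step limit reached, asking the user for input."

print(run_harness("Build an app that parses logs"))
```

The essential points are all here: the harness, not the model, owns the loop; it decides which tools run, appends their output to the context, and imposes a step budget so a confused model can't spin forever.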
As simple as all this sounds, harnesses are changing the way we think about everything from training new models to how we build and run them at scale. LLM inference on its own is pretty dumb – not the models so much as the way we interact with them. The OpenAI-compatible API calls that have become the de facto standard are transactional: with most early chatbots, you made a request and the API supplied a response.

A harness, by comparison, orchestrates those API calls, breaking one request down into many. If you were to ask a code agent to build an app that parses logs, the harness might make one request to plan things out, another to review the log directory, a third to generate and execute that code in an interpreter, and a fourth to debug and fix any errors. This multi-step loop continues until the work is done or the harness cuts it short to ask for user input.

At least for coding, these harnesses are getting good enough to be useful. In fact, a harness may have a bigger impact on whether a code assistant succeeds than the model itself. Even Qwen3.6-27B, a small-to-medium-sized LLM, proved to be a surprisingly effective alternative to larger paid models when paired with harnesses like Anthropic's Claude Code or Cline. And yes, if you didn't know, Claude Code works with any model you like.

Indeed, the realization that small models with well-designed harnesses can now automate complex tasks has contributed to a shortage of Mac Minis, as AI enthusiasts race to self-host OpenClaw and LLMs on them.

Changing the way we build models

Training dominated the first two years of the AI boom. OpenAI, Google, Microsoft, and others raced to build smarter models using as much data as they could harvest.









