A year ago the question that animated the field was whether the scaling curve would keep delivering. Today the question is what to build on top of capabilities that were science fiction in 2023. This note is a personal accounting of what changed at the frontier through the first half of 2026, written from the perspective of someone who has been shipping AI-driven systems into a production research stack.
The summary up front: capability went up roughly as expected. What moved was the shape of the curve. Long context, tool use, and agentic behavior compounded faster than anyone publicly modeled in 2024. Open-weight models closed the gap further than the labs predicted. The cost curve also bent, but in different directions depending on which segment you measure.
Long context became the dominant axis
The most underrated capability shift of the past twelve months is long context. Claude 4.7 ships a 1M-token effective window. GPT-5.5 sits in the same neighborhood. Gemini's long-context tier extends further still, with claims (and corresponding caveats) about multi-million-token windows.
What changed is not the headline number. A year ago, that number was already reachable through retrieval gymnastics. What changed is that models now actually use the context. The needle-in-a-haystack benchmarks that were standard in 2024 looked impressive but did not predict downstream behavior: a model could find a single sentence in 1M tokens and still fail to reason across the document.
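To make the gap between "finding" and "reasoning across" concrete, here is a toy sketch in Python. Everything in it is hypothetical and chosen for illustration: the filler sentence, the probe builders `needle_probe` and `multi_hop_probe`, and `retrieval_only_baseline`, a crude stand-in for a reader that can locate the most relevant sentence but never combines facts. It is not any published benchmark and not a model; it only shows why a perfect needle score says little about cross-document reasoning.

```python
import random
import re

# Hypothetical, topic-neutral filler sentence used to pad the context.
FILLER = "The sky was clear and the market was quiet that day."


def build_haystack(n_filler: int, inserts: list[tuple[int, str]]) -> str:
    """Splice (position, sentence) pairs into a run of filler sentences."""
    sentences = [FILLER] * n_filler
    # Insert from the highest position down so earlier positions are not shifted.
    for pos, sentence in sorted(inserts, reverse=True):
        sentences.insert(pos, sentence)
    return " ".join(sentences)


def needle_probe(n_filler: int, seed: int = 0) -> tuple[str, str, str]:
    """One fact hidden at a random depth; answering only requires finding it."""
    rng = random.Random(seed)
    code = rng.randint(1000, 9999)
    needle = f"The vault access code is {code}."
    doc = build_haystack(n_filler, [(rng.randrange(n_filler), needle)])
    return doc, "What is the vault access code?", str(code)


def multi_hop_probe(n_filler: int, seed: int = 0) -> tuple[str, str, str]:
    """Two facts far apart; the answer only exists if the reader combines them."""
    rng = random.Random(seed)
    north, south = rng.randint(10, 99), rng.randint(10, 99)
    facts = [
        (n_filler // 10, f"Warehouse North holds {north} crates."),
        (9 * n_filler // 10, f"Warehouse South holds {south} crates."),
    ]
    doc = build_haystack(n_filler, facts)
    return doc, "How many crates do the two warehouses hold in total?", str(north + south)


def retrieval_only_baseline(doc: str, question: str) -> str:
    """Toy 'find the right sentence' reader: return the number from the
    numeric sentence with the most word overlap with the question.
    It never adds, compares, or otherwise combines facts."""
    q_words = set(re.findall(r"[a-z]+", question.lower()))
    best, best_score = "", -1
    for sentence in re.split(r"(?<=\.)\s+", doc):
        number = re.search(r"\d+", sentence)
        if number is None:
            continue
        score = len(q_words & set(re.findall(r"[a-z]+", sentence.lower())))
        if score > best_score:  # ties keep the earlier sentence
            best, best_score = number.group(), score
    return best


if __name__ == "__main__":
    for name, probe in [("needle", needle_probe), ("multi-hop", multi_hop_probe)]:
        doc, question, expected = probe(n_filler=20_000)
        answer = retrieval_only_baseline(doc, question)
        print(f"{name}: expected={expected} got={answer} pass={answer == expected}")
```

Run as a script, the baseline passes the needle probe and fails the multi-hop probe: it retrieves one warehouse count and returns it unchanged. A real evaluation would use a model and far more varied tasks, but the asymmetry is the point. A retrieval-style score can saturate while the combine-across-the-document behavior that matters downstream stays untested.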








