The usual measures of AI progress have not suited my lived experience for some time now:

One measure is “maximum length of human task that AI can complete.” The ideal goal here seems to be AI developing ever larger software systems, like browsers.

Another measure is “key reasoning breakthroughs,” like proving some math theorem or finding zero-days. The ideal goal here seems to be the Riemann hypothesis.

I think both are worthwhile goals. But in my day job doing enterprise software development, neither of these is limiting me. What limits me is the degree of trust I am permitted to place in an agent’s actions without compromising my employer’s information security.

Almost everybody limits what their AI agent can do, through sandboxing, approval flows, or manual review.