How AI Inference Sends Decision Making To The Edge

Phillip Marangella, Chief Marketing and Product Officer, EdgeConneX.

Stop thinking of the edge as a remote extension of the cloud and start treating it as a distributed decision layer. For two decades, the edge was largely about getting content closer to users. In the AI era, it is becoming the place where intelligent systems make decisions in real time. As inference workloads continue to grow, the question shifts from "Where can we aggregate massive power and compute?" to "Where must intelligence live for the application to work?"

The Edge Before AI And The Rise Of The Cloud

Back in the "pre-cloud" days, there was little difference between the core and the edge. Organizations kept their servers in the basement at headquarters or leased data center space from third-party providers. Netflix was mailing DVDs and Amazon was mailing books.

In 1999, Akamai “moved the Web closer to users” by duplicating popular web content across a worldwide network of servers and relocating content according to regional demand. This helped establish the application-delivery pattern that would shape the next two decades: compute and storage concentrated in central data centers, with latency-sensitive caching at the edge, closer to end users.

In 2006, Amazon’s launch of AWS helped define the modern cloud era. Enterprise IT shifted to hyperscale data centers in regions organized into availability zones. Over time, AWS extended that model outward through edge locations, CDN infrastructure and cloud on-ramps.

AI Model Training In 'Gigascale' Data Centers

AI model training places very different demands on data center design than traditional cloud workloads do. For one, it depends on access to massive amounts of power and land. Model-training facilities are often described as "gigascale": one gigawatt or more, about 10x the size of a typical hyperscale cloud data center. They can span hundreds of acres.
These requirements have driven a “bring the data center to the power” logic and pushed development into new markets, often in rural areas.

Model-training workloads also run at significantly higher densities than cloud workloads. For example, the latest Nvidia systems are listed at 132–140 kW per rack, and industry roadmaps point toward even higher-density racks over time. (In contrast, the average cloud deployment is about 10 kW per rack.) At these densities, air cooling is often no longer sufficient or economical. So in addition to driving data centers to new locations, demand for AI model training is also making direct liquid cooling increasingly necessary. That represents a dramatically different way of designing and operating the data center.

But training is only one side of the AI infrastructure story. Once models are deployed, the operational challenge becomes serving them effectively and efficiently.

In The AI Inference Era, This Isn’t Your Parents’ Edge

AI inference turns the edge from a place where data is collected into a place where decisions are made.

As AI adoption grows, a greater share of workloads is inference: the process of using a trained model to analyze new data and produce an output. McKinsey projects inference will surpass training by 2030, accounting for more than half of AI compute and roughly 30% to 40% of total data center demand. (Training will, of course, remain strategically important even as inference becomes the dominant type of AI workload.)

This shift to inference is pushing decision-making closer to where data is created: factories, stores, hospitals, vehicles, devices, cameras and branch locations. IDC projects worldwide edge computing spending will grow at a compound annual growth rate (CAGR) of almost 14% between 2025 and 2028, with “supercharged growth” in edge AI workloads.

Where the priority for training is building bigger models, the priority for inference is serving those models cost-efficiently and reliably, at scale, all day.
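The density and growth figures quoted above can be sanity-checked with a few lines of arithmetic. This is a back-of-envelope sketch, not a sizing tool. It assumes 1 MW of IT load, ignores cooling and power-distribution overhead, and treats IDC's "almost 14%" CAGR as exactly 14% compounded over three years (2025 to 2028). The function names are illustrative only.

```python
def racks_per_megawatt(kw_per_rack: float) -> float:
    """Number of racks that 1 MW (1,000 kW) of IT load can power at a given density."""
    return 1000.0 / kw_per_rack


def cumulative_growth(cagr: float, years: int) -> float:
    """Growth multiple after compounding an annual rate `cagr` for `years` years."""
    return (1.0 + cagr) ** years


if __name__ == "__main__":
    # Per-rack densities quoted in the article: ~10 kW for an average cloud
    # deployment, 132-140 kW for the latest Nvidia systems.
    for label, kw in [("cloud", 10.0), ("AI low", 132.0), ("AI high", 140.0)]:
        print(f"{label:>7}: {kw:5.0f} kW/rack -> {racks_per_megawatt(kw):5.1f} racks per MW")

    # Assumption: IDC's "almost 14%" CAGR treated as exactly 14%,
    # compounded over three years (2025 to 2028).
    multiple = cumulative_growth(0.14, 3)
    print(f"14% CAGR over 3 years -> {multiple:.2f}x ({(multiple - 1) * 100:.0f}% cumulative growth)")
```

At about 10 kW per rack, 1 MW powers roughly 100 racks. At 140 kW per rack, the same 1 MW powers only about 7. The same power, and the same heat, is concentrated into roughly one-fourteenth as many racks, which is why air cooling runs out of headroom. Separately, a 14% CAGR held for three years compounds to about 1.48x, or roughly 48% cumulative growth.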
User-facing apps need a low time to first token (TTFT) and consistent response times. Many live products depend on proximity to users and data to make AI usable. Examples include voice agents, coding assistants, fraud detection, gaming, medical workflows, robotics, AR/VR, factory automation, personalization, customer service and agentic workflows.

The edge is becoming AI’s execution layer.

Because inference workloads have different requirements from model training, the data center strategy is different as well. Inference workloads are deployed across many distributed sites, closer to users, and optimized for continuous, latency-sensitive serving. Power still matters: inference is less dense per workload than training, but its total demand is continuous and massive.

This is not our parents’ edge, which was often small, lightly staffed and optimized for caching or simple enterprise workloads. The AI inference edge may need accelerators, high memory bandwidth, strong networking, better cooling, higher resiliency and sophisticated orchestration. Operationally, inference behaves more like a live service than a batch job.

AI Infrastructure Designed Around Workloads

Not every AI workload belongs in the same place. Training large models may remain centralized, while inference workloads may run in regional data centers and edge facilities as well as in private environments or on devices. And while some data centers will continue to be purpose-built for large-scale AI model training, more will support mixed workloads: model training, inference and traditional cloud or enterprise workloads.

For technology leaders, the practical question is not whether to choose cloud or edge. It is how to place workloads based on five factors: latency, data sensitivity, power availability, cost to serve and operational resilience.
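The five-factor placement question can be sketched as a simple filter-then-rank routine. This is a hypothetical illustration, not a method described in the article. The `Site` and `Workload` types, their fields, the thresholds and the relative cost units are all invented for this example. The sketch treats latency, data sensitivity, power availability and operational resilience as pass/fail constraints. It then uses cost to serve to rank whichever sites remain.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    """A candidate location. All fields are hypothetical illustration values."""
    name: str
    latency_ms: float        # typical round-trip latency to the workload's users or data
    keeps_data_local: bool   # True if data never leaves the premises or jurisdiction
    spare_power_kw: float    # power still available for new deployments
    cost_to_serve: float     # relative cost units per unit of work (lower is better)
    availability: float      # e.g. 0.999 for "three nines"


@dataclass
class Workload:
    """What a workload requires from a site (hypothetical fields)."""
    name: str
    max_latency_ms: float
    data_must_stay_local: bool
    power_kw: float
    min_availability: float


def place(workload: Workload, sites: List[Site]) -> Optional[Site]:
    """Return the cheapest site that meets every hard constraint, or None.

    Latency, data sensitivity, power availability and operational resilience
    are pass/fail filters; cost to serve ranks the sites that survive.
    """
    eligible = [
        s for s in sites
        if s.latency_ms <= workload.max_latency_ms
        and (s.keeps_data_local or not workload.data_must_stay_local)
        and s.spare_power_kw >= workload.power_kw
        and s.availability >= workload.min_availability
    ]
    return min(eligible, key=lambda s: s.cost_to_serve, default=None)


if __name__ == "__main__":
    # Made-up sites and workloads, for illustration only.
    sites = [
        Site("central-region", 60.0, False, 5000.0, 1.0, 0.9999),
        Site("metro-edge", 8.0, False, 400.0, 1.8, 0.999),
        Site("factory-on-prem", 2.0, True, 50.0, 3.0, 0.99),
    ]
    workloads = [
        Workload("voice-agent", 20.0, False, 200.0, 0.999),
        Workload("visual-inspection", 10.0, True, 20.0, 0.99),
        Workload("nightly-analytics", 500.0, False, 1000.0, 0.999),
    ]
    for w in workloads:
        site = place(w, sites)
        print(f"{w.name} -> {site.name if site else 'no eligible site'}")
```

With these made-up numbers, the voice agent lands at the metro edge. Visual inspection stays on-premises because its data cannot leave the factory. Nightly analytics goes to the central region, which is cheapest once latency stops being binding. Treating four factors as hard filters keeps the logic easy to audit. A real policy might weight them instead.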
Training, inference, data storage and decision-making may each belong in different places depending on the application.

The next phase of AI infrastructure will not be defined by a single destination called “the cloud” or “the edge.” It will be defined by workload placement: putting training, inference, data and decision-making wherever they create the most value. For technology leaders, the edge is no longer the periphery of the architecture. It is becoming one of the places where the business thinks and acts.

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.
