From inference to agents: Scaling AI in the enterprise with Red Hat AI 3.4

Enterprise AI is shifting from simple chatbots toward agentic AI. These systems use independent reasoning and multistep planning to complete complex tasks autonomously. To build these AI-enabled applications, AI engineers and agent developers require immediate access to models via reliable API endpoints running as high-performance workloads. Autonomous agents are resource-intensive by design. They operate 24x7 at scale and can hit infrastructure dozens of times to resolve a single task, driving both sustained load and sharp spikes in compute demand. Without a dedicated foundation, performance slows and costs rise, and when combined with the security and governance demands of sensitive enterprise data, these pressures become serious barriers to production.

Red Hat AI helps address these challenges by delivering a unified, metal-to-agent platform that simplifies the deployment of AI solutions. By providing a consistent framework for both builders and operators, Red Hat helps organizations transition from being token consumers to token providers. This shift empowers enterprises to scale autonomous systems while maintaining the hardware efficiency and compute cost control required to turn AI experiments into production-ready assets.

From static serving to precision orchestration: Driving down inference costs

The foundation of any AI-enabled application is the inference engine. To build effective agents, developers need low latency and high throughput to support chain-of-thought reasoning.
Red Hat AI 3.4 introduces tools to provide this performance while maintaining economic sustainability.

Model-as-a-Service (MaaS) for the enterprise: In this new release, MaaS provides platform engineers with a user interface [general availability, GA] to enable self-service token key management for role-based administration [GA], track usage and provide showback [technical preview, TP], and enforce security standards across both self-hosted [GA] and cloud-based [TP] models.

Distributed inference with llm-d: This release makes distributed inference easier to operate and more cost-efficient at scale. Users deploying models through the user interface (UI) can now discover Gateways that are available in their namespace and select one or more for their deployment, removing the dependency on a single cluster-wide default [TP]. A built-in YAML editor lets users inspect and edit the underlying resources [TP]. Request prioritization [TP] lets llm-d distinguish between interactive and background traffic on the same endpoint, processing latency-sensitive requests first and shedding lower-priority work under saturation. Autoscaling [TP] adjusts replicas automatically based on active request count, queue depth, and GPU utilization. OpenAI-compatible Batch Inference [developer preview, DP] adds a persistent, fire-and-forget path for high-volume workloads like document classification or log analysis.

Speculative decoding for performance [GA]: The Red Hat AI platform integrates the vLLM inference server, which now includes support for speculative decoding.
By using highly efficient draft models to accelerate processing, this technique can increase response speeds by 2x–3x without quality loss, directly lowering the cost per interaction.

Hardware flexibility across GPUs, CPUs, and NPUs: Red Hat AI 3.4 expands accelerator choice for enterprise inference with new AMD support across both GPUs and CPUs, including AMD Instinct MI355X GPU support, preview support for AMD Instinct MI350P PCIe, and generally available vLLM CPU serving on AMD EPYC processors. The release also includes general availability of vLLM CPU serving on Intel Xeon processors and a certified Rebellions container for ATOM NPU. This gives organizations more flexibility to match each workload to the right compute tier: GPUs for demanding reasoning workloads, CPUs for lightweight and always-on inference, and NPUs for power-efficient, high-throughput serving. Together, these capabilities help reduce cost per interaction, improve infrastructure utilization, and provide a consistent Red Hat AI experience across heterogeneous accelerator environments.

Red Hat AI Inference, which provides enterprise support for vLLM and access to Red Hat validated and optimized models, now adds distributed inference capabilities with llm-d on both Red Hat OpenShift and third-party Kubernetes distributions [TP]. The initial release includes availability on CoreWeave and Azure’s managed Kubernetes services. Organizations can now run the same inference stack across environments without rearchitecting for each provider. This means AI operations remain consistent and use the same high-performance, open foundation regardless of the underlying hardware or cloud provider.

Validating model integrity through evaluation-driven development

A model is only as effective as the data that grounds it.
Red Hat AI 3.4 focuses on evaluation-driven development (EDD), replacing subjective testing with concrete data and benchmarks to verify that models and agents are fully ready for production.

Experiment tracking with MLflow [GA]: MLflow integration serves as the backbone, automatically logging metrics, parameters, and artifacts to enable reproducibility and make it easy to compare results across both predictive and generative workloads. This includes prompt management, which treats prompts as versioned, governed corporate assets.

Automated experiences [TP]: Tools like AutoRAG and AutoML automate complex AI tasks to reduce expensive guesswork and manual trial-and-error. AutoRAG automates the selection of embedding models and chunking strategies for retrieval-augmented generation (RAG), helping teams go from raw data to a high-performing pipeline much more quickly. Similarly, AutoML handles feature engineering and model selection for predictive analytics, freeing developers to focus on business outcomes rather than data prep.

Eval hub [TP]: Red Hat AI 3.4 introduces eval hub, a framework-agnostic unified AI evaluation control plane for evaluating large language models (LLMs), AI applications, and agents. It replaces fragmented testing methods with a unified REST API and Kubernetes controller, offering curated and custom evaluation collections, a dashboard with embedded MLflow, and command line interface (CLI) and software development kit (SDK) access. By using Open Container Initiative (OCI) model cards for governance and a Model Context Protocol (MCP) server for agent-discoverable evaluations, it provides a cluster-native environment for practitioners to scale reproducible benchmarking from laptops to production pipelines.

De-risking the agentic enterprise: Maturity and traceability

Autonomous agents require high levels of visibility, traceability, and governed access to tools so they remain within prescribed operational boundaries.
Red Hat AI provides the AgentOps framework so these systems are observable and protected.

Governed prompt management [TP]: The MLflow integration also powers new prompt management capabilities within the gen AI studio playground, a centralized environment where developers can prototype prompts, compare models, and check security without jumping between multiple tools. This allows developers to version, test, and refine agent prompts as governed assets. Managing prompts as code helps organizations accelerate time to value while maintaining consistency.

Identity management [DP]: Red Hat AI implements SPIFFE/SPIRE for cryptographic agent identities, using short-lived tokens to eliminate hardcoded keys. This enables zero-trust security and allows agents to operate under least-privilege principles in production environments.

Lifecycle management with Kagenti [DP]: For enterprises managing evolving agentic assets, the platform introduces Kagenti, a lifecycle management tool that allows teams to deploy, scale, and govern agents without changing the underlying code. Kagenti allows for the discovery and onboarding of agents across their lifecycle, supporting the transition from development to production.

Agent traceability via MLflow [GA]: MLflow provides end-to-end agent traceability. The system traces every LLM call, every tool execution, and every decision step. This is a fundamental requirement for debugging, auditing, and evaluating autonomous systems.

Enterprise MCP management [DP/TP]: Red Hat AI introduces a platform approach for governing MCP-based tool access. The MCP catalog [DP] enables teams to discover and deploy trusted MCP servers from Red Hat and technology partners. The MCP lifecycle operator [DP] manages them as Kubernetes-native workloads.
The MCP gateway [TP] provides centralized authentication, tool-level access control, and observability, so agents are only able to access authorized tools.

Expanding the foundation: Safety and observability

For AI to be sustainable, it must run on a stable, transparent foundation. Red Hat AI 3.4 serves as a comprehensive operations hub, integrating MLOps, GenAIOps, and AgentOps into a single platform.

Integrated authoring with prompt lab and registry [GA]: The platform provides unified tools for building and managing prompts, so the logic driving agentic behavior is stored in a central registry, providing a single source of truth for both developers and administrators.

AI safety and red teaming [TP]: Red Hat AI 3.4 integrates automated adversarial scanning directly into the development lifecycle. Leveraging technology from the Chatterbox Labs acquisition, the platform utilizes Garak to screen models and agentic systems for risks like jailbreaks, prompt injections, and bias. This capability provides advanced risk analysis to catch security flaws in model logic during the development phase rather than at runtime. By identifying and mitigating vulnerabilities early, teams can evaluate the integrity of their AI applications to enable a safer transition to production deployment.

Centralized metrics and observability [TP]: This release delivers a zero-configuration, unified Prometheus instance with native foundational dashboards. Cluster administrators can monitor hardware utilization and MaaS metrics [TP] from a single console. It also adds the ability to see an agent's step-by-step execution traces, reasoning chains, tool calls, and LLM interactions directly in the console [DP]. The platform retains the flexibility to route these metrics to existing third-party observability sinks.

Red Hat AI on cloud marketplaces

Red Hat AI Enterprise is coming soon for procurement directly through the AWS Marketplace, Microsoft Azure Marketplace, and Google Cloud Marketplace.
This gives enterprise organizations a faster, more flexible path to deploying AI infrastructure on their preferred cloud. Organizations can now apply existing Enterprise Discount Programs (EDPs) and committed cloud spend toward Red Hat AI subscriptions, which simplifies the financial and procurement process.

This availability represents an expansion of existing Red Hat AI cloud options. Red Hat already offers Red Hat Enterprise Linux AI on all three major marketplaces for organizations focused on running LLMs in Red Hat Enterprise Linux image mode.

Red Hat AI Inference on IBM Cloud

In combination with IBM Cloud, we are also announcing the availability of Red Hat AI Inference on IBM Cloud, a fully managed inference service that lets clients run production‑grade AI models. It offers fast, cost‑efficient access to open source foundation models with built-in governance such as enterprise‑grade access controls, auditing, and usage governance. Current model catalog examples include Granite 4.0 H Small (IBM), Mistral‑Small‑3.2‑24B‑Instruct, Llama 3.3 70B Instruct, and GPT‑OSS‑120B.

Final thoughts

Red Hat AI 3.4 expands the functionality required to move from experimental chatbots to a fully realized agentic enterprise. By integrating distributed inference, automated data pipelines, framework-agnostic AgentOps, and proactive AI safety, Red Hat delivers a comprehensive foundation for the hybrid cloud. This release extends the tools to build autonomous systems that are predictable, security-focused, and economically sustainable in any environment. As a comprehensive platform for the agentic era, Red Hat AI helps organizations scale innovation while maintaining complete control over their AI assets.

Learn more about Red Hat AI and discover how you can build AI for your world. Red Hat AI 3.4 is expected to be available later this month.
