IBM's Granite 4.0 is now on Replicate – Replicate blog

IBM has released Granite 4.0, their latest family of open-source small language models built for speed and low cost.

Sunday, May 17, 2026

The Granite 4.0 models are built on a hybrid architecture that needs less memory than traditional models, so you can run them on regular consumer GPUs instead of expensive server hardware. They work well for document summarization, RAG systems, and AI agents.

ibm-granite/granite-4.0-h-small is a 30-billion-parameter, long-context instruct model, and it’s now available on Replicate.

Running Granite 4.0 with an API

You can start using Granite models right away on Replicate by calling them through the API.
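As a rough illustration of such an API call, here is a minimal sketch that uses only the Python standard library. The model name comes from this post; the endpoint path, the `Prefer: wait` header, and especially the `prompt` input field are assumptions that should be checked against the model's API page on Replicate. The helper names `build_request` and `run` are hypothetical.

```python
import json
import os
import urllib.request

# Model identifier taken from the post. The endpoint shape and the "prompt"
# input field are assumptions; verify them on the model's API page.
MODEL = "ibm-granite/granite-4.0-h-small"
API_URL = f"https://api.replicate.com/v1/models/{MODEL}/predictions"


def build_request(prompt: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a prediction."""
    body = json.dumps({"input": {"prompt": prompt}}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        # Ask the API to hold the connection open while the prediction runs.
        "Prefer": "wait",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def run(prompt: str) -> str:
    """Send the request and return the generated text as one string."""
    token = os.environ["REPLICATE_API_TOKEN"]  # raises KeyError if unset
    with urllib.request.urlopen(build_request(prompt, token)) as resp:
        prediction = json.load(resp)
    output = prediction.get("output")
    # Assumption: text output arrives as a list of string chunks.
    return "".join(output) if isinstance(output, list) else str(output)
```

With a `REPLICATE_API_TOKEN` set in the environment, calling `print(run("Summarize this document in two sentences: ..."))` would send one request. If the prediction has not finished by the time the response returns, `output` may be empty, in which case polling the prediction would be needed; that step is left out of this sketch.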
