IBM has released Granite 4.0, its latest family of open-source small language models, built for speed and low cost.
The Granite 4.0 models use a hybrid architecture that needs less memory than conventional transformer models, so you can run them on consumer GPUs instead of expensive server hardware. They work well for document summarization, RAG systems, and AI agents.
ibm-granite/granite-4.0-h-small is a 30-billion-parameter, long-context instruct model, and it’s now available on Replicate.
Running Granite 4.0 with an API
You can start using Granite models right away on Replicate. Here’s how to run them with an API:
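As a rough sketch, here’s one way to call the model over Replicate’s HTTP API using only the Python standard library. It assumes your API token is in the `REPLICATE_API_TOKEN` environment variable and that the model accepts a `prompt` input field; that field name is an assumption, so check the model’s schema on Replicate for the actual inputs. The helper names (`build_request`, `output_to_text`, `run`) are just for illustration.

```python
import json
import os
import urllib.request

MODEL = "ibm-granite/granite-4.0-h-small"
API_URL = f"https://api.replicate.com/v1/models/{MODEL}/predictions"


def build_request(prompt: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a prediction request for the model."""
    # Assumption: the model takes a "prompt" input; verify against its schema.
    body = json.dumps({"input": {"prompt": prompt}}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Ask the API to hold the connection open while the prediction
            # runs, up to a server-side time limit.
            "Prefer": "wait",
        },
    )


def output_to_text(output) -> str:
    """Join the output into one string; language models often return chunks."""
    if isinstance(output, list):
        return "".join(str(chunk) for chunk in output)
    return "" if output is None else str(output)


def run(prompt: str) -> str:
    """Send the prompt to the model and return its output as text."""
    token = os.environ["REPLICATE_API_TOKEN"]
    with urllib.request.urlopen(build_request(prompt, token)) as response:
        prediction = json.load(response)
    return output_to_text(prediction.get("output"))


if __name__ == "__main__":
    print(run("Summarize the benefits of small language models in two sentences."))
```

If the prediction takes longer than the wait window, the response can come back without a finished `output`, in which case this sketch returns an empty or partial string. A production client would poll the prediction until it completes, or use Replicate’s official client library instead.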






