Tabby v0.1.1: Metal inference and StarCoder supports! | Tabby AI coding assistant

We are thrilled to announce the release of Tabby v0.1.1 👏🏻.

Tabby users on Apple M1/M2 chips can now use Metal inference by passing the --device metal flag, thanks to llama.cpp's awesome Metal support.

The Tabby team also contributed support for the StarCoder series models (1B/3B/7B) to llama.cpp, enabling models better suited to completion use cases to run on the edge.

llama_print_timings: load time = 105.15 ms

llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)

llama_print_timings: prompt eval time = 25.07 ms / 6 tokens ( 4.18 ms per token, 239.36 tokens per second)

llama_print_timings: eval time = 311.80 ms / 28 runs ( 11.14 ms per token, 89.80 tokens per second)

llama_print_timings: total time = 340.25 ms

Inference benchmarking with StarCoder-1B on Apple M2 Max now takes approximately 340 ms, compared to around 1790 ms previously, a roughly 5x speed improvement.

This enhancement delivers a significant inference speed upgrade 🚀 and marks a meaningful milestone in Tabby's adoption on Apple devices. Check out our Model Directory to discover LLM models with Metal support! 🎁

Tip: Check out the latest Tabby updates on LinkedIn and in our Slack community! Our Tabby community is eager for your participation. ❤️
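
The figures quoted in this post can be cross-checked with a few lines of Python. The sketch below is illustrative only: it parses the llama_print_timings lines shown above, recomputes the eval throughput, and derives the speedup from the roughly 1790 ms baseline mentioned in the text. The helper names (parse_timings, tokens_per_second) and the regular expression are assumptions made for this example, not part of Tabby or llama.cpp.

```python
import re

# Timing lines copied from the llama.cpp log quoted in this post.
LOG = """\
llama_print_timings: load time = 105.15 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: prompt eval time = 25.07 ms / 6 tokens ( 4.18 ms per token, 239.36 tokens per second)
llama_print_timings: eval time = 311.80 ms / 28 runs ( 11.14 ms per token, 89.80 tokens per second)
llama_print_timings: total time = 340.25 ms
"""

# Hypothetical pattern: "<stage> time = <ms> ms", optionally "/ <count> runs|tokens".
LINE_RE = re.compile(
    r"llama_print_timings:\s*(?P<name>[a-z ]+?) time\s*=\s*(?P<ms>[\d.]+) ms"
    r"(?:\s*/\s*(?P<count>\d+) (?:runs|tokens))?"
)


def parse_timings(log: str) -> dict:
    """Return {stage: (milliseconds, count or None)} for each timing line."""
    timings = {}
    for match in LINE_RE.finditer(log):
        count = match.group("count")
        timings[match.group("name")] = (
            float(match.group("ms")),
            int(count) if count is not None else None,
        )
    return timings


def tokens_per_second(ms: float, count: int) -> float:
    """Convert a duration in milliseconds and a token count into throughput."""
    return count / (ms / 1000.0)


timings = parse_timings(LOG)

# Recompute the eval throughput from the raw duration and run count.
eval_ms, eval_runs = timings["eval"]
eval_tps = tokens_per_second(eval_ms, eval_runs)

# Speedup relative to the approximate pre-Metal figure quoted in the post.
PREVIOUS_TOTAL_MS = 1790.0  # "around 1790ms" in the text, so only approximate
total_ms, _ = timings["total"]
speedup = PREVIOUS_TOTAL_MS / total_ms

print(f"eval throughput: {eval_tps:.2f} tokens/s")
print(f"speedup: {speedup:.2f}x")
```

Because the log rounds durations to two decimals, recomputed per-token values can differ slightly from the ones llama.cpp prints, so this is a sanity check rather than an exact reproduction.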

Related reading

Announcing our $3.2M seed round, and the long-awaited RAG release in Tabby…

Introducing First Stable Release: v0.0.1 | Tabby AI coding assistant

Running Tabby Locally with AMD ROCm | Tabby AI coding assistant

Deploying a Tabby Instance in Hugging Face Spaces | Tabby AI coding assistant

Cracking the Coding Evaluation | Tabby AI coding assistant

Vulkan Support: LLMs for Everyone | Tabby AI coding assistant
