Repo: github.com/hearth-project/hearth · Apache-2.0 · v0.1.0, alpha.
I've been building Hearth, a Kubernetes operator that serves open-source LLMs (Qwen, DeepSeek, GLM, …) declaratively and scales them to zero when idle. It's at a point where the core works end-to-end on real GPUs, and I'm looking for people to build it with me. The thing I most want you to know up front: you can contribute without owning an accelerator. More on that below.
## The one interesting problem
Self-hosting an LLM on K8s is easy until you notice the GPU is burning money while nobody's using the model. The obvious fix, "scale to zero", runs straight into a chicken-and-egg problem: a stock HPA can't scale up from zero, because zero replicas means no pods are reporting metrics, so the autoscaler never sees a signal to wake up.
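To make the chicken-and-egg concrete, here is a simplified sketch of the replica formula described in the Kubernetes HPA documentation (`desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`). It ignores tolerance, stabilization windows, and min/max bounds, and the function name is just for illustration. The point is only that a replica count of zero multiplies any demand signal down to zero.

```python
import math


def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float) -> int:
    """Simplified HPA scaling rule (illustrative, not the full controller logic).

    The desired count is proportional to the current replica count, so a
    workload sitting at zero replicas stays at zero no matter how large the
    metric ratio is. In practice there are also no pods left to report a
    per-pod metric in the first place.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    return math.ceil(current_replicas * (current_metric / target_metric))
```

With one replica and a metric at five times its target, this asks for five replicas; with zero replicas and the same ratio, it still asks for zero.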
Hearth puts a small gateway (an OpenAI-compatible reverse proxy) in front of each model. When a request arrives at a scaled-to-zero backend, the gateway accepts it, holds the connection open (SSE keepalive heartbeats so nothing times out), and bumps a pending counter exposed at /hearth/queue. KEDA polls that endpoint, sees pending > 0, and scales the backend 0 → 1. The pod loads weights from a warm cache, becomes Ready, and the gateway forwards the buffered request and streams tokens back. Idle again → KEDA scales it back to 0.
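The hold-and-forward flow above can be sketched roughly as follows. This is a minimal illustration, not Hearth's actual gateway code: `PendingCounter`, `hold_and_forward`, the `{"pending": N}` payload shape, and the heartbeat and timeout defaults are all assumptions made for the example. The only thing taken from a spec is the keepalive format, since SSE treats lines starting with `:` as comments that clients ignore.

```python
import json
import threading
from typing import Callable, Iterable, Iterator


class PendingCounter:
    """Thread-safe count of requests being held for a cold backend."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending = 0

    def __enter__(self) -> "PendingCounter":
        with self._lock:
            self._pending += 1
        return self

    def __exit__(self, *exc_info) -> None:
        with self._lock:
            self._pending -= 1

    def queue_payload(self) -> str:
        """Body a queue endpoint could return for an autoscaler to poll.

        The field name "pending" is an assumption for this sketch.
        """
        with self._lock:
            return json.dumps({"pending": self._pending})


def hold_and_forward(counter: PendingCounter,
                     backend_ready: threading.Event,
                     forward: Callable[[], Iterable[str]],
                     heartbeat_s: float = 15.0,
                     max_wait_s: float = 300.0) -> Iterator[str]:
    """Yield SSE keepalives until the backend is Ready, then stream its output.

    The request counts as pending only while it is being held; once the
    backend is ready, the counter drops and the buffered request is forwarded.
    """
    waited = 0.0
    with counter:
        # Event.wait returns False on timeout, True once the event is set.
        while not backend_ready.wait(timeout=heartbeat_s):
            waited += heartbeat_s
            if waited >= max_wait_s:
                raise TimeoutError("backend did not become Ready in time")
            yield ": keepalive\n\n"  # SSE comment line, ignored by clients
    yield from forward()
```

In a real gateway, `backend_ready` would be driven by readiness probing of the backend pod, and `forward` would proxy the buffered HTTP request; both are stand-ins here. Whether the counter should drop at forward time or only when the response completes is a design choice this sketch does not settle.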







