This week's AI tooling news splits cleanly into two themes: local inference getting serious enough to displace cloud dependencies, and autonomous agents graduating from demos to production APIs. Throw in a supply chain security wake-up call and a terminal emulator worth switching for, and issue #22 is unusually dense with decisions worth making now.
Gemma 4 12B runs multimodal agents on laptops
Google's Gemma 4 12B drops the separate encoder architecture entirely — audio and vision inputs project directly into the LLM backbone. The result is a 16GB VRAM footprint that benchmarks against 26B-class models on reasoning tasks, with native audio support included at no extra memory cost.
The practical shift here is real. Multimodal agentic workflows have required either a cloud call or a beefy GPU server because you were running two or three model components in parallel. Gemma 4 collapses that into a single model load. Combined with first-class support for Ollama, LM Studio, llama.cpp, vLLM, and Hugging Face Transformers, you're looking at a model that fits the local dev stack most engineers already have.
Google also ships an official Skills Repository of agentic patterns, which matters more than it sounds — it means there's a canonical place to look before you roll your own tool-use scaffolding.






