magazine.sebastianraschka.com — Warptech Lab News

magazine.sebastianraschka.com

Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed…

From Gemma 4 to DeepSeek V4, How New Open-Weight LLMs Are Reducing Long-Context Costs

magazine.sebastianraschka.com · 4 days ago

My Workflow for Understanding LLM Architectures

A learning-oriented workflow for understanding new open-weight model releases

magazine.sebastianraschka.com · 1 month ago

Components of A Coding Agent

How coding agents use tools, memory, and repo context to make LLMs work better in practice

magazine.sebastianraschka.com · 1 month ago

A Visual Guide to Attention Variants in Modern LLMs

From MHA and GQA to MLA, sparse attention, and hybrid architectures

magazine.sebastianraschka.com · 1 month ago

A Dream of Spring for Open-Weight LLMs: 10 Architectures from Jan-Feb 2026

A Round Up And Comparison of 10 Open-Weight LLM Releases in Spring 2026

magazine.sebastianraschka.com · 2 months ago

Categories of Inference-Time Scaling for Improved LLM Reasoning

And an Overview of Recent Inference-Scaling Papers

magazine.sebastianraschka.com · 3 months ago

The State Of LLMs 2025: Progress, Progress, and Predictions

A 2025 review of large language models, from DeepSeek R1 and RLVR to inference-time scaling, benchmarks, architectures, and…

magazine.sebastianraschka.com · 4 months ago

AI science

LLM Research Papers: The 2025 List (July to December)

In June, I shared a bonus article with my curated and bookmarked research paper lists to the paid subscribers who make this…

magazine.sebastianraschka.com · 4 months ago

A Technical Tour of the DeepSeek Models from V3 to V3.2

Understanding How DeepSeek's Flagship Open-Weight Models Evolved

magazine.sebastianraschka.com · 5 months ago

Beyond Standard LLMs

Linear Attention Hybrids, Text Diffusion, Code World Models, and Small Recursive Transformers

magazine.sebastianraschka.com · 6 months ago

Understanding the 4 Main Approaches to LLM Evaluation (From Scratch)

Multiple-Choice Benchmarks, Verifiers, Leaderboards, and LLM Judges with Code Examples

magazine.sebastianraschka.com · 7 months ago

Understanding and Implementing Qwen3 From Scratch

A Detailed Look at One of the Leading Open-Source LLMs

magazine.sebastianraschka.com · 8 months ago

From GPT-2 to gpt-oss: Analyzing the Architectural Advances

And How They Stack Up Against Qwen3

magazine.sebastianraschka.com · 9 months ago

The Big LLM Architecture Comparison

From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design

magazine.sebastianraschka.com · 10 months ago

AI science

LLM Research Papers: The 2025 List (January to June)

A topic-organized collection of 200+ LLM research papers from 2025

magazine.sebastianraschka.com · 10 months ago

Understanding and Coding the KV Cache in LLMs from Scratch

KV caches are one of the most critical techniques for efficient inference in LLMs in production.

magazine.sebastianraschka.com · 11 months ago