A complete walkthrough of speech transcription, LLM inference, tokenization, and 4-bit quantization. Built with Whisper, Llama 3.2, and the HuggingFace ecosystem.
Skill level: Intermediate | Runtime: Google Colab T4 GPU | Models: Whisper-medium, Llama-3.2-3B
Table of Contents
The Two-Step Pipeline
Tokenization








