"Accuracy lied to you. Here's the complete toolkit—confusion matrix, precision, recall, F1, ROC/AUC, log loss, and cross-validation—that separates models that look good from models that actually work."

You trained your first classifier, ran .score(), and got 97% accuracy. You shipped it. Three weeks later, your fraud team tells you it's catching zero fraudulent transactions.

Sound familiar? You fell into the accuracy trap, and it's one of the most common mistakes developers make when moving into ML.
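To see how a 97% score and zero caught fraud can coexist, here is a minimal sketch in plain Python. The numbers are made up for illustration (1,000 transactions, 30 of them fraudulent), and the "model" is just an assumed stand-in that always predicts the majority class; the helper functions `accuracy` and `recall` are hypothetical, written out by hand rather than taken from any library.

```python
# Hypothetical data: 1,000 transactions, 30 of them fraudulent (label 1).
y_true = [1] * 30 + [0] * 970

# Stand-in "model" that always predicts the majority class (not fraud).
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model flagged as positive."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


print(f"accuracy: {accuracy(y_true, y_pred):.2f}")  # accuracy: 0.97
print(f"recall:   {recall(y_true, y_pred):.2f}")    # recall:   0.00
```

The model is right 970 times out of 1,000 simply because fraud is rare, yet it flags none of the 30 fraudulent transactions. Accuracy rewards it for the easy majority class and says nothing about the cases you actually care about.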

This guide will give you the mental model and the code to evaluate binary classifiers properly. By the end, you'll know which metrics to reach for, when accuracy actively lies to you, how to read a ROC curve, and the seven pitfalls that silently kill production models.

Why Linear Regression Breaks for Classification