There is a number that haunts every fraud detection engineer: 0.13%.

That is the fraud rate in the PaySim dataset: 8,213 fraudulent transactions buried among 6,362,620 transactions in total. It sounds small. It is not. At that ratio, a model that predicts "legitimate" for every single transaction achieves 99.87% accuracy and catches exactly zero fraud.
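The arithmetic behind those percentages is worth seeing once. The sketch below is only an illustration, not code from TrustGuard AI: it assumes 6,362,620 is the total row count of the dataset (so legitimate rows are the total minus the 8,213 fraud cases), and it scores a hypothetical "model" that labels every transaction legitimate.

```python
# Counts quoted above. Assumption: 6,362,620 is the total number of rows,
# of which 8,213 are labeled fraud.
total = 6_362_620
fraud = 8_213
legit = total - fraud

fraud_rate = fraud / total

# Hypothetical baseline that predicts "legitimate" for every transaction:
# every legitimate row is a true negative, every fraud row a false negative.
true_positives = 0
true_negatives = legit
false_negatives = fraud

accuracy = (true_positives + true_negatives) / total
recall = true_positives / (true_positives + false_negatives)

print(f"fraud rate: {fraud_rate:.2%}")  # 0.13%
print(f"accuracy:   {accuracy:.2%}")    # 99.87%
print(f"recall:     {recall:.2%}")      # 0.00%
```

Accuracy rewards the baseline for the 99.87% of rows it could not get wrong, while recall, the share of fraud actually caught, sits at zero.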

This is the problem I set out to solve with TrustGuard AI, a course project that turned into one of the most technically demanding things I have built. By the end of it, our deployed XGBoost model achieves an AUC-ROC of 0.9995 and a recall of 0.9976, meaning it catches 99.76% of all fraud in a 6.3-million-row test set. It also explains every single prediction using SHAP and grounds each fraud alert in real State Bank of Pakistan regulatory documents through a RAG pipeline.

This article is the full story — what worked, what broke, and why accuracy is the wrong metric for fraud detection.

The Problem With Accuracy