From 65% to 87% accuracy on CIFAR-10 using Convolutional Neural Networks - and what went wrong along the way.

Introduction - Purpose

When building image classification models, most attention typically goes to model architecture, hyperparameters, or training strategies. Yet the quality and preparation of the input data can have an equal, if not greater, impact on model performance. In practice, the same model trained on the same dataset can produce vastly different results depending solely on how the data is preprocessed.

This raised an important question: How much does data preprocessing actually influence model performance?

To answer this, I designed a set of controlled experiments in which I systematically applied different preprocessing techniques. The results were striking. With changes only to preprocessing and training strategy, the model's accuracy ranged from around 65% to over 87%, and in one case it dropped to nearly 20%. These observations challenged my initial assumptions and highlighted an often-underestimated truth: preprocessing is not just a preliminary step, but a critical factor that can significantly shape model behavior.