Every time a PyTorch model refuses to learn, the debugging process looks the same:

1. Stare at the loss curve
2. Wonder if gradients are flowing
3. Add print statements everywhere
4. Delete them all when it works