torchvision is great for natural images. But remote sensing data is different:

GeoTIFFs, not PNGs: each file carries a coordinate reference system and georeferencing metadata

Multi-spectral bands: beyond RGB into near-infrared and thermal, plus radar (SAR) imagery

Massive sizes: a single satellite image can be 10,000×10,000 pixels or larger

Spatial context matters: naive random cropping throws away location and can break up geographic patterns
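
To make the last two points concrete, here is a minimal sketch of one common alternative to random cropping: walking a large scene in a regular grid of fixed-size windows, so every tile keeps a known position in the image. The names `tile_windows`, `tile`, and `stride` are hypothetical, chosen for illustration only, and the code uses plain Python rather than any particular geospatial library.

```python
from typing import Iterator, List, Tuple

# (row_off, col_off, height, width) in pixel coordinates
Window = Tuple[int, int, int, int]


def _offsets(size: int, tile: int, stride: int) -> List[int]:
    """Start offsets along one axis so that tiles cover [0, size)."""
    if size <= tile:
        # The image is smaller than one tile along this axis.
        return [0]
    offsets = list(range(0, size - tile + 1, stride))
    if offsets[-1] + tile < size:
        # Shift a final tile back so it ends exactly at the image edge.
        offsets.append(size - tile)
    return offsets


def tile_windows(
    height: int, width: int, tile: int = 512, stride: int = 512
) -> Iterator[Window]:
    """Yield grid-aligned windows over an image of the given size.

    A stride smaller than the tile size produces overlapping tiles.
    """
    if tile <= 0 or stride <= 0:
        raise ValueError("tile and stride must be positive")
    win_h = min(tile, height)
    win_w = min(tile, width)
    for row in _offsets(height, tile, stride):
        for col in _offsets(width, tile, stride):
            yield (row, col, win_h, win_w)


if __name__ == "__main__":
    windows = list(tile_windows(10_000, 10_000))
    print(len(windows), windows[0], windows[-1])
```

Each window can then be handed to whatever windowed reader you use, so only one tile is in memory at a time. The edge handling here (shifting the last tile back rather than padding) is just one reasonable choice; padding or dropping partial tiles would work too.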