It's been a few months since I last wrote about Data Preprocessor, the IntelliJ plugin I built to stop re-writing the same pandas preprocessing scripts every project. The 1.5.x series has landed a real R codegen path, a more honest outlier-resistant normalizer, and one genuinely embarrassing deadlock that I want to talk about openly because the lesson is useful.
tl;dr on what the plugin does
You load a CSV, Excel, or JSON file inside your JetBrains IDE. The plugin profiles every column (type, null count, mean/median/std, mode, unique count). You build a pipeline visually — drop nulls, fill with mean, deduplicate, remove IQR outliers, normalize (min-max / z-score / robust), label-encode, one-hot, train/test split, sort, filter, type-cast — and then one click emits a complete, ready-to-run Python (pandas) or R (base + a few small libs) script.
All processing is local. The plugin collects no telemetry. The generated code is normal pandas or normal R — no runtime library, no plugin import, nothing magic. Read it, edit it, commit it alongside your dataset, run it long after you've uninstalled the plugin.
Here's roughly what a 5-step pipeline turns into:








