LMR-BENCH: Can LLM Agents Reproduce NLP Research Code? (EMNLP 2025)
LMR-BENCH evaluates how well LLM agents reproduce code from 23 NLP papers. This PoC explains the masking methodology, the evaluation axes, and what the results mean for AI-assisted research.