olmo-eval is an open evaluation workbench that helps model developers add, run, and analyze benchmarks across changing LLM checkpoints, extending OLMES from final-score reproducibility into the day-to-day model development loop.

A blog post by Ai2 on Hugging Face
