Most biology benchmarks ask narrow, fact-based questions with clean answers. Working scientists, by contrast, weigh imperfect evidence and make decisions. OpenAI released LifeSciBench to target that gap directly.
Even the strongest model passes only about one task in three, so the benchmark is far from saturated.
What is LifeSciBench?
LifeSciBench contains 750 expert-authored tasks. They span seven workflows and seven biological domains. Each task pairs a prompt, supporting artifacts, and a grading rubric.
The seven workflows cover evidence handling and analysis. They also include design and optimization, scientific reasoning, validation and operations, translation, and scientific communication.
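To make the task structure concrete, here is a minimal sketch of how one task record (prompt, supporting artifacts, grading rubric) could be represented. Every name in it (Task, RubricItem, rubric_score, the points field) is hypothetical and not taken from LifeSciBench itself. The scoring rule, a simple fraction of rubric points earned, is an assumption for illustration only; the benchmark's actual grading and pass criteria are not described here.

```python
from dataclasses import dataclass, field


@dataclass
class RubricItem:
    """One grading criterion. Field names are hypothetical."""
    criterion: str
    points: float


@dataclass
class Task:
    """Illustrative task record: a prompt, supporting artifacts, and a rubric."""
    prompt: str
    artifacts: list[str] = field(default_factory=list)  # e.g. paths to data files
    rubric: list[RubricItem] = field(default_factory=list)
    workflow: str = ""  # one of the seven workflows
    domain: str = ""    # one of the seven biological domains


def rubric_score(task: Task, met: set[str]) -> float:
    """Fraction of rubric points earned, given the criteria judged as met.

    Assumed scoring rule for illustration; not the benchmark's grader.
    """
    total = sum(item.points for item in task.rubric)
    if total == 0:
        return 0.0
    earned = sum(item.points for item in task.rubric if item.criterion in met)
    return earned / total
```

Under this sketch, a task with a two-point criterion and two one-point criteria would score 0.5 if only the two-point criterion were met. How such a score maps to a pass or fail is left open, since the source does not state a threshold.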









