OpenAI Releases LifeSciBench, a 750-Task Benchmark Grading AI Models on Real Life-Science Research With Expert-Written Rubric

OpenAI's LifeSciBench grades AI models on 750 expert-authored life-science tasks using rubrics; the strongest model passes only 36.1%.

Thursday, June 18, 2026

Most biology benchmarks ask narrow, fact-based questions with clean answers. Working scientists, by contrast, weigh imperfect evidence and make decisions. OpenAI's newly released LifeSciBench targets that gap directly.

Even the strongest model passes only 36.1% of tasks, roughly one in three, so the benchmark is far from saturated.

What is LifeSciBench?

LifeSciBench contains 750 expert-authored tasks. They span seven workflows and seven biological domains. Each task pairs a prompt, supporting artifacts, and a grading rubric.

The seven workflows cover evidence handling and analysis. They also include design and optimization, scientific reasoning, validation and operations, translation, and scientific communication.
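To make the task structure concrete, here is a minimal Python sketch of how a prompt, its artifacts, and a rubric might be represented and scored. Everything in it is an illustrative assumption rather than OpenAI's actual schema or grading method: the class and field names, the per-item point weights, and the pass threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RubricItem:
    """One expert-written criterion (hypothetical structure)."""
    description: str
    points: float  # assumed weight; the real rubric format is not described here


@dataclass
class Task:
    """A benchmark task: prompt, supporting artifacts, and grading rubric."""
    prompt: str
    artifacts: list[str]      # e.g. file names of supporting data (assumed)
    rubric: list[RubricItem]
    workflow: str             # one of the seven workflows
    domain: str               # one of the seven biological domains


def rubric_score(task: Task, met: list[bool]) -> float:
    """Return the fraction of rubric points earned.

    `met[i]` says whether the response satisfied rubric item i.
    """
    if len(met) != len(task.rubric):
        raise ValueError("need one judgment per rubric item")
    total = sum(item.points for item in task.rubric)
    earned = sum(item.points for item, ok in zip(task.rubric, met) if ok)
    return earned / total if total else 0.0


def pass_rate(scores: list[float], threshold: float) -> float:
    """Share of tasks whose score reaches an assumed pass threshold."""
    if not scores:
        return 0.0
    return sum(score >= threshold for score in scores) / len(scores)
```

Under these assumptions, a response that meets rubric items worth 3 of 4 available points scores 0.75, and the headline pass rate is the share of tasks whose score clears whatever threshold the graders choose.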
