OpenAI launches LifeSciBench to evaluate AI in life sciences

OpenAI just dropped what amounts to a final exam for AI models trying to do real science. LifeSciBench, published on June 17, is a benchmarking tool built to measure how well AI systems handle actual life sciences research, not the sanitized textbook version, but the messy, multi-step, figure-laden work that PhD scientists do every day.

The benchmark includes 750 tasks spanning seven distinct research workflows, from evidence handling and analysis to experimental design, scientific reasoning, and communication.

What makes LifeSciBench different

The 750 tasks were authored and reviewed by 173 PhD-level scientists with backgrounds in biotechnology and pharmaceuticals. An additional 453 expert reviewers helped validate them. Each task averaged six automated review cycles, and expert consensus required at least 90% agreement before a task made it into the final set.

The tasks come loaded with 1,062 attached artifacts, including figures, PDFs, and datasets. That matters because real research doesn’t happen in clean text boxes. It happens in spreadsheets with missing columns, in blurry gel images, in 40-page supplementary files that nobody wants to read. LifeSciBench forces AI models to deal with all of it.

The benchmark includes 750 tasks spanning seven distinct research workflows, from evidence handling and analysis to experimental design, scientific reasoning, and communication.

What makes LifeSciBench different

OpenAI launches LifeSciBench to evaluate AI in life sciences

OpenAI launches LifeSciBench to evaluate AI in life sciences

Other newsrooms on this story

Related reading

OpenAI Releases LifeSciBench, a 750-Task Benchmark Grading AI Models on Real…

How to build a better AI benchmark

AstaBench update: New results, plus adoption from industry | Ai2

New benchmark exposes how badly AI struggles with real knowledge work

Open-world evaluations for measuring frontier AI capabilities

AI models are getting very good at professional tasks, new OpenAI research…

Related reading

OpenAI Releases LifeSciBench, a 750-Task Benchmark Grading AI Models on Real…

How to build a better AI benchmark

AstaBench update: New results, plus adoption from industry | Ai2

New benchmark exposes how badly AI struggles with real knowledge work

Open-world evaluations for measuring frontier AI capabilities

AI models are getting very good at professional tasks, new OpenAI research…

Other newsrooms on this story