OpenAI launched LifeSciBench with 750 expert-authored tasks and 19,020 evaluation criteria to benchmark AI models like GPT-Rosalind on real life sciences

OpenAI's LifeSciBench grades AI models on 750 expert-authored life-science tasks using rubrics; the strongest model passes only 36.1%.