displayName
GPQA Diamond — 2024 release
benchmarkId
caseCount
198
releasedAt
2023-11-29
composition
The "diamond" subset of GPQA: 198 graduate-level multiple-choice
questions in biology, chemistry, and physics, written and validated
by domain experts. Diamond is the hardest tier. For the broader GPQA
question set, the authors report that in-domain experts (people
holding or pursuing PhDs) reach ~65% accuracy, while skilled
non-experts with web access reach ~34%.
homepageUrl
https://github.com/idavidrein/gpqa
description
Scores on the frozen Diamond split are the numbers that vendors
(Anthropic, OpenAI, Google) conventionally report in announcements
that cite "GPQA Diamond".
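
With only 198 cases, reported scores have coarse resolution, which matters when comparing them. The sketch below assumes a score is plain accuracy (correct answers divided by caseCount) and uses the normal-approximation binomial standard error. The function names and the 129-correct run are hypothetical illustrations, not part of GPQA or of any vendor's reporting.

```python
import math

CASE_COUNT = 198  # caseCount from this record


def accuracy(num_correct: int, case_count: int = CASE_COUNT) -> float:
    """Fraction of questions answered correctly (hypothetical helper)."""
    if not 0 <= num_correct <= case_count:
        raise ValueError("num_correct must be between 0 and case_count")
    return num_correct / case_count


def standard_error(acc: float, case_count: int = CASE_COUNT) -> float:
    """Binomial standard error of an accuracy estimate (normal approximation)."""
    return math.sqrt(acc * (1.0 - acc) / case_count)


# A single question moves the score by 1/198 of the total.
step = 1 / CASE_COUNT

# Hypothetical run: 129 correct answers (illustrative only, not a reported result).
acc = accuracy(129)
se = standard_error(acc)
print(f"accuracy={acc:.3f} step={step:.4f} standard_error={se:.3f}")
```

Under these assumptions, differences of a point or two between two reported scores sit well inside one standard error, so they are best read as roughly equivalent rather than as a clear ranking.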