displayName
GPQA
homepageUrl
https://github.com/idavidrein/gpqa
kind
model-only
targetsKind
ModelVersion
description
GPQA (Graduate-Level Google-Proof Q&A) by Rein et al. (2023) is a
448-question multiple-choice benchmark in biology, chemistry, and
physics written and validated by domain-expert PhDs. It is designed to be
"Google-proof": non-experts with web access score ~34%, while in-domain
PhDs score ~65%. The Diamond subset (198 questions) is the hardest
tier, and its accuracy is the figure most vendor announcements report.