II. TestSet overview
Reference · livetest-set:gsm8k-test
GSM8K test split overview
Canonical GSM8K test split, widely used in published math-reasoning evaluations since 2022.
Attributes
displayName
GSM8K test split
benchmarkId
caseCount
1319
releasedAt
2021-10-27
composition
The held-out test split of GSM8K: 1,319 grade-school math word
problems, each requiring 2 to 8 reasoning steps. It is the standard
split published alongside OpenAI's GSM8K release.
homepageUrl
description
Canonical GSM8K test split, widely used in published math-reasoning
evaluations since 2022.
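The sketch below shows one way these attributes could be held in code. It mirrors the fields listed above, with values copied from this page. The class name, the snake_case field names, and the check_case_count helper are hypothetical and not part of any documented API. benchmark_id and homepage_url are optional because both are empty on this page.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class TestSetNode:
    """Hypothetical in-memory shape of a TestSet node; fields mirror the page's attributes."""
    node_id: str
    display_name: str
    case_count: int
    released_at: date
    composition: str
    description: str
    benchmark_id: Optional[str] = None   # empty on this page
    homepage_url: Optional[str] = None   # empty on this page


def check_case_count(node: TestSetNode, cases: Sequence[object]) -> None:
    """Raise ValueError if a locally loaded copy of the split disagrees with caseCount."""
    if len(cases) != node.case_count:
        raise ValueError(
            f"{node.node_id}: expected {node.case_count} cases, got {len(cases)}"
        )


# Values copied from the attributes above.
GSM8K_TEST = TestSetNode(
    node_id="livetest-set:gsm8k-test",
    display_name="GSM8K test split",
    case_count=1319,
    released_at=date(2021, 10, 27),
    composition=(
        "The held-out test split of GSM8K: 1,319 grade-school math word "
        "problems, each requiring 2 to 8 reasoning steps."
    ),
    description="Canonical GSM8K test split.",
)

# A list with 1,319 entries passes; any other length raises ValueError.
check_case_count(GSM8K_TEST, [None] * 1319)
```

A frozen dataclass keeps the recorded metadata immutable, so a mismatch between a local copy and caseCount is reported by the check rather than silently "fixed" by overwriting the count.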
Outgoing edges
belongs_to_benchmark (1)
- benchmark:gsm8k · Benchmark · GSM8K
Incoming edges
None.
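The edges can be read as (source, relation, target) triples. The sketch below assumes that hypothetical representation and groups a node's edges by relation. The outgoing and incoming helpers are illustrative, not a documented API. The list holds only the single belongs_to_benchmark edge shown on this page, which is why the incoming lookup for the test set comes back empty.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical triple format: (source node id, relation name, target node id).
Edge = Tuple[str, str, str]

# Only the one edge listed on this page.
EDGES: List[Edge] = [
    ("livetest-set:gsm8k-test", "belongs_to_benchmark", "benchmark:gsm8k"),
]


def outgoing(node_id: str, edges: List[Edge]) -> Dict[str, List[str]]:
    """Group the edges leaving node_id by relation name."""
    out: Dict[str, List[str]] = defaultdict(list)
    for src, rel, dst in edges:
        if src == node_id:
            out[rel].append(dst)
    return dict(out)


def incoming(node_id: str, edges: List[Edge]) -> Dict[str, List[str]]:
    """Group the edges pointing at node_id by relation name."""
    inc: Dict[str, List[str]] = defaultdict(list)
    for src, rel, dst in edges:
        if dst == node_id:
            inc[rel].append(src)
    return dict(inc)


node = "livetest-set:gsm8k-test"
print(outgoing(node, EDGES))  # {'belongs_to_benchmark': ['benchmark:gsm8k']}
print(incoming(node, EDGES))  # {}
```

Running incoming on "benchmark:gsm8k" instead returns the test set under belongs_to_benchmark, which is how the benchmark's own page would list this node as an incoming edge.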