Record
Agentic AI Atlas · Terminal-Bench v1
test-set:terminal-bench-v1 · a5c.ai
II. TestSet overview

test-set:terminal-bench-v1

Reference · live

Terminal-Bench v1 overview

Canonical Terminal-Bench v1 set referenced in the original paper and the public leaderboard.

TestSet · Outgoing · 1 · Incoming · 0

Attributes

displayName
Terminal-Bench v1
benchmarkId
caseCount
80
releasedAt
2024-10-01
composition
The v1 release of Terminal-Bench from Stanford NLP / Princeton. Each task is a multi-step shell scenario evaluated end-to-end in a Docker sandbox; success requires the agent to reach a target file state via real shell commands.
homepageUrl
description
Canonical Terminal-Bench v1 set referenced in the original paper and the public leaderboard.
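
The composition attribute above describes success as the agent reaching a target file state. As a rough illustration only, the sketch below checks a working directory against a set of expected files. The function names, the path-to-SHA-256 mapping, and the sample file `out/result.txt` are all hypothetical and are not taken from Terminal-Bench or its harness. The real evaluation runs inside a Docker sandbox, which this sketch replaces with a temporary directory.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def reached_target_state(workdir: Path, expected: dict[str, str]) -> bool:
    """Check that every expected file exists under workdir with the expected digest.

    `expected` maps a relative path to a SHA-256 hex digest. This format is
    an assumption made for illustration, not the benchmark's own format.
    """
    for rel_path, digest in expected.items():
        candidate = workdir / rel_path
        if not candidate.is_file() or file_digest(candidate) != digest:
            return False
    return True


def demo() -> tuple[bool, bool]:
    """Run the check before and after a simulated agent action."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        expected = {"out/result.txt": hashlib.sha256(b"done\n").hexdigest()}
        before = reached_target_state(workdir, expected)
        # Stand-in for the agent's shell commands,
        # e.g. `mkdir -p out && echo done > out/result.txt`.
        (workdir / "out").mkdir()
        (workdir / "out" / "result.txt").write_bytes(b"done\n")
        after = reached_target_state(workdir, expected)
    return before, after


if __name__ == "__main__":
    print(demo())  # prints (False, True)
```

Comparing digests rather than only checking that files exist means a file with the right name but the wrong content still counts as a failure, which matches the end-to-end framing of the composition text.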

Outgoing edges

belongs_to_benchmark · 1

Incoming edges

None.