Agentic AI Atlas

TestSet overview

test-set:terminal-bench-v1

Reference · live

Terminal-Bench v1 overview

Canonical Terminal-Bench v1 set referenced in the original paper and the public leaderboard.

TestSet · Outgoing 1 · Incoming 0

Attributes

displayName

Terminal-Bench v1

benchmarkId

benchmark:terminal-bench

caseCount

releasedAt

2024-10-01

composition

The v1 release of Terminal-Bench, from Stanford NLP / Princeton. Each task is a multi-step shell scenario evaluated end-to-end in a Docker sandbox; a task succeeds only if the agent reaches the target file state by running real shell commands.
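To make the "target file state" criterion concrete, here is a minimal Python sketch of one way such a check could work. It is not the benchmark's actual harness: the function names, the `expected` mapping of relative paths to SHA-256 digests, and the `out/result.txt` path are all hypothetical, and a temporary directory stands in for the Docker sandbox filesystem.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def reached_target_state(root: Path, expected: dict[str, str]) -> bool:
    """True only if every expected file exists under root with the expected digest.

    `expected` maps relative paths to SHA-256 hex digests (a hypothetical
    way of describing a target file state, assumed for illustration).
    """
    for rel_path, digest in expected.items():
        candidate = root / rel_path
        if not candidate.is_file() or file_digest(candidate) != digest:
            return False
    return True


def demo() -> tuple[bool, bool]:
    """Check the state before and after the 'agent' writes the target file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)  # stand-in for the sandbox filesystem
        expected = {"out/result.txt": hashlib.sha256(b"done\n").hexdigest()}
        before = reached_target_state(root, expected)
        # Simulate the effect of the agent's shell commands.
        (root / "out").mkdir()
        (root / "out" / "result.txt").write_bytes(b"done\n")
        after = reached_target_state(root, expected)
        return before, after
```

Comparing digests rather than command transcripts means the check scores the final state the agent produced, not the particular commands it used to get there.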

homepageUrl

https://www.tbench.ai/

description

Canonical Terminal-Bench v1 set referenced in the original paper and the public leaderboard.

Outgoing edges

belongs_to_benchmark · 1

benchmark:terminal-bench · Benchmark · Terminal-Bench

Incoming edges

None.
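
The attributes and edges listed on this page can be read as a small typed record. The sketch below is one possible in-memory representation, assuming hypothetical class names (`Edge`, `AtlasTestSet`) and snake_case field names; it is not the Atlas's actual schema. `case_count` is left as `None` because the page gives no value for it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Edge:
    """A directed, labelled link from one Atlas entity to another."""
    relation: str
    target_id: str


@dataclass
class AtlasTestSet:
    """Hypothetical container mirroring the attributes shown on this page."""
    id: str
    display_name: str
    benchmark_id: str
    released_at: str  # date string as shown on the page
    homepage_url: str
    case_count: Optional[int] = None  # no value is listed on the page
    outgoing: list[Edge] = field(default_factory=list)
    incoming: list[Edge] = field(default_factory=list)


record = AtlasTestSet(
    id="test-set:terminal-bench-v1",
    display_name="Terminal-Bench v1",
    benchmark_id="benchmark:terminal-bench",
    released_at="2024-10-01",
    homepage_url="https://www.tbench.ai/",
    outgoing=[Edge("belongs_to_benchmark", "benchmark:terminal-bench")],
)
```

In this representation the single outgoing `belongs_to_benchmark` edge points at the same identifier stored in `benchmark_id`, which gives a simple consistency check between the attribute and the edge list.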