displayName
RE-Bench
homepageUrl
https://metr.org/AI_R_D_Evaluation_Report.pdf
kind
research-engineering
targetsKind
AgentVersion
description
METR's benchmark for autonomous research engineering: time-bounded ML
R&D tasks, scored against expert human baselines, that measure how well
frontier agents carry out self-directed research engineering.