displayName
EvalPlus
homepageUrl
https://evalplus.github.io/
kind
code-functional-correctness
targetsKind
ModelVersion
description
EvalPlus extends HumanEval and MBPP with many more high-quality
tests per task (80x for HumanEval+, 35x for MBPP+) to expose
incorrect LLM-generated code that passes the original tests,
yielding the HumanEval+ and MBPP+ leaderboards.