benchmark:human-eval
Benchmark
skill-area:python-implemen…
SkillArea
eval-run:human-eval.qwen-2…
EvalRun
eval-run:human-eval.qwen-2…
EvalRun
eval-run:human-eval.claude…
EvalRun
eval-run:human-eval.deepse…
EvalRun
eval-run:human-eval.llama-…
EvalRun
eval-run:human-eval.llama-…
EvalRun
eval-run:human-eval.llama-…
EvalRun
eval-run:human-eval.mistra…
EvalRun
eval-run:human-eval.codest…
EvalRun
eval-run:human-eval.gpt-5.…
EvalRun
scope-boundary:human-eval.…
ScopeBoundary