II. Benchmark overview
Reference · livebenchmark:swe-bench-verified
SWE-bench Verified overview
Human-verified subset of SWE-bench (500 cases) with cleaned task statements and verified-solvable issues.
Attributes
displayName: SWE-bench Verified
homepageUrl:
kind: full-stack
targetsKind: AgentVersion
description: Human-verified subset of SWE-bench (500 cases) with cleaned task statements and verified-solvable issues.
Outgoing edges
covers (1)
- skill-area:bug-fixing-from-issues · SkillArea · Bug Fixing from Issue Descriptions
refines (1)
- benchmark:swe-bench · Benchmark · SWE-bench
Incoming edges
belongs_to_benchmark (1)
- test-set:swe-bench-verified-2024-12 · TestSet · SWE-bench Verified 2024-12
bounds_subject (1)
- scope-boundary:swe-bench-verified.scope · ScopeBoundary
for_benchmark (12)
- eval-run:swe-bench-verified.claude-haiku-4-5.2025-10 · EvalRun
- eval-run:swe-bench.deepseek-v3.2024-12 · EvalRun
- eval-run:swe-bench-verified.gemini-2-5-flash.2025-06 · EvalRun
- eval-run:swe-bench-verified.llama-4-405b.2024-07 · EvalRun
- eval-run:swe-bench.llama-3-1-405b.2024-07 · EvalRun
- eval-run:swe-bench-verified.claude-opus-4-5.2025-09 · EvalRun
- eval-run:swe-bench-verified.claude-opus-4-7.2026-01 · EvalRun
- eval-run:swe-bench-verified.o3.2025-04 · EvalRun
- eval-run:swe-bench-verified.gemini-2-5-pro.2025-06 · EvalRun
- eval-run:swe-bench.claude-code@1.x.2025-04-29 · EvalRun
- eval-run:swe-bench-verified.claude-sonnet-4-5.2025-09 · EvalRun
- eval-run:swe-bench-verified.gpt-5.2025-08 · EvalRun
scored_against (15)
- eval-result:swe-bench-verified.claude-haiku-4-5.001 · EvalResult
- eval-result:swe-bench.deepseek-v3.001 · EvalResult
- eval-result:swe-bench-verified.gemini-2-5-flash.001 · EvalResult
- eval-result:swe-bench-verified.llama-4-405b.001 · EvalResult
- eval-result:swe-bench.llama-3-1-405b.001 · EvalResult
- eval-result:swe-bench-verified.claude-opus-4-5.001 · EvalResult
- eval-result:swe-bench-verified.claude-opus-4-7.001 · EvalResult
- eval-result:swe-bench-verified.gpt-5.headline · EvalResult
- eval-result:swe-bench-verified.o3.001 · EvalResult
- eval-result:swe-bench-verified.gemini-2-5-pro.001 · EvalResult
- eval-result:swe-bench.claude-code.001 · EvalResult
- eval-result:swe-bench-verified.claude-sonnet-4-5.high-compute.001 · EvalResult
- eval-result:swe-bench-verified.claude-sonnet-4-5.001 · EvalResult
- eval-result:swe-bench-verified.gpt-5.headline.001 · EvalResult
- eval-result:swe-bench-verified.gpt-5.001 · EvalResult