Agentic AI Atlas

II.

Benchmark overview

benchmark:harmbench

Reference · live

HarmBench overview

HarmBench (Mazeika et al., CAIS 2024) is a standardized evaluation framework for automated red-teaming of LLMs across harmful behaviors (cyber, chemical/biological, misinformation, harassment, illegal), pairing attack methods with target models on a fixed test bank.

BenchmarkOutgoing · 3Incoming · 2

Attributes

displayName

HarmBench

homepageUrl

https://www.harmbench.org/

kind

model-only

targetsKind

ModelVersion

description

Outgoing edges

covers1

skill-area:safety-redteaming·SkillAreaSafety Red-Teaming

evaluates_policy2

content-policy:default-acceptable-use·ContentPolicyDefault acceptable-use policy
content-policy:eu-ai-act-aligned·ContentPolicyEU AI Act-aligned content policy

Incoming edges

for_benchmark1

eval-run:harmbench.claude-opus-4-5.2025-09·EvalRun

scored_against1

eval-result:harmbench.claude-opus-4-5.001·EvalResult