displayName
HarmBench
homepageUrl
https://www.harmbench.org/
kind
model-only
targetsKind
ModelVersion
description
HarmBench (Mazeika et al., Center for AI Safety, 2024) is a standardized
evaluation framework for automated red teaming of LLMs. It covers harmful
behaviors (cyber, chemical/biological, misinformation, harassment, illegal
activities), pairing attack methods with target models on a fixed set of
test behaviors.