displayName
AdvBench
homepageUrl
https://github.com/llm-attacks/llm-attacks
kind
model-only
targetsKind
ModelVersion
description
AdvBench (Zou et al., "Universal and Transferable Adversarial
Attacks on Aligned Language Models", 2023) is a 520-string harmful-
behavior corpus paired with a standard suffix-attack protocol,
widely used to measure jailbreak robustness.