II.
SkillArea overview
Reference · liveskill-area:safety-redteaming
Safety Red-Teaming overview
Adversarial probing of model and agent safety — eliciting policy violations, jailbreak resistance, and harmful-content refusals under structured attack taxonomies. Covers the HarmBench class of safety evaluations.
Attributes
displayName
Safety Red-Teaming
description
Adversarial probing of model and agent safety — eliciting policy
violations, jailbreak resistance, and harmful-content refusals under
structured attack taxonomies. Covers the HarmBench class of safety
evaluations.
domains
expertiseLevels
- intermediate
- expert
- authoritative
Outgoing edges
applies_to1
- domain:security·DomainSecurity
Incoming edges
covers4
- benchmark:bias-bench·BenchmarkBBQ (Bias Benchmark for QA)
- benchmark:harmbench·BenchmarkHarmBench
- benchmark:jailbreakbench·BenchmarkJailbreakBench
- benchmark:advbench·BenchmarkAdvBench
lib_requires_skill_area8
- lib-agent:ai-agents-conversational--bias-fairness-analyst·LibraryAgentbias-fairness-analyst
- lib-agent:ai-agents-conversational--prompt-injection-defender·LibraryAgentprompt-injection-defender
- lib-agent:ai-agents-conversational--safety-auditor·LibraryAgentsafety-auditor
- lib-skill:ai-agents-conversational--constitutional-ai-prompts·LibrarySkillconstitutional-ai-prompts
- lib-skill:ai-agents-conversational--content-moderation-api·LibrarySkillcontent-moderation-api
- lib-skill:ai-agents-conversational--nemo-guardrails·LibrarySkillnemo-guardrails
- lib-skill:ai-agents-conversational--prompt-injection-detector·LibrarySkillprompt-injection-detector
- lib-skill:security-research--aiml-security·LibrarySkillaiml-security
prerequisite_for_learning2
- skill-area:AI-safety-alignment·SkillAreaAI Safety & Alignment
- skill-area:ai-agent-development·SkillAreaAI Agent Development
requires_skill_area1
- stack-profile:ai-safety-guardrails·StackProfileAI Safety / Guardrails Stack (Python, OPA, FastAPI, Redis, Prometheus)