displayName: Prompt Regression Testing
workflowKind: development
triggerType: event-driven
typicalCadence: per-pull-request
complexity: single-team
description: >
  Runs automated evaluation suites against LLM prompts and chains on every
  change, comparing output quality, latency, cost, and safety metrics
  against baseline snapshots and flagging regressions. Excludes prompt authoring.
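The baseline comparison this workflow describes can be sketched as a small check: evaluate the current run's metrics against a stored snapshot and flag any metric that moved past its tolerance. All names, values, and tolerances here are illustrative assumptions, not part of the workflow definition.

```python
# Hypothetical baseline snapshot; in practice this would be loaded from storage.
BASELINE = {"quality": 0.91, "latency_ms": 420.0, "cost_usd": 0.012, "safety": 0.99}

# Per-metric tolerance and direction: "higher" metrics may not drop,
# "lower" metrics may not rise, beyond the allowed delta.
TOLERANCES = {
    "quality":    (0.02, "higher"),
    "latency_ms": (50.0, "lower"),
    "cost_usd":   (0.002, "lower"),
    "safety":     (0.01, "higher"),
}

def find_regressions(current: dict) -> list:
    """Return names of metrics that regressed versus BASELINE."""
    flagged = []
    for name, (delta, direction) in TOLERANCES.items():
        base, now = BASELINE[name], current[name]
        # How much worse the current value is, given the metric's direction.
        worse = (base - now) if direction == "higher" else (now - base)
        if worse > delta:
            flagged.append(name)
    return flagged

run = {"quality": 0.88, "latency_ms": 430.0, "cost_usd": 0.013, "safety": 0.99}
print(find_regressions(run))  # quality dropped 0.03, past its 0.02 tolerance
```

A real suite would run many prompts per change and aggregate before comparing; the per-metric direction and tolerance structure stays the same.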