displayName: AI Inference Cost Review
workflowKind: governance
triggerType: scheduled
typicalCadence: bi-weekly
complexity: cross-team
description: >-
  Reviews AI and LLM inference costs across the organization to optimize
  spend while maintaining quality: analyzing API cost breakdowns by model,
  feature, and team with token-level granularity; evaluating
  prompt-engineering efficiency by measuring token counts against
  output-quality metrics; reviewing caching-layer effectiveness, including
  semantic cache hit rates and cost avoidance; assessing model-selection
  appropriateness by comparing quality-to-cost ratios across model tiers
  for each use case; identifying opportunities to shift workloads from
  expensive frontier models to fine-tuned smaller models; tracking cost
  trends against usage growth to detect non-linear cost scaling; reviewing
  batch versus real-time inference allocation for latency-tolerant
  workloads; and benchmarking per-request costs against industry norms.
  Produces an AI cost dashboard, an optimization recommendation report,
  and a model-tier allocation review. Excludes model training and
  fine-tuning.