displayName
ETL Pipeline Cost Optimization
workflowKind
governance
triggerType
scheduled
typicalCadence
monthly
complexity
cross-team
description
Optimizes compute costs and scheduling efficiency across ETL/ELT pipelines:
profiling per-pipeline resource consumption (CPU, memory, shuffle I/O)
against actual data volumes, identifying over-provisioned Spark/Flink
clusters and right-sizing executor configurations, consolidating overlapping
extraction windows to reduce source-system load, migrating infrequently run
batch jobs to spot/preemptible instances, evaluating incremental versus
full-refresh strategies per table based on change-data-capture feasibility,
and tracking month-over-month cost trends with attribution to pipeline
owners. Produces cost attribution dashboards, optimization recommendation
reports, and scheduling conflict analyses. Excludes pipeline logic changes.