LibraryProcess overview
Reference · livelib-process:data-engineering-analytics--etl-elt-pipeline
etl-elt-pipeline overview
ETL/ELT Pipeline Setup - Design and implement a data pipeline from source to destination, covering source connection, the ingestion layer, transformation logic, data quality gates, orchestration, and monitoring. Supports both batch and streaming data patterns with comprehensive validation and error handling.
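The staged flow described above (extract through load, with quality gates between stages) can be sketched as plain functions. This is illustrative only: the stage names follow the overview, but the data shapes, the null-check rule, and `runPipeline` itself are hypothetical, not part of the library's API.

```javascript
// Illustrative stage functions; real pipelines would call connectors,
// dbt models, warehouse loaders, etc.
const stages = {
  extract: (rows) => rows,                                        // pull raw records
  clean: (rows) => rows.filter((r) => r.email != null),           // quality gate: drop incomplete rows
  enrich: (rows) => rows.map((r) => ({ ...r, loadedAt: 0 })),     // attach load metadata
  aggregate: (rows) => [{ count: rows.length }],                  // roll up for analytics
  load: (rows) => rows,                                           // write to the destination
};

// Run each stage in order, failing fast if a stage produces no row set.
function runPipeline(rows, order = ['extract', 'clean', 'enrich', 'aggregate', 'load']) {
  return order.reduce((data, name) => {
    const out = stages[name](data);
    if (!Array.isArray(out)) throw new Error(`Stage ${name} produced no rows`);
    return out;
  }, rows);
}
```

For example, `runPipeline([{ email: 'a@b.c' }, { email: null }])` drops the incomplete row at the `clean` gate and aggregates the remainder to `[{ count: 1 }]`.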
Attributes
displayName
etl-elt-pipeline
description
ETL/ELT Pipeline Setup - Design and implement a data pipeline from source to destination,
covering source connection, the ingestion layer, transformation logic, data quality gates, orchestration, and monitoring.
Supports both batch and streaming data patterns with comprehensive validation and error handling.
libraryPath
library/specializations/data-engineering-analytics/etl-elt-pipeline.js
specialization
data-engineering-analytics
references
- ETL Best Practices: https://docs.airbyte.com/understanding-airbyte/
- Data Quality: https://www.datakitchen.io/data-quality-fundamentals
- Apache Airflow: https://airflow.apache.org/docs/
- Dagster: https://docs.dagster.io/
- dbt (data build tool): https://docs.getdbt.com/
- Great Expectations: https://greatexpectations.io/
- Data Engineering Best Practices: https://github.com/DataTalksClub/data-engineering-zoomcamp
example
const result = await orchestrate('specializations/data-engineering-analytics/etl-elt-pipeline', {
pipelineName: 'Customer Analytics Pipeline',
sources: [
{ type: 'postgresql', name: 'transactional-db', connection: 'prod-db' },
{ type: 'kafka', name: 'events-stream', topics: ['user-events', 'product-views'] },
{ type: 's3', name: 'logs-bucket', path: 's3://logs/clickstream/' }
],
destinations: [
{ type: 'snowflake', name: 'analytics-warehouse', schema: 'customer_analytics' },
{ type: 'redshift', name: 'reporting-db', schema: 'reports' }
],
pipelineType: 'hybrid', // 'batch', 'streaming', or 'hybrid'
transformationLogic: {
stages: ['extract', 'clean', 'enrich', 'aggregate', 'load'],
dbtModels: true,
customTransformations: ['user-segmentation', 'revenue-attribution']
},
dataQualityRules: {
schemaValidation: true,
nullChecks: true,
rangeChecks: true,
referentialIntegrity: true,
customRules: ['email-format', 'positive-revenue']
},
orchestration: {
tool: 'airflow', // 'airflow', 'dagster', 'prefect', 'step-functions'
schedule: 'hourly',
retryPolicy: { maxRetries: 3, backoff: 'exponential' }
}
});
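The `retryPolicy: { maxRetries: 3, backoff: 'exponential' }` option in the example configures how failed tasks are re-run. A minimal sketch of what exponential backoff means at run time follows; `withRetry` and `baseMs` are hypothetical names for illustration, since in practice the chosen orchestrator (Airflow, Dagster, Prefect, or Step Functions) applies this policy itself.

```javascript
// Hypothetical retry wrapper: waits baseMs, 2*baseMs, 4*baseMs, ... between
// attempts when backoff is 'exponential', then rethrows after maxRetries.
async function withRetry(task, { maxRetries = 3, backoff = 'exponential', baseMs = 1000 } = {}) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await task();
    } catch (err) {
      if (attempt === maxRetries) throw err; // retries exhausted
      const delay = backoff === 'exponential' ? baseMs * 2 ** attempt : baseMs;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

A task that fails twice and then succeeds completes on the third attempt, having slept `baseMs` and then `2 * baseMs` between tries.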
usesAgents
- data-architect
- data-integration-engineer
- data-warehouse-engineer
- ingestion-engineer
- data-platform-engineer
- analytics-engineer
- data-quality-engineer
- pipeline-quality-engineer
- reliability-engineer
- orchestration-engineer
- data-optimization-engineer
- observability-engineer
- metadata-engineer
- performance-engineer
- security-engineer
- qa-engineer
- technical-writer
- validation-engineer
- deployment-engineer
- data-engineering-lead
Outgoing edges
lib_applies_to_domain (1)
- domain:data-engineering · Domain · Data Engineering
lib_belongs_to_specialization (1)
- specialization:data-engineering-analytics · Specialization
lib_implements_workflow (2)
- workflow:data-pipeline-deployment · Workflow · Data Pipeline Deployment
- workflow:data-backfill-procedure · Workflow · Data Backfill Procedure
uses_agent (9)
- lib-agent:software-architecture--data-architect · LibraryAgent · data-architect
- lib-agent:game-development--analytics-engineer · LibraryAgent · analytics-engineer
- lib-agent:data-engineering-analytics--data-quality-engineer · LibraryAgent · data-quality-engineer
- lib-agent:electrical-engineering--reliability-engineer · LibraryAgent · reliability-engineer
- lib-agent:ai-agents-conversational--observability-engineer · LibraryAgent · observability-engineer
- lib-agent:software-architecture--performance-engineer · LibraryAgent · performance-engineer
- lib-agent:shared--qa-engineer · LibraryAgent · qa-engineer
- lib-agent:meta--technical-writer · LibraryAgent · technical-writer
- lib-agent:shared--deployment-engineer · LibraryAgent · deployment-engineer
Incoming edges
None.