stack-profile:data-lakehouse
Data Lakehouse Stack (Databricks, Spark, Delta Lake, dbt, Airflow) overview
A unified analytics platform that combines the flexibility of a data lake with the ACID transaction guarantees and performance of a data warehouse. Databricks provides the managed Spark runtime, and Delta Lake supplies the table format that enables schema evolution, time travel, and MERGE operations on cloud object storage. dbt handles the transformation layer, turning raw ingested tables into tested, documented analytical models. Airflow orchestrates the end-to-end pipeline, from ingestion through transformation to consumption. Python serves as the glue language for custom operators, UDFs, and notebooks. Choose this stack when you need a single copy of the data to serve both BI dashboards and ML feature engineering, and when your data volumes exceed what a traditional warehouse can handle cost-effectively.
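The MERGE and time-travel capabilities mentioned above can be sketched in Delta Lake SQL. This is a minimal illustration, not part of the profile: the table names (`analytics.orders`, `staging.orders_updates`), columns, and version number are all hypothetical.

```sql
-- Upsert late-arriving records from a staging table into a Delta table.
-- ACID guarantees mean readers never see a half-applied merge.
MERGE INTO analytics.orders AS target
USING staging.orders_updates AS source
  ON target.order_id = source.order_id
WHEN MATCHED THEN
  UPDATE SET target.status = source.status,
             target.updated_at = source.updated_at
WHEN NOT MATCHED THEN
  INSERT (order_id, status, updated_at)
  VALUES (source.order_id, source.status, source.updated_at);

-- Time travel: query the table as it was at an earlier version or timestamp.
SELECT * FROM analytics.orders VERSION AS OF 42;
SELECT * FROM analytics.orders TIMESTAMP AS OF '2024-01-01T00:00:00Z';
```

In practice a statement like this runs inside a Databricks job or notebook task that Airflow triggers, with dbt models building on the merged table downstream.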
Attributes
Outgoing edges
- domain:data-engineering · Domain · Data Engineering
- domain:data-science · Domain · Data Science
- tool:databricks · Tool · Databricks
- tool:airflow · Tool · Apache Airflow
- language:python · Language · Python
- language:sql · Language · SQL
- framework:spark-java · Framework · Spark Java
- tool:prometheus · Tool · Prometheus
- tool:docker · Tool · Docker
- workflow:data-pipeline-deployment · Workflow · Data Pipeline Deployment
- workflow:dbt-model-review · Workflow · dbt Model Review
- skill-area:etl-pipelines · SkillArea · ETL Pipelines
- skill-area:data-warehouse-modeling · SkillArea · Data Warehouse Modeling
- skill-area:spark-jobs · SkillArea · Apache Spark Jobs
- skill-area:dbt-modeling · SkillArea · dbt Modeling
- skill-area:data-quality · SkillArea · Data Quality
- role:data-engineer · Role · Data Engineer
- role:analytics-engineer · Role · Analytics Engineer
- role:data-scientist · Role · Data Scientist