DBT Getting Started
DBT (Data Build Tool) transforms raw data in the bronze layer into analytics-ready dimensional models in the gold layer. This guide covers DBT installation, project setup, and your first transformation.
DBT Project Setup Steps
Step 1: Environment Setup
# Install DBT Core
pip install dbt-core
# Install ClickHouse adapter (for our data warehouse)
pip install dbt-clickhouse
# Verify installation
dbt --version
Step 2: Create New DBT Project
# Create new DBT project
dbt init adventureworks_analytics
# Navigate to project directory
cd adventureworks_analytics
Step 3: Configure Database Connection
Update profiles.yml
file (usually located in ~/.dbt/profiles.yml
):
adventureworks_analytics:
target: dev
outputs:
dev:
type: clickhouse
host: localhost
port: 8123
user: admin
password: clickhouse123
database: analytics
schema: gold_layer
threads: 4
keepalives_idle: 0
search_path: bronze_layer,gold_layer
prod:
type: clickhouse
host: "{{ env_var('CLICKHOUSE_HOST') }}"
port: 8123
user: "{{ env_var('CLICKHOUSE_USER') }}"
password: "{{ env_var('CLICKHOUSE_PASSWORD') }}"
database: analytics
schema: gold_layer
threads: 8
Step 4: Project Configuration
Update dbt_project.yml
:
name: 'adventureworks_analytics'
version: '1.0.0'
config-version: 2
profile: 'adventureworks_analytics'
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
target-path: "target"
clean-targets:
- "target"
- "dbt_packages"
models:
adventureworks_analytics:
# Bronze layer (raw data staging)
staging:
+materialized: view
+schema: bronze_layer
# Gold layer (dimensional model)
marts:
+materialized: table
+schema: gold_layer
Step 5: Define Data Sources
Create models/staging/sources.yml
:
version: 2
sources:
- name: bronze_layer
description: "Raw data from LogStash ETL pipeline"
tables:
- name: sales_orders
description: "Sales order header information"
columns:
- name: salesorderid
description: "Primary key for sales order"
tests:
- unique
- not_null
Step 6: Create Your First Model
Create models/staging/stg_sales_orders.sql
:
{{ config(materialized='view') }}
SELECT
salesorderid,
customerid,
territoryid,
orderdate,
duedate,
shipdate,
status,
subtotal,
taxamt,
freight,
totaldue,
modifieddate
FROM {{ source('bronze_layer', 'sales_orders') }}
WHERE salesorderid IS NOT NULL
Step 7: Run DBT Pipeline
# Test connections
dbt debug
# Run staging models
dbt run --models staging
# Run tests
dbt test
# Generate documentation
dbt docs generate
dbt docs serve
Next Steps
Continue to DBT Modeling Guide to build dimensional models.
Last updated
Was this helpful?