dbt Getting Started

dbt (data build tool) transforms raw data in the bronze layer into analytics-ready dimensional models in the gold layer. This guide covers installing dbt, setting up a project, and running your first transformation.

dbt Project Setup Steps

Step 1: Environment Setup

# Install dbt Core
pip install dbt-core

# Install ClickHouse adapter (for our data warehouse)
pip install dbt-clickhouse

# Verify installation
dbt --version
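
Installing dbt into an isolated virtual environment avoids dependency clashes with other Python tooling. A minimal sketch (the `.venv` directory name is just a convention):

```shell
# Create and activate an isolated environment for dbt
python3 -m venv .venv
. .venv/bin/activate

# dbt now installs into .venv rather than the system Python:
# pip install dbt-core dbt-clickhouse
```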

Step 2: Create a New dbt Project

# Create a new dbt project
dbt init adventureworks_analytics

# Navigate to project directory
cd adventureworks_analytics
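
dbt init scaffolds a standard project layout. Exact contents vary by dbt version, but expect roughly:

```
adventureworks_analytics/
├── dbt_project.yml
├── models/
│   └── example/
├── analyses/
├── macros/
├── seeds/
├── snapshots/
└── tests/
```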

Step 3: Configure Database Connection

Update the profiles.yml file (by default located at ~/.dbt/profiles.yml):

adventureworks_analytics:
  target: dev
  outputs:
    dev:
      type: clickhouse
      host: localhost
      port: 8123
      user: admin
      password: clickhouse123
      schema: gold_layer
      threads: 4
    prod:
      type: clickhouse
      host: "{{ env_var('CLICKHOUSE_HOST') }}"
      port: 8123
      user: "{{ env_var('CLICKHOUSE_USER') }}"
      password: "{{ env_var('CLICKHOUSE_PASSWORD') }}"
      schema: gold_layer
      threads: 8
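
For the prod target, the env_var() calls above are resolved from the environment when dbt parses the profile, so no credentials live in profiles.yml. A sketch of wiring them up (host and user values here are hypothetical placeholders):

```shell
# Hypothetical values -- substitute your real ClickHouse endpoint and credentials
export CLICKHOUSE_HOST=clickhouse.prod.internal
export CLICKHOUSE_USER=dbt_prod
export CLICKHOUSE_PASSWORD='change-me'

# Validate the prod target (guarded so this is a no-op where dbt is not installed)
if command -v dbt >/dev/null 2>&1; then
  dbt debug --target prod
fi
```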

Step 4: Project Configuration

Update dbt_project.yml:

name: 'adventureworks_analytics'
version: '1.0.0'
config-version: 2

profile: 'adventureworks_analytics'

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"
clean-targets:
  - "target"
  - "dbt_packages"

models:
  adventureworks_analytics:
    # Bronze layer (raw data staging)
    staging:
      +materialized: view
      +schema: bronze_layer
    
    # Gold layer (dimensional model)
    marts:
      +materialized: table
      +schema: gold_layer
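
One caveat: by default, dbt builds custom schema names by appending +schema to the profile's schema, producing names like gold_layer_bronze_layer. If you want models to land in exactly bronze_layer and gold_layer, the documented approach is to override the generate_schema_name macro in macros/generate_schema_name.sql:

```sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if custom_schema_name is none -%}
        {{ target.schema }}
    {%- else -%}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```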

Step 5: Define Data Sources

Create models/staging/sources.yml:

version: 2

sources:
  - name: bronze_layer
    description: "Raw data from the Logstash ETL pipeline"
    tables:
      - name: sales_orders
        description: "Sales order header information"
        columns:
          - name: salesorderid
            description: "Primary key for sales order"
            tests:
              - unique
              - not_null
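
Sources can also declare freshness expectations, so that dbt source freshness flags stale bronze loads. A sketch extending the source above, assuming modifieddate reflects load time (the thresholds are illustrative):

```yaml
sources:
  - name: bronze_layer
    loaded_at_field: modifieddate
    freshness:
      warn_after: {count: 24, period: hour}
      error_after: {count: 48, period: hour}
    tables:
      - name: sales_orders
```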

Step 6: Create Your First Model

Create models/staging/stg_sales_orders.sql:

{{ config(materialized='view') }}

SELECT
    salesorderid,
    customerid,
    territoryid,
    orderdate,
    duedate,
    shipdate,
    status,
    subtotal,
    taxamt,
    freight,
    totaldue,
    modifieddate
FROM {{ source('bronze_layer', 'sales_orders') }}
WHERE salesorderid IS NOT NULL
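
Downstream models should reference this view with ref() rather than a hard-coded table name, so dbt can build the dependency graph and resolve the right environment. A minimal mart-layer sketch (the file name fct_sales.sql and column selection are illustrative, not part of this guide's project):

```sql
-- models/marts/fct_sales.sql (illustrative)
{{ config(materialized='table') }}

SELECT
    salesorderid,
    customerid,
    orderdate,
    totaldue
FROM {{ ref('stg_sales_orders') }}
```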

Step 7: Run the dbt Pipeline

# Test connections
dbt debug

# Run staging models
dbt run --select staging

# Run tests
dbt test

# Generate documentation
dbt docs generate
dbt docs serve

Next Steps

Continue to the dbt Modeling Guide to build dimensional models.
