# DBT Getting Started

DBT (Data Build Tool) transforms raw data in the bronze layer into analytics-ready dimensional models in the gold layer. This guide covers DBT installation, project setup, and your first transformation.

## DBT Project Setup Steps

### Step 1: Environment Setup

```bash
# Install DBT Core
pip install dbt-core

# Install ClickHouse adapter (for our data warehouse)
pip install dbt-clickhouse

# Verify installation
dbt --version
```

### Step 2: Create New DBT Project

```bash
# Create new DBT project
dbt init adventureworks_analytics

# Navigate to project directory
cd adventureworks_analytics
```

### Step 3: Configure Database Connection

Update `profiles.yml` file (usually located in `~/.dbt/profiles.yml`):

```yaml
adventureworks_analytics:
  target: dev
  outputs:
    dev:
      type: clickhouse
      host: localhost
      port: 8123
      user: admin
      password: clickhouse123
      database: analytics
      schema: gold_layer
      threads: 4
      keepalives_idle: 0
      search_path: bronze_layer,gold_layer
    prod:
      type: clickhouse
      host: "{{ env_var('CLICKHOUSE_HOST') }}"
      port: 8123
      user: "{{ env_var('CLICKHOUSE_USER') }}"
      password: "{{ env_var('CLICKHOUSE_PASSWORD') }}"
      database: analytics
      schema: gold_layer
      threads: 8
```

### Step 4: Project Configuration

Update `dbt_project.yml`:

```yaml
name: 'adventureworks_analytics'
version: '1.0.0'
config-version: 2

profile: 'adventureworks_analytics'

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"
clean-targets:
  - "target"
  - "dbt_packages"

models:
  adventureworks_analytics:
    # Bronze layer (raw data staging)
    staging:
      +materialized: view
      +schema: bronze_layer
    
    # Gold layer (dimensional model)
    marts:
      +materialized: table
      +schema: gold_layer
```

### Step 5: Define Data Sources

Create `models/staging/sources.yml`:

```yaml
version: 2

sources:
  - name: bronze_layer
    description: "Raw data from LogStash ETL pipeline"
    tables:
      - name: sales_orders
        description: "Sales order header information"
        columns:
          - name: salesorderid
            description: "Primary key for sales order"
            tests:
              - unique
              - not_null
```

### Step 6: Create Your First Model

Create `models/staging/stg_sales_orders.sql`:

```sql
{{ config(materialized='view') }}

SELECT
    salesorderid,
    customerid,
    territoryid,
    orderdate,
    duedate,
    shipdate,
    status,
    subtotal,
    taxamt,
    freight,
    totaldue,
    modifieddate
FROM {{ source('bronze_layer', 'sales_orders') }}
WHERE salesorderid IS NOT NULL
```

### Step 7: Run DBT Pipeline

```bash
# Test connections
dbt debug

# Run staging models
dbt run --models staging

# Run tests
dbt test

# Generate documentation
dbt docs generate
dbt docs serve
```

## Next Steps

Continue to [DBT Modeling Guide](/data-platform/index/development/dbt-modeling-guide.md) to build dimensional models.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.datasuite.vn/data-platform/index/development/dbt-getting-started.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
