# Quick Start (Docker Compose)

This is the recommended setup method for getting started quickly with DataSuite ETL. Docker Compose orchestrates all services with a single command.

## Why Choose Docker Compose?

**✅ Advantages:**

* **Fast Setup**: Single command to start entire stack
* **Simplified Networking**: Automatic service discovery and communication
* **Easy Management**: Start, stop, and restart all services together
* **Consistent Environment**: Same configuration across different machines
* **Beginner Friendly**: Minimal configuration required

**❌ Limitations:**

* Less control over individual service configuration
* All services use default settings
* Limited customization options

## Step 1: Download MySQL JDBC Driver

First, create the directory structure and download the required MySQL connector:

```bash
# Create directory for LogStash drivers
mkdir -p logstash-drivers

# Download MySQL JDBC driver using Docker container
docker run --rm -v $(pwd)/logstash-drivers:/drivers alpine:latest \
  sh -c "apk add --no-cache wget && \
         wget -O /drivers/mysql-connector-java.jar \
         https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.9-rc/mysql-connector-java-8.0.9-rc.jar"
```

**Expected Output:**

```bash
Connecting to repo1.maven.org...
HTTP request sent, awaiting response... 200 OK
Length: 2295712 (2.2M) [application/java-archive]
mysql-connector-java.jar saved [2295712/2295712]
```

## Step 2: Create Docker Compose Configuration

Create `docker-compose.yml` in your project directory:

```yaml
version: '3.8'

services:
  mysql:
    image: mysql:8.0
    container_name: mysql
    environment:
      MYSQL_ROOT_PASSWORD: password
      MYSQL_DATABASE: adventureworks  
    ports:
      - "3306:3306"
    volumes:
      - mysql-data:/var/lib/mysql
    networks:
      - datasuite-network

  clickhouse:
    image: clickhouse/clickhouse-server:latest
    container_name: clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
    environment:
      CLICKHOUSE_USER: admin
      CLICKHOUSE_PASSWORD: clickhouse123
      CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
    volumes:
      - clickhouse-data:/var/lib/clickhouse
    networks:
      - datasuite-network

  logstash:
    image: docker.elastic.co/logstash/logstash:8.11.0
    container_name: logstash
    ports:
      - "5044:5044"
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
      - ./logstash-drivers:/usr/share/logstash/drivers
    depends_on:
      - mysql
      - clickhouse
    networks: 
      - datasuite-network

  airflow:
    image: apache/airflow:2.7.0
    container_name: airflow
    ports:
      - "8080:8080"
    volumes:
      - ./dags:/opt/airflow/dags
    environment:
      AIRFLOW__CORE__EXECUTOR: LocalExecutor
    depends_on:
      - clickhouse
    networks:
      - datasuite-network

volumes:
  mysql-data:
  clickhouse-data:

networks:
  datasuite-network:
    driver: bridge
```

## Step 3: Create LogStash Configuration

Create `logstash.conf` in your project directory:

```bash
input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/drivers/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://mysql:3306/adventureworks"
    jdbc_user => "root"
    jdbc_password => "password"
    statement => "SELECT * FROM Sales.SalesOrderHeader WHERE ModifiedDate > :sql_last_value ORDER BY ModifiedDate"
    use_column_value => true
    tracking_column => "ModifiedDate"
    tracking_column_type => "timestamp"
    last_run_metadata_path => "/usr/share/logstash/.logstash_jdbc_last_run"
    clean_run => false
    schedule => "*/30 * * * * *"
  }
}

filter {
  # Convert to proper JSON format for ClickHouse
  mutate {
    remove_field => ["@version", "@timestamp"]
  }
}

output {
  http {
    url => "http://clickhouse:8123/"
    http_method => "post"
    format => "message"
    message => "INSERT INTO bronze_layer.sales_orders FORMAT JSONEachRow %{message}"
    headers => {
      "Authorization" => "Basic YWRtaW46Y2xpY2tob3VzZTEyMw=="  # admin:clickhouse123 base64
      "Content-Type" => "application/json"
    }
  }
  
  # Debug output to console
  stdout { 
    codec => rubydebug 
  }
}
```

## Step 4: Start All Services

Launch the entire stack with a single command:

```bash
# Start all services in detached mode
docker-compose up -d
```

**Expected Output:**

```bash
Creating network "datasuite-etl_datasuite-network" with driver "bridge"
Creating volume "datasuite-etl_mysql-data" with default driver
Creating volume "datasuite-etl_clickhouse-data" with default driver
Creating mysql ... done
Creating clickhouse ... done
Creating logstash ... done
Creating airflow ... done
```

## Step 5: Load Sample Data

Once MySQL is running, load the AdventureWorks sample data:

```bash
# Wait for MySQL to be ready (may take 30-60 seconds)
docker-compose exec mysql mysqladmin ping -h localhost -u root -ppassword

# Load Adventure Works 2019 data
docker exec -i mysql mysql -uroot -ppassword adventureworks < docs/data/AdventureWorks2019.sql

# Verify data was loaded successfully
docker exec mysql mysql -uroot -ppassword adventureworks -e "SHOW TABLES;"
docker exec mysql mysql -uroot -ppassword adventureworks -e "SELECT COUNT(*) as table_count FROM information_schema.tables WHERE table_schema='adventureworks';"
```

**Expected Output:**

```sql
+-------------------------+
| Tables_in_adventureworks|
+-------------------------+
| Sales_Customer          |
| Sales_SalesOrderDetail  |
| Sales_SalesOrderHeader  |
| Sales_SalesTerritory    |
| Production_Product      |
+-------------------------+

+--------------+
| table_count  |
+--------------+
|           5  |
+--------------+
```

## Step 6: Prepare ClickHouse

Create the required databases in ClickHouse:

```bash
# Create bronze layer database
curl -u admin:clickhouse123 http://localhost:8123/ -d "CREATE DATABASE IF NOT EXISTS bronze_layer"

# Create gold layer database  
curl -u admin:clickhouse123 http://localhost:8123/ -d "CREATE DATABASE IF NOT EXISTS gold_layer"

# Create sales_orders table in bronze layer
curl -u admin:clickhouse123 http://localhost:8123/ -d "
CREATE TABLE IF NOT EXISTS bronze_layer.sales_orders (
    salesorderid Int32,
    customerid Int32,
    territoryid Nullable(Int32),
    orderdate DateTime,
    duedate DateTime,
    shipdate Nullable(DateTime),
    status Int8,
    subtotal Decimal64(4),
    taxamt Decimal64(4),
    freight Decimal64(4),
    totaldue Decimal64(4),
    modifieddate DateTime
) ENGINE = MergeTree()
ORDER BY (orderdate, salesorderid)
"
```

## Step 7: Monitor Service Status

Check that all services are running correctly:

```bash
# View service status
docker-compose ps

# Expected output should show all services as "Up"
```

**Expected Output:**

```bash
    Name                   Command               State                    Ports                  
------------------------------------------------------------------------------------------------
airflow        /usr/bin/dumb-init -- /ent ...   Up      0.0.0.0:8080->8080/tcp,:::8080->8080/tcp
clickhouse     /entrypoint.sh                   Up      0.0.0.0:8123->8123/tcp,:::8123->8123/tcp, 
                                                        0.0.0.0:9000->9000/tcp,:::9000->9000/tcp
logstash       /usr/local/bin/docker-entr ...   Up      0.0.0.0:5044->5044/tcp,:::5044->5044/tcp
mysql          docker-entrypoint.sh mysqld      Up      0.0.0.0:3306->3306/tcp,:::3306->3306/tcp, 33060/tcp
```

## Step 8: Access Service UIs

Once everything is running, you can access the web interfaces:

### ClickHouse Web Interface

* **URL**: <http://localhost:8123/play>
* **Username**: admin
* **Password**: clickhouse123

### Airflow Web Interface

* **URL**: <http://localhost:8080>
* **Username**: admin
* **Password**: admin

### MySQL (via command line)

```bash
# Connect to MySQL directly
docker exec -it mysql mysql -uroot -ppassword adventureworks
```

## Useful Management Commands

### Service Management

```bash
# Stop all services
docker-compose down

# Stop and remove volumes (WARNING: deletes all data)
docker-compose down -v

# Restart specific service
docker-compose restart logstash

# View service logs
docker-compose logs -f logstash

# Scale services (if supported)
docker-compose up -d --scale logstash=2
```

### Data Management

```bash
# Backup MySQL data
docker exec mysql mysqldump -uroot -ppassword adventureworks > backup.sql

# Backup ClickHouse data  
curl -u admin:clickhouse123 "http://localhost:8123/?query=SELECT * FROM bronze_layer.sales_orders FORMAT CSV" > bronze_backup.csv
```

### Resource Monitoring

```bash
# Monitor resource usage
docker stats

# View disk usage
docker system df

# Clean up unused resources
docker system prune
```

## Troubleshooting

### Services Won't Start

```bash
# Check for port conflicts
netstat -tulpn | grep -E ':(3306|8123|9000|5044|8080)'

# Check Docker daemon status
docker system info

# View detailed logs
docker-compose logs [service-name]
```

### LogStash Connection Issues

```bash
# Test MySQL connectivity from LogStash container
docker exec logstash nc -zv mysql 3306

# Check LogStash pipeline status
curl http://localhost:9600/_node/stats/pipelines
```

### ClickHouse Access Issues

```bash
# Test ClickHouse HTTP interface
curl -u admin:clickhouse123 http://localhost:8123/ping

# Check ClickHouse logs
docker logs clickhouse
```

## Next Steps

Your Docker Compose setup is now complete! Proceed to:

1. [**Service Verification**](/data-platform/index/setup/verification.md) - Test that everything is working correctly
2. [**LogStash Configuration**](/data-platform/index/development/logstash-configuration.md) - Customize data ingestion
3. [**DBT Getting Started**](/data-platform/index/development/dbt-getting-started.md) - Build your first data models

## Advantages of This Setup

* **🚀 Fast Deployment**: Entire stack running in under 5 minutes
* **🔗 Automatic Networking**: Services can communicate without manual configuration
* **📦 Consistent Environment**: Same setup works across different operating systems
* **🛠️ Easy Maintenance**: Single commands for start, stop, update operations
* **📊 Complete Stack**: All necessary services included and pre-configured

This Docker Compose setup provides a solid foundation for learning and development. When you're ready for more advanced configurations, consider the [individual containers setup](/data-platform/index/setup/individual-containers.md) for greater flexibility.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.datasuite.vn/data-platform/index/setup/docker-compose-setup.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
