# Quick Start (Docker Compose)

This is the recommended setup method for getting started quickly with DataSuite ETL. Docker Compose orchestrates all services with a single command.

## Why Choose Docker Compose?

**✅ Advantages:**

* **Fast Setup**: Single command to start entire stack
* **Simplified Networking**: Automatic service discovery and communication
* **Easy Management**: Start, stop, and restart all services together
* **Consistent Environment**: Same configuration across different machines
* **Beginner Friendly**: Minimal configuration required

**❌ Limitations:**

* Less control over individual service configuration
* All services use default settings
* Limited customization options

## Step 1: Download MySQL JDBC Driver

First, create the directory structure and download the required MySQL connector:

```bash
# Create directory for LogStash drivers
mkdir -p logstash-drivers

# Download MySQL JDBC driver using Docker container
# (Connector/J moved to the com.mysql groupId from 8.0.31 onwards)
docker run --rm -v "$(pwd)/logstash-drivers:/drivers" alpine:latest \
  sh -c "apk add --no-cache wget && \
         wget -O /drivers/mysql-connector-java.jar \
         https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.0.33/mysql-connector-j-8.0.33.jar"
```

**Expected Output:**

```bash
Connecting to repo1.maven.org...
HTTP request sent, awaiting response... 200 OK
Length: ... [application/java-archive]
'/drivers/mysql-connector-java.jar' saved
```
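A failed download (for example, a proxy returning an HTML error page) surfaces later as cryptic driver errors in LogStash, so it is worth a quick sanity check now. A JAR is a ZIP archive, so a valid file begins with the bytes `PK`:

```shell
# A JAR is a ZIP archive; a valid download therefore starts with the bytes "PK"
jar=logstash-drivers/mysql-connector-java.jar
if [ "$(head -c 2 "$jar" 2>/dev/null)" = "PK" ]; then
  echo "JAR looks valid"
else
  echo "JAR missing or corrupted; re-run the download" >&2
fi
```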

## Step 2: Create Docker Compose Configuration

Create `docker-compose.yml` in your project directory:

```yaml
version: '3.8'

services:
  mysql:
    image: mysql:8.0
    container_name: mysql
    environment:
      MYSQL_ROOT_PASSWORD: password
      MYSQL_DATABASE: adventureworks  
    ports:
      - "3306:3306"
    volumes:
      - mysql-data:/var/lib/mysql
    networks:
      - datasuite-network

  clickhouse:
    image: clickhouse/clickhouse-server:latest
    container_name: clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
    environment:
      CLICKHOUSE_USER: admin
      CLICKHOUSE_PASSWORD: clickhouse123
      CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
    volumes:
      - clickhouse-data:/var/lib/clickhouse
    networks:
      - datasuite-network

  logstash:
    image: docker.elastic.co/logstash/logstash:8.11.0
    container_name: logstash
    ports:
      - "5044:5044"
      - "9600:9600"   # LogStash monitoring API (pipeline stats, used in Troubleshooting)
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
      - ./logstash-drivers:/usr/share/logstash/drivers
    depends_on:
      - mysql
      - clickhouse
    networks: 
      - datasuite-network

  airflow:
    image: apache/airflow:2.7.0
    container_name: airflow
    # "standalone" runs DB migration, the webserver, and the scheduler in one
    # container (suitable for local development only)
    command: standalone
    ports:
      - "8080:8080"
    volumes:
      - ./dags:/opt/airflow/dags
    environment:
      # LocalExecutor needs an external metadata database (e.g. Postgres);
      # SequentialExecutor works with the image's default SQLite backend
      AIRFLOW__CORE__EXECUTOR: SequentialExecutor
      # Have the entrypoint create the admin/admin login used later in this guide
      _AIRFLOW_DB_MIGRATE: "true"
      _AIRFLOW_WWW_USER_CREATE: "true"
      _AIRFLOW_WWW_USER_USERNAME: admin
      _AIRFLOW_WWW_USER_PASSWORD: admin
    depends_on:
      - clickhouse
    networks:
      - datasuite-network

volumes:
  mysql-data:
  clickhouse-data:

networks:
  datasuite-network:
    driver: bridge
```
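One caveat with the file above: `depends_on` only orders container *startup*; LogStash can still come up before MySQL accepts connections. If that becomes a problem, a healthcheck plus a `service_healthy` condition closes the gap (a sketch; requires a Compose version that supports `depends_on` conditions):

```yaml
  mysql:
    # ...settings as above, plus:
    healthcheck:
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost", "-ppassword"]
      interval: 10s
      timeout: 5s
      retries: 10

  logstash:
    # ...settings as above, but with:
    depends_on:
      mysql:
        condition: service_healthy
      clickhouse:
        condition: service_started
```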

## Step 3: Create LogStash Configuration

Create `logstash.conf` in your project directory:

```conf
input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/drivers/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://mysql:3306/adventureworks"
    jdbc_user => "root"
    jdbc_password => "password"
    statement => "SELECT * FROM Sales_SalesOrderHeader WHERE ModifiedDate > :sql_last_value ORDER BY ModifiedDate"
    use_column_value => true
    tracking_column => "ModifiedDate"
    tracking_column_type => "timestamp"
    last_run_metadata_path => "/usr/share/logstash/.logstash_jdbc_last_run"
    clean_run => false
    # six-field rufus-scheduler cron: the leading field is seconds, i.e. every 30 seconds
    schedule => "*/30 * * * * *"
  }
}

filter {
  # Convert to proper JSON format for ClickHouse
  mutate {
    remove_field => ["@version", "@timestamp"]
  }
}

output {
  # ClickHouse accepts the INSERT statement as a URL query parameter and the
  # row data as the request body. With format => "json" each event is posted
  # as a single JSON object, which FORMAT JSONEachRow expects. (The jdbc input
  # produces no "message" field, so %{message} interpolation would not work here.)
  http {
    url => "http://clickhouse:8123/?query=INSERT%20INTO%20bronze_layer.sales_orders%20FORMAT%20JSONEachRow"
    http_method => "post"
    format => "json"
    headers => {
      "Authorization" => "Basic YWRtaW46Y2xpY2tob3VzZTEyMw=="  # admin:clickhouse123 base64
      "Content-Type" => "application/json"
    }
  }
  
  # Debug output to console
  stdout { 
    codec => rubydebug 
  }
}
```
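The `Authorization` header above is just the ClickHouse credentials in HTTP Basic form. If you change the password, regenerate the value rather than hand-editing it:

```shell
# Base64-encode "user:password" for the HTTP Basic Authorization header.
# printf (not echo) avoids a trailing newline sneaking into the encoding.
auth=$(printf '%s' 'admin:clickhouse123' | base64)
echo "Authorization: Basic ${auth}"
# → Authorization: Basic YWRtaW46Y2xpY2tob3VzZTEyMw==
```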

## Step 4: Start All Services

Launch the entire stack with a single command:

```bash
# Start all services in detached mode
docker-compose up -d
```

**Expected Output:**

```bash
Creating network "datasuite-etl_datasuite-network" with driver "bridge"
Creating volume "datasuite-etl_mysql-data" with default driver
Creating volume "datasuite-etl_clickhouse-data" with default driver
Creating mysql ... done
Creating clickhouse ... done
Creating logstash ... done
Creating airflow ... done
```

## Step 5: Load Sample Data

Once MySQL is running, load the AdventureWorks sample data:

```bash
# Wait for MySQL to be ready (may take 30-60 seconds)
docker-compose exec mysql mysqladmin ping -h localhost -u root -ppassword

# Load Adventure Works 2019 data
docker exec -i mysql mysql -uroot -ppassword adventureworks < docs/data/AdventureWorks2019.sql

# Verify data was loaded successfully
docker exec mysql mysql -uroot -ppassword adventureworks -e "SHOW TABLES;"
docker exec mysql mysql -uroot -ppassword adventureworks -e "SELECT COUNT(*) as table_count FROM information_schema.tables WHERE table_schema='adventureworks';"
```
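The single `mysqladmin ping` above reports status once and exits; wrapping it in a retry loop makes the "wait" explicit. A small sketch (assumes `docker-compose` is on your PATH and uses the credentials from the compose file):

```shell
# Poll MySQL until it accepts connections, up to ~60 seconds.
wait_for_mysql() {
  for _ in $(seq 1 30); do
    if docker-compose exec -T mysql mysqladmin ping -h localhost -u root -ppassword --silent >/dev/null 2>&1; then
      echo "MySQL is ready"
      return 0
    fi
    sleep 2
  done
  echo "MySQL did not become ready in time" >&2
  return 1
}
```

Then `wait_for_mysql && docker exec -i mysql mysql -uroot -ppassword adventureworks < docs/data/AdventureWorks2019.sql` loads the data only once the server is actually up.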

**Expected Output:**

```bash
+-------------------------+
| Tables_in_adventureworks|
+-------------------------+
| Sales_Customer          |
| Sales_SalesOrderDetail  |
| Sales_SalesOrderHeader  |
| Sales_SalesTerritory    |
| Production_Product      |
+-------------------------+

+--------------+
| table_count  |
+--------------+
|           5  |
+--------------+
```

## Step 6: Prepare ClickHouse

Create the required databases in ClickHouse:

```bash
# Create bronze layer database
curl -u admin:clickhouse123 http://localhost:8123/ -d "CREATE DATABASE IF NOT EXISTS bronze_layer"

# Create gold layer database  
curl -u admin:clickhouse123 http://localhost:8123/ -d "CREATE DATABASE IF NOT EXISTS gold_layer"

# Create sales_orders table in bronze layer
curl -u admin:clickhouse123 http://localhost:8123/ -d "
CREATE TABLE IF NOT EXISTS bronze_layer.sales_orders (
    salesorderid Int32,
    customerid Int32,
    territoryid Nullable(Int32),
    orderdate DateTime,
    duedate DateTime,
    shipdate Nullable(DateTime),
    status Int8,
    subtotal Decimal64(4),
    taxamt Decimal64(4),
    freight Decimal64(4),
    totaldue Decimal64(4),
    modifieddate DateTime
) ENGINE = MergeTree()
ORDER BY (orderdate, salesorderid)
"
```
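The three `curl` calls repeat the same endpoint and credentials; a one-line helper (a sketch named `ch`, using the credentials above) keeps follow-up queries short:

```shell
# Run a ClickHouse SQL statement over the HTTP interface
ch() { curl -s -u admin:clickhouse123 http://localhost:8123/ -d "$1"; }

# Example checks once the stack is up:
#   ch "SHOW DATABASES"                            # should list bronze_layer and gold_layer
#   ch "EXISTS TABLE bronze_layer.sales_orders"    # prints 1 once the table exists
```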

## Step 7: Monitor Service Status

Check that all services are running correctly:

```bash
# View service status
docker-compose ps

# Expected output should show all services as "Up"
```

**Expected Output:**

```bash
    Name                   Command               State                    Ports                  
------------------------------------------------------------------------------------------------
airflow        /usr/bin/dumb-init -- /ent ...   Up      0.0.0.0:8080->8080/tcp,:::8080->8080/tcp
clickhouse     /entrypoint.sh                   Up      0.0.0.0:8123->8123/tcp,:::8123->8123/tcp, 
                                                        0.0.0.0:9000->9000/tcp,:::9000->9000/tcp
logstash       /usr/local/bin/docker-entr ...   Up      0.0.0.0:5044->5044/tcp,:::5044->5044/tcp
mysql          docker-entrypoint.sh mysqld      Up      0.0.0.0:3306->3306/tcp,:::3306->3306/tcp, 33060/tcp
```

## Step 8: Access Service UIs

Once everything is running, you can access the web interfaces:

### ClickHouse Web Interface

* **URL**: <http://localhost:8123/play>
* **Username**: admin
* **Password**: clickhouse123

### Airflow Web Interface

* **URL**: <http://localhost:8080>
* **Username**: admin
* **Password**: admin

### MySQL (via command line)

```bash
# Connect to MySQL directly
docker exec -it mysql mysql -uroot -ppassword adventureworks
```

## Useful Management Commands

### Service Management

```bash
# Stop all services
docker-compose down

# Stop and remove volumes (WARNING: deletes all data)
docker-compose down -v

# Restart specific service
docker-compose restart logstash

# View service logs
docker-compose logs -f logstash

# Scale a service (note: remove the fixed container_name from the compose
# file first, since replicas cannot share one container name)
docker-compose up -d --scale logstash=2
```

### Data Management

```bash
# Backup MySQL data
docker exec mysql mysqldump -uroot -ppassword adventureworks > backup.sql

# Backup ClickHouse data  
curl -u admin:clickhouse123 "http://localhost:8123/?query=SELECT * FROM bronze_layer.sales_orders FORMAT CSV" > bronze_backup.csv
```
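Backups are only half the story; restoring the dump above is the same command reversed (a sketch wrapped in a small function, with `backup.sql` as created above):

```shell
# Restore a mysqldump file into the adventureworks database
restore_mysql() {
  docker exec -i mysql mysql -uroot -ppassword adventureworks < "${1:-backup.sql}"
}

# Usage: restore_mysql backup.sql
```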

### Resource Monitoring

```bash
# Monitor resource usage
docker stats

# View disk usage
docker system df

# Clean up unused resources
docker system prune
```

## Troubleshooting

### Services Won't Start

```bash
# Check for port conflicts (Linux; on macOS use `lsof -i :3306` etc.)
netstat -tulpn | grep -E ':(3306|8123|9000|5044|8080)'

# Check Docker daemon status
docker system info

# View detailed logs
docker-compose logs [service-name]
```

### LogStash Connection Issues

```bash
# Test MySQL connectivity from the LogStash container
# (bash's built-in /dev/tcp avoids relying on nc being installed in the image)
docker exec logstash bash -c 'timeout 3 bash -c "</dev/tcp/mysql/3306" && echo "mysql:3306 reachable"'

# Check LogStash pipeline status
curl http://localhost:9600/_node/stats/pipelines
```

### ClickHouse Access Issues

```bash
# Test ClickHouse HTTP interface
curl -u admin:clickhouse123 http://localhost:8123/ping

# Check ClickHouse logs
docker logs clickhouse
```

## Next Steps

Your Docker Compose setup is now complete! Proceed to:

1. [**Service Verification**](https://docs.datasuite.vn/data-platform/index/setup/verification) - Test that everything is working correctly
2. [**LogStash Configuration**](https://docs.datasuite.vn/data-platform/index/development/logstash-configuration) - Customize data ingestion
3. [**DBT Getting Started**](https://docs.datasuite.vn/data-platform/index/development/dbt-getting-started) - Build your first data models

## Advantages of This Setup

* **🚀 Fast Deployment**: Entire stack running in under 5 minutes
* **🔗 Automatic Networking**: Services can communicate without manual configuration
* **📦 Consistent Environment**: Same setup works across different operating systems
* **🛠️ Easy Maintenance**: Single commands for start, stop, update operations
* **📊 Complete Stack**: All necessary services included and pre-configured

This Docker Compose setup provides a solid foundation for learning and development. When you're ready for more advanced configurations, consider the [individual containers setup](https://docs.datasuite.vn/data-platform/index/setup/individual-containers) for greater flexibility.
