Quick Start (Docker Compose)
This is the recommended setup method for getting started quickly with DataSuite ETL. Docker Compose orchestrates all services with a single command.
Why Choose Docker Compose?
✅ Advantages:
Fast Setup: Single command to start entire stack
Simplified Networking: Automatic service discovery and communication
Easy Management: Start, stop, and restart all services together
Consistent Environment: Same configuration across different machines
Beginner Friendly: Minimal configuration required
❌ Limitations:
Less control over individual service configuration
All services use default settings
Limited customization options
Step 1: Download MySQL JDBC Driver
First, create the directory structure and download the required MySQL connector:
# Create directory for LogStash drivers
mkdir -p logstash-drivers
# Download MySQL JDBC driver using Docker container
docker run --rm -v $(pwd)/logstash-drivers:/drivers alpine:latest \
sh -c "apk add --no-cache wget && \
wget -O /drivers/mysql-connector-java.jar \
https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.30/mysql-connector-java-8.0.30.jar"
Expected Output:
Connecting to repo1.maven.org...
HTTP request sent, awaiting response... 200 OK
Saving to: '/drivers/mysql-connector-java.jar'
'/drivers/mysql-connector-java.jar' saved
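Before moving on, it's worth confirming the download is a real JAR and not an HTML error page saved under the wrong name. A minimal sketch (the path assumes the directory created above; JAR files are ZIP archives, so they begin with the bytes "PK"):

```shell
# Check the first two bytes of the downloaded driver.
# A valid JAR (ZIP archive) always starts with "PK".
jar=logstash-drivers/mysql-connector-java.jar
if [ "$(head -c 2 "$jar" 2>/dev/null)" = "PK" ]; then
  echo "driver looks valid"
else
  echo "driver missing or corrupt (possibly an HTML error page)"
fi
```

If the check fails, delete the file and re-run the download command above.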
Step 2: Create Docker Compose Configuration
Create docker-compose.yml in your project directory:
version: '3.8'

services:
  mysql:
    image: mysql:8.0
    container_name: mysql
    environment:
      MYSQL_ROOT_PASSWORD: password
      MYSQL_DATABASE: adventureworks
    ports:
      - "3306:3306"
    volumes:
      - mysql-data:/var/lib/mysql
    networks:
      - datasuite-network

  clickhouse:
    image: clickhouse/clickhouse-server:latest
    container_name: clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
    environment:
      CLICKHOUSE_USER: admin
      CLICKHOUSE_PASSWORD: clickhouse123
      CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
    volumes:
      - clickhouse-data:/var/lib/clickhouse
    networks:
      - datasuite-network

  logstash:
    image: docker.elastic.co/logstash/logstash:8.11.0
    container_name: logstash
    ports:
      - "5044:5044"
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
      - ./logstash-drivers:/usr/share/logstash/drivers
    depends_on:
      - mysql
      - clickhouse
    networks:
      - datasuite-network

  airflow:
    image: apache/airflow:2.7.0
    container_name: airflow
    ports:
      - "8080:8080"
    volumes:
      - ./dags:/opt/airflow/dags
    environment:
      AIRFLOW__CORE__EXECUTOR: LocalExecutor
    depends_on:
      - clickhouse
    networks:
      - datasuite-network

volumes:
  mysql-data:
  clickhouse-data:

networks:
  datasuite-network:
    driver: bridge
Step 3: Create LogStash Configuration
Create logstash.conf in your project directory:
input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/drivers/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://mysql:3306/adventureworks"
    jdbc_user => "root"
    jdbc_password => "password"
    statement => "SELECT * FROM Sales_SalesOrderHeader WHERE ModifiedDate > :sql_last_value ORDER BY ModifiedDate"
    use_column_value => true
    # The jdbc input lowercases column names by default, so the tracking
    # column must be referenced in lowercase here
    tracking_column => "modifieddate"
    tracking_column_type => "timestamp"
    last_run_metadata_path => "/usr/share/logstash/.logstash_jdbc_last_run"
    clean_run => false
    schedule => "*/30 * * * * *"
  }
}

filter {
  # Drop LogStash metadata fields before shipping to ClickHouse
  mutate {
    remove_field => ["@version", "@timestamp"]
  }
  # Serialize each event to a JSON string so the http output below
  # can reference it as %{message}
  ruby {
    code => "event.set('message', event.to_json)"
  }
}

output {
  http {
    url => "http://clickhouse:8123/"
    http_method => "post"
    format => "message"
    message => "INSERT INTO bronze_layer.sales_orders FORMAT JSONEachRow %{message}"
    headers => {
      "Authorization" => "Basic YWRtaW46Y2xpY2tob3VzZTEyMw==" # admin:clickhouse123 base64-encoded
      "Content-Type" => "application/json"
    }
  }
  # Debug output to console
  stdout {
    codec => rubydebug
  }
}
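If you change the ClickHouse credentials in docker-compose.yml, the Authorization header in the config above must be regenerated. The value is just the base64 encoding of "user:password":

```shell
# Regenerate the Basic auth header value for the http output.
# "admin:clickhouse123" matches the credentials in docker-compose.yml above.
printf '%s' 'admin:clickhouse123' | base64
# → YWRtaW46Y2xpY2tob3VzZTEyMw==
```

Use printf rather than echo so no trailing newline is encoded into the credential.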
Step 4: Start All Services
Launch the entire stack with a single command:
# Start all services in detached mode
docker-compose up -d
Expected Output:
Creating network "datasuite-etl_datasuite-network" with driver "bridge"
Creating volume "datasuite-etl_mysql-data" with default driver
Creating volume "datasuite-etl_clickhouse-data" with default driver
Creating mysql ... done
Creating clickhouse ... done
Creating logstash ... done
Creating airflow ... done
Step 5: Load Sample Data
Once MySQL is running, load the AdventureWorks sample data:
# Wait for MySQL to be ready (may take 30-60 seconds)
docker-compose exec mysql mysqladmin ping -h localhost -u root -ppassword
# Load Adventure Works 2019 data
docker exec -i mysql mysql -uroot -ppassword adventureworks < docs/data/AdventureWorks2019.sql
# Verify data was loaded successfully
docker exec mysql mysql -uroot -ppassword adventureworks -e "SHOW TABLES;"
docker exec mysql mysql -uroot -ppassword adventureworks -e "SELECT COUNT(*) as table_count FROM information_schema.tables WHERE table_schema='adventureworks';"
Expected Output:
+-------------------------+
| Tables_in_adventureworks|
+-------------------------+
| Sales_Customer |
| Sales_SalesOrderDetail |
| Sales_SalesOrderHeader |
| Sales_SalesTerritory |
| Production_Product |
+-------------------------+
+--------------+
| table_count |
+--------------+
| 5 |
+--------------+
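If you are scripting this setup, the single mysqladmin ping at the start of this step can be wrapped in a retry loop so the data load only runs once MySQL is actually accepting connections. A minimal sketch with a placeholder probe (`true`); in practice substitute the docker-compose exec mysqladmin ping command above:

```shell
# Retry a command until it succeeds or the retry budget runs out.
# Usage: wait_for <tries> <command...>
wait_for() {
  tries=$1
  shift
  until "$@" >/dev/null 2>&1; do
    tries=$((tries - 1))
    [ "$tries" -le 0 ] && return 1
    sleep 2
  done
}

# Placeholder probe; replace "true" with the real readiness check, e.g.:
#   wait_for 30 docker-compose exec mysql mysqladmin ping -h localhost -u root -ppassword
wait_for 30 true && echo "service is ready"
# → service is ready
```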
Step 6: Prepare ClickHouse
Create the required databases in ClickHouse:
# Create bronze layer database
curl -u admin:clickhouse123 http://localhost:8123/ -d "CREATE DATABASE IF NOT EXISTS bronze_layer"
# Create gold layer database
curl -u admin:clickhouse123 http://localhost:8123/ -d "CREATE DATABASE IF NOT EXISTS gold_layer"
# Create sales_orders table in bronze layer
curl -u admin:clickhouse123 http://localhost:8123/ -d "
CREATE TABLE IF NOT EXISTS bronze_layer.sales_orders (
salesorderid Int32,
customerid Int32,
territoryid Nullable(Int32),
orderdate DateTime,
duedate DateTime,
shipdate Nullable(DateTime),
status Int8,
subtotal Decimal64(4),
taxamt Decimal64(4),
freight Decimal64(4),
totaldue Decimal64(4),
modifieddate DateTime
) ENGINE = MergeTree()
ORDER BY (orderdate, salesorderid)
"
Step 7: Monitor Service Status
Check that all services are running correctly:
# View service status
docker-compose ps
# Expected output should show all services as "Up"
Expected Output:
Name Command State Ports
------------------------------------------------------------------------------------------------
airflow /usr/bin/dumb-init -- /ent ... Up 0.0.0.0:8080->8080/tcp,:::8080->8080/tcp
clickhouse /entrypoint.sh Up 0.0.0.0:8123->8123/tcp,:::8123->8123/tcp,
0.0.0.0:9000->9000/tcp,:::9000->9000/tcp
logstash /usr/local/bin/docker-entr ... Up 0.0.0.0:5044->5044/tcp,:::5044->5044/tcp
mysql docker-entrypoint.sh mysqld Up 0.0.0.0:3306->3306/tcp,:::3306->3306/tcp, 33060/tcp
Step 8: Access Service UIs
Once everything is running, you can access the web interfaces:
ClickHouse Web Interface
URL: http://localhost:8123/play
Username: admin
Password: clickhouse123
Airflow Web Interface
URL: http://localhost:8080
Username: admin
Password: admin
MySQL (via command line)
# Connect to MySQL directly
docker exec -it mysql mysql -uroot -ppassword adventureworks
Useful Management Commands
Service Management
# Stop all services
docker-compose down
# Stop and remove volumes (WARNING: deletes all data)
docker-compose down -v
# Restart specific service
docker-compose restart logstash
# View service logs
docker-compose logs -f logstash
# Scale a service (first remove its container_name from the compose file;
# services with a fixed container name cannot be scaled)
docker-compose up -d --scale logstash=2
Data Management
# Backup MySQL data
docker exec mysql mysqldump -uroot -ppassword adventureworks > backup.sql
# Backup ClickHouse data
curl -u admin:clickhouse123 "http://localhost:8123/?query=SELECT * FROM bronze_layer.sales_orders FORMAT CSV" > bronze_backup.csv
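For recurring backups, a date stamp and compression keep successive runs from overwriting each other. A sketch of the pattern (the echo stands in for the mysqldump or curl commands above; the /tmp path is an assumption):

```shell
# Produce a date-stamped, compressed backup and verify its integrity.
stamp=$(date +%Y%m%d)
echo "-- example dump" | gzip > "/tmp/backup-${stamp}.sql.gz"

# gzip -t checks archive integrity without extracting
gzip -t "/tmp/backup-${stamp}.sql.gz" && echo "backup OK"
# → backup OK
```

In practice, pipe the real dump instead: docker exec mysql mysqldump -uroot -ppassword adventureworks | gzip > "backup-${stamp}.sql.gz".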
Resource Monitoring
# Monitor resource usage
docker stats
# View disk usage
docker system df
# Clean up unused resources
docker system prune
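The docker system df output above can be complemented with a quick host-level check, since MySQL and ClickHouse volumes grow over time. A sketch; the mount point (/) and the 90% threshold are assumptions for your host:

```shell
# Warn when the filesystem holding Docker's data is nearly full.
used=$(df -P / | awk 'NR==2 {gsub(/%/, "", $5); print $5}')
if [ "$used" -ge 90 ]; then
  echo "disk ${used}% full: consider 'docker system prune'"
else
  echo "disk ${used}% full: OK"
fi
```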
Troubleshooting
Services Won't Start
# Check for port conflicts
netstat -tulpn | grep -E ':(3306|8123|9000|5044|8080)'
# Check Docker daemon status
docker system info
# View detailed logs
docker-compose logs [service-name]
LogStash Connection Issues
# Test MySQL connectivity from LogStash container
docker exec logstash nc -zv mysql 3306
# Check LogStash pipeline status
curl http://localhost:9600/_node/stats/pipelines
ClickHouse Access Issues
# Test ClickHouse HTTP interface
curl -u admin:clickhouse123 http://localhost:8123/ping
# Check ClickHouse logs
docker logs clickhouse
Next Steps
Your Docker Compose setup is now complete! Proceed to:
Service Verification - Test that everything is working correctly
LogStash Configuration - Customize data ingestion
DBT Getting Started - Build your first data models
Advantages of This Setup
🚀 Fast Deployment: Entire stack running in under 5 minutes
🔗 Automatic Networking: Services can communicate without manual configuration
📦 Consistent Environment: Same setup works across different operating systems
🛠️ Easy Maintenance: Single commands for start, stop, update operations
📊 Complete Stack: All necessary services included and pre-configured
This Docker Compose setup provides a solid foundation for learning and development. When you're ready for more advanced configurations, consider the individual containers setup for greater flexibility.