Advanced Setup (Individual Containers)

This setup method gives you complete control over each service's configuration. Choose it when you need custom networking, specific component versions, or advanced configuration options.

Why Choose Individual Containers?

✅ Advantages:

  • Complete Control: Configure each service exactly as needed

  • Custom Networking: Set up advanced network topologies

  • Resource Management: Allocate specific CPU/memory to each service

  • Version Control: Use different versions of each component

  • Production-Like: Closer to production deployment patterns

❌ Considerations:

  • More complex setup and management

  • Manual network configuration required

  • Requires deeper understanding of each component

  • More time-consuming for initial setup

Step 1: Create Docker Network

Create a custom network for service communication:
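A minimal sketch of this step, assuming a hypothetical network name `datasuite-network` (the name used by the original guide is not shown here). The command is wrapped in a function so it can be reviewed first and re-run safely:

```shell
# Assumption: "datasuite-network" is an illustrative name, not one taken from this guide.
NETWORK_NAME="${NETWORK_NAME:-datasuite-network}"

create_network() {
  # Create the bridge network only if it does not exist yet.
  if docker network inspect "$NETWORK_NAME" >/dev/null 2>&1; then
    echo "network $NETWORK_NAME already exists"
  else
    docker network create --driver bridge "$NETWORK_NAME"
  fi
}
```

Call `create_network` once. Containers started with `--network "$NETWORK_NAME"` can then reach each other by container name.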

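Expected Output: on success, `docker network create` prints the ID of the new network, a 64-character hexadecimal string.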

Step 2: Set Up MySQL Database

Install and configure MySQL with AdventureWorks data:
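One way to start the container, as a sketch. The container name `mysql`, the `mysql:8.0` tag, the `adventureworks` database name, the `mysql-data` volume, and the `datasuite-network` network are all assumptions made for illustration:

```shell
start_mysql() {
  # Assumed names: container "mysql", volume "mysql-data", database "adventureworks".
  # The root password must be exported before calling this function.
  docker run -d \
    --name mysql \
    --network "${NETWORK_NAME:-datasuite-network}" \
    -e MYSQL_ROOT_PASSWORD="${MYSQL_ROOT_PASSWORD:?set MYSQL_ROOT_PASSWORD first}" \
    -e MYSQL_DATABASE=adventureworks \
    -p 3306:3306 \
    -v mysql-data:/var/lib/mysql \
    mysql:8.0
}
```

Usage: `export MYSQL_ROOT_PASSWORD=...`, then run `start_mysql`. The named volume keeps the data when the container is removed.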

Advanced MySQL Configuration (optional): Create mysql-config/custom.cnf:
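The file could look like the sketch below. Every value is illustrative rather than a recommendation from this guide, so size them for your host:

```shell
mkdir -p mysql-config

# Assumption: all values below are placeholders chosen for illustration.
cat > mysql-config/custom.cnf <<'EOF'
[mysqld]
innodb_buffer_pool_size = 1G
max_connections = 200
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
# Log statements that run longer than 2 seconds.
slow_query_log = 1
long_query_time = 2
EOF
```

The official MySQL image reads extra configuration from `/etc/mysql/conf.d`, so the file can be mounted with `-v "$PWD/mysql-config/custom.cnf:/etc/mysql/conf.d/custom.cnf:ro"` when the container is started.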

Load Sample Data:
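A hedged sketch of the import, assuming the AdventureWorks data is available as a SQL dump on the host. The file name `adventureworks.sql`, the container name `mysql`, and the target database are hypothetical:

```shell
load_sample_data() {
  # $1: path to the SQL dump on the host (hypothetical default file name).
  dump_file="${1:-adventureworks.sql}"
  # Stream the dump into the mysql client inside the running container.
  docker exec -i mysql \
    mysql -uroot -p"${MYSQL_ROOT_PASSWORD:?set MYSQL_ROOT_PASSWORD first}" adventureworks \
    < "$dump_file"
}
```

Usage: `load_sample_data path/to/dump.sql`.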

Step 3: Install ClickHouse Data Warehouse

Set up ClickHouse with custom configuration:
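A sketch of the container start, assuming the container name `clickhouse`, a `clickhouse-data` volume, and the `datasuite-network` network. Ports 8123 (HTTP) and 9000 (native protocol) are the ClickHouse defaults:

```shell
start_clickhouse() {
  # Assumed names: container "clickhouse", volume "clickhouse-data".
  # Pin a specific image tag instead of "latest" for reproducible setups.
  docker run -d \
    --name clickhouse \
    --network "${NETWORK_NAME:-datasuite-network}" \
    --ulimit nofile=262144:262144 \
    -p 8123:8123 \
    -p 9000:9000 \
    -v clickhouse-data:/var/lib/clickhouse \
    clickhouse/clickhouse-server:latest
}
```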

Advanced ClickHouse Configuration (optional): Create clickhouse-config/custom.xml:
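As an illustration, the override file might contain a few server-level settings. The values are placeholders, not tuning advice from this guide:

```shell
mkdir -p clickhouse-config

# Assumption: the limits and log level below are illustrative only.
cat > clickhouse-config/custom.xml <<'EOF'
<clickhouse>
    <max_connections>200</max_connections>
    <max_concurrent_queries>100</max_concurrent_queries>
    <logger>
        <level>information</level>
    </logger>
</clickhouse>
EOF
```

ClickHouse merges files from `/etc/clickhouse-server/config.d/` into its main configuration, so mounting this file there with `-v "$PWD/clickhouse-config/custom.xml:/etc/clickhouse-server/config.d/custom.xml:ro"` applies the overrides.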

Initialize ClickHouse Databases:
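The databases can be created through `clickhouse-client` inside the container. In this sketch the database names `bronze_layer` and `gold_layer` are hypothetical, so substitute the names your pipelines expect:

```shell
init_clickhouse() {
  # Assumption: database names are placeholders; container is named "clickhouse".
  for db in bronze_layer gold_layer; do
    docker exec clickhouse clickhouse-client \
      --query "CREATE DATABASE IF NOT EXISTS $db"
  done
}
```

Because of `IF NOT EXISTS`, `init_clickhouse` can be run repeatedly without error.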

Step 4: Install LogStash with Custom Pipeline

Set up LogStash with the MySQL connector and a custom configuration:

Create Advanced LogStash Configuration: Create logstash-config/pipelines.yml:
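A minimal `pipelines.yml` with a single pipeline might look like this. The pipeline ID and the worker and batch values are assumptions. The path follows the default layout of the official Logstash image:

```shell
mkdir -p logstash-config

# Assumption: pipeline ID and tuning values are illustrative.
cat > logstash-config/pipelines.yml <<'EOF'
# One entry per pipeline; add more entries as further .conf files are created.
- pipeline.id: sales-orders
  path.config: "/usr/share/logstash/pipeline/sales-orders.conf"
  pipeline.workers: 2
  pipeline.batch.size: 500
EOF
```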

Create Pipeline-Specific Configurations: Create logstash-config/sales-orders.conf:
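The sketch below shows only the input side, using the Logstash `jdbc` input plugin. The driver jar path, database name, tracking column, and schedule are assumptions. The output is a `stdout` placeholder because the ClickHouse output used by this guide is not shown here:

```shell
mkdir -p logstash-config

# Assumptions: jar location, database name, tracking column and schedule are
# hypothetical; the stdout output is a stand-in for the real ClickHouse output.
cat > logstash-config/sales-orders.conf <<'EOF'
input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash/drivers/mysql-connector-j.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://mysql:3306/adventureworks"
    jdbc_user => "root"
    # Resolved by Logstash from the container environment.
    jdbc_password => "${MYSQL_ROOT_PASSWORD}"
    statement_filepath => "/usr/share/logstash/sql-queries/sales_orders.sql"
    # Poll every five minutes, fetching only rows newer than the last run.
    schedule => "*/5 * * * *"
    use_column_value => true
    tracking_column => "salesorderid"
    tracking_column_type => "numeric"
  }
}

output {
  stdout { codec => rubydebug }
}
EOF
```

The MySQL JDBC driver jar is not bundled with Logstash and has to be downloaded separately. A dedicated read-only MySQL user is preferable to `root` outside of a local test setup.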

Create SQL Query Files: Create sql-queries/sales_orders.sql:
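A possible incremental query, written for the Logstash `jdbc` input, which substitutes `:sql_last_value` with the last tracked value. Table and column names differ between MySQL ports of AdventureWorks, so the ones used here are assumptions:

```shell
mkdir -p sql-queries

# Assumption: table and column names are hypothetical; check them against
# the schema that was actually loaded into MySQL.
cat > sql-queries/sales_orders.sql <<'EOF'
SELECT SalesOrderID, OrderDate, CustomerID, TotalDue
FROM Sales_SalesOrderHeader
WHERE SalesOrderID > :sql_last_value
ORDER BY SalesOrderID
EOF
```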

Run LogStash Container:
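A sketch of the container start that mounts the files from the previous sub-steps. The image tag, the container name `logstash`, and the host directory `drivers/` (holding the MySQL JDBC jar) are assumptions:

```shell
# Assumption: pick the Logstash version you actually want to run.
LOGSTASH_IMAGE="${LOGSTASH_IMAGE:-docker.elastic.co/logstash/logstash:8.13.0}"

start_logstash() {
  # Mount configuration read-only at the paths the official image expects.
  docker run -d \
    --name logstash \
    --network "${NETWORK_NAME:-datasuite-network}" \
    -e MYSQL_ROOT_PASSWORD="${MYSQL_ROOT_PASSWORD:?set MYSQL_ROOT_PASSWORD first}" \
    -v "$PWD/logstash-config/pipelines.yml:/usr/share/logstash/config/pipelines.yml:ro" \
    -v "$PWD/logstash-config/sales-orders.conf:/usr/share/logstash/pipeline/sales-orders.conf:ro" \
    -v "$PWD/sql-queries:/usr/share/logstash/sql-queries:ro" \
    -v "$PWD/drivers:/usr/share/logstash/drivers:ro" \
    "$LOGSTASH_IMAGE"
}
```

After `start_logstash`, `docker logs -f logstash` shows whether the pipeline loaded and connected to MySQL.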

Step 5: Install Apache Airflow (Optional)

Set up Airflow for workflow orchestration:
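For a quick local trial, the official image can run `airflow standalone`, which starts the scheduler and web server in one container and is meant for development only. The image tag, container name, and `dags/` directory are assumptions:

```shell
start_airflow() {
  # Assumed names: container "airflow", host directory ./dags for DAG files.
  docker run -d \
    --name airflow \
    --network "${NETWORK_NAME:-datasuite-network}" \
    -p 8080:8080 \
    -v "$PWD/dags:/opt/airflow/dags" \
    apache/airflow:2.9.2 standalone
}
```

The web UI is then served on port 8080 of the host.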

Step 6: Configure Service Health Monitoring

Create monitoring scripts for service health:

Create scripts/health-check.sh:
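A minimal version of such a script, assuming the containers are named `mysql`, `clickhouse`, and `logstash`. It only checks that each container is running, not that the service inside is answering queries:

```shell
mkdir -p scripts

# Assumption: container names are illustrative; adjust the list to your setup.
cat > scripts/health-check.sh <<'EOF'
#!/usr/bin/env bash
# Report whether each expected container is running.
status=0
for name in mysql clickhouse logstash; do
  state="$(docker inspect --format '{{.State.Running}}' "$name" 2>/dev/null || true)"
  if [ "$state" = "true" ]; then
    echo "OK    $name"
  else
    echo "DOWN  $name"
    status=1
  fi
done
# A non-zero exit code signals that at least one service is down.
exit "$status"
EOF
chmod +x scripts/health-check.sh
```

Run it with `./scripts/health-check.sh`. The exit code makes it usable from cron or a CI job.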

Step 7: Advanced Network Configuration

Custom Bridge Network
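When fixed addresses are needed, a bridge network can be created with an explicit subnet and gateway. The network name `datasuite-custom` and the address range in this sketch are assumptions:

```shell
create_custom_network() {
  # Assumption: name and 172.20.0.0/16 range are illustrative; pick a range
  # that does not collide with other networks on the host.
  docker network create \
    --driver bridge \
    --subnet 172.20.0.0/16 \
    --gateway 172.20.0.1 \
    datasuite-custom
}
```

A container can then be attached with a static address, for example `docker network connect --ip 172.20.0.10 datasuite-custom mysql`.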

Service Discovery Configuration

Create /etc/hosts entries for easier access:
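Since the service ports are published on the host, the entries can point at `127.0.0.1`. The `*.datasuite.local` hostnames below are hypothetical. The function only prints the lines so they can be reviewed before anything is written:

```shell
print_hosts_entries() {
  # Assumption: hostnames are illustrative aliases for the published ports.
  for name in mysql clickhouse logstash; do
    printf '127.0.0.1 %s.datasuite.local\n' "$name"
  done
}
```

To apply them, append the output to the hosts file: `print_hosts_entries | sudo tee -a /etc/hosts`.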

Management Scripts

Create scripts/manage-services.sh:
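One possible shape for this script, a thin wrapper around `docker start`, `stop`, `restart`, `ps`, and `logs`. The service list and network name are assumptions:

```shell
mkdir -p scripts

# Assumption: service and network names are illustrative.
cat > scripts/manage-services.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

SERVICES="mysql clickhouse logstash"

case "${1:-}" in
  start|stop|restart)
    # Word splitting of $SERVICES is intentional: one argument per container.
    docker "$1" $SERVICES
    ;;
  status)
    docker ps --filter "network=${NETWORK_NAME:-datasuite-network}"
    ;;
  logs)
    docker logs --tail 100 -f "${2:?usage: $0 logs <service>}"
    ;;
  *)
    echo "usage: $0 {start|stop|restart|status|logs <service>}" >&2
    exit 1
    ;;
esac
EOF
chmod +x scripts/manage-services.sh
```

Examples: `./scripts/manage-services.sh restart` or `./scripts/manage-services.sh logs clickhouse`.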

Resource Management

CPU and Memory Limits
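Limits can be set at start time with `docker run --cpus ... --memory ...`, or changed on existing containers with `docker update`, as sketched here. The numbers and container names are placeholders:

```shell
apply_limits() {
  # Assumption: the values are illustrative; size them for your host.
  # --memory-swap equal to --memory disables additional swap for the container.
  docker update --cpus 2 --memory 4g --memory-swap 4g mysql
  docker update --cpus 4 --memory 8g --memory-swap 8g clickhouse
  docker update --cpus 1 --memory 2g --memory-swap 2g logstash
}
```

`docker stats --no-stream` shows current usage against these limits.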

Storage Optimization

Backup and Recovery

Create scripts/backup.sh:
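A sketch of a backup script, assuming a container named `mysql` started with `MYSQL_ROOT_PASSWORD` in its environment and a ClickHouse data volume named `clickhouse-data`. Archiving the volume while ClickHouse is running may capture an inconsistent state, so stopping the container first is the safer choice:

```shell
mkdir -p scripts

# Assumption: container, volume and directory names are illustrative.
cat > scripts/backup.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

BACKUP_DIR="${BACKUP_DIR:-./backups}/$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BACKUP_DIR"

# Logical dump of all MySQL databases; the password is read inside the container.
docker exec mysql sh -c 'exec mysqldump --all-databases -uroot -p"$MYSQL_ROOT_PASSWORD"' \
  | gzip > "$BACKUP_DIR/mysql.sql.gz"

# Archive the ClickHouse data volume through a throwaway container.
docker run --rm \
  -v clickhouse-data:/data:ro \
  -v "$(cd "$BACKUP_DIR" && pwd):/backup" \
  alpine tar czf /backup/clickhouse-data.tar.gz -C /data .

echo "backup written to $BACKUP_DIR"
EOF
chmod +x scripts/backup.sh
```

To restore MySQL, feed the dump back in: `gunzip -c mysql.sql.gz | docker exec -i mysql sh -c 'exec mysql -uroot -p"$MYSQL_ROOT_PASSWORD"'`.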

Next Steps

Your individual-container setup provides maximum flexibility. Proceed to:

  1. Service Verification - Confirm everything works correctly

  2. LogStash Configuration - Customize your data pipelines

  3. DBT Getting Started - Build dimensional models

Benefits of Individual Container Setup

  • 🎛️ Fine-Grained Control: Configure each service precisely for your needs

  • 🏗️ Production Similarity: More closely matches production deployment patterns

  • 📈 Scalability: Easy to scale individual services based on load

  • 🔧 Customization: Advanced configuration options for performance tuning

  • 🚀 Deployment Flexibility: Can be adapted for Kubernetes or other orchestrators

This setup method provides the foundation for understanding how to deploy and manage DataSuite ETL in production environments.
