Service Verification

After completing your DataSuite ETL setup, use this guide to verify that all services are running correctly and can communicate with each other.

Quick Verification Commands

Run these commands to quickly check service health:

# 1. Check MySQL
docker exec mysql mysql -uroot -ppassword -e "SHOW DATABASES;"

# 2. Check ClickHouse  
curl -u admin:clickhouse123 http://localhost:8123/ping

# 3. Check LogStash
curl http://localhost:9600/_node/stats

# 4. Check Airflow (if installed)
curl http://localhost:8080/health

Expected Results:

  • MySQL: Shows list of databases including adventureworks

  • ClickHouse: Returns Ok.

  • LogStash: Returns JSON stats

  • Airflow: Returns health status JSON
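
All four checks follow the same shape: run a command, wait for the service to answer. Since containers can take a while to become healthy after startup, a small retry helper avoids false failures. This is a sketch, not part of the DataSuite tooling; the `wait_for` name and the commented usage are illustrative:

```shell
# Hypothetical helper: retry a check until it succeeds or we run out of attempts.
wait_for() {
    local attempts=$1; shift
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@" >/dev/null 2>&1; then
            return 0    # check passed
        fi
        sleep 1         # brief pause before retrying
    done
    return 1            # never succeeded
}

# Example: wait up to 30 seconds for ClickHouse to answer its ping endpoint
# wait_for 30 curl -sf http://localhost:8123/ping
```

Wrapping each quick check this way makes the verification robust to running it immediately after `docker compose up`.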

Detailed Service Verification

1. MySQL Database Verification

Check Database Status:

# Verify MySQL is running and responsive
docker exec mysql mysqladmin ping -h localhost -u root -ppassword

Expected Output: mysqld is alive

Verify AdventureWorks Data:

# Check that sample data is loaded
docker exec mysql mysql -uroot -ppassword adventureworks -e "
SELECT 
    table_name,
    table_rows 
FROM information_schema.tables 
WHERE table_schema = 'adventureworks' 
ORDER BY table_name;"

Expected Output (table_rows is an InnoDB estimate, so values may differ slightly):

+------------------------+------------+
| table_name             | table_rows |
+------------------------+------------+
| Production_Product     |        504 |
| Sales_Customer         |      19820 |
| Sales_SalesOrderDetail |     121317 |
| Sales_SalesOrderHeader |      31465 |
| Sales_SalesTerritory   |         10 |
+------------------------+------------+

Test Sample Query:

# Run sample analytical query
docker exec mysql mysql -uroot -ppassword adventureworks -e "
SELECT 
    YEAR(OrderDate) as order_year,
    COUNT(*) as order_count,
    ROUND(SUM(TotalDue), 2) as total_revenue
FROM Sales_SalesOrderHeader
GROUP BY YEAR(OrderDate) 
ORDER BY order_year;"

2. ClickHouse Data Warehouse Verification

Check Service Health:

# Test ClickHouse HTTP interface
curl -u admin:clickhouse123 http://localhost:8123/ping

Expected Output: Ok.

Verify Database Creation:

# List databases
curl -u admin:clickhouse123 http://localhost:8123/ -d "SHOW DATABASES"

Expected Output:

INFORMATION_SCHEMA
bronze_layer
default
gold_layer
information_schema
system

Check Table Structure:

# Describe bronze layer table
curl -u admin:clickhouse123 http://localhost:8123/ -d "DESCRIBE bronze_layer.sales_orders FORMAT PrettyCompact"

Test Query Performance:

# Run performance test query
curl -u admin:clickhouse123 http://localhost:8123/ -d "
SELECT 
    COUNT(*) as record_count,
    MIN(orderdate) as earliest_order,
    MAX(orderdate) as latest_order,
    SUM(totaldue) as total_revenue
FROM bronze_layer.sales_orders FORMAT PrettyCompact"

3. LogStash Data Pipeline Verification

Check Pipeline Status:

# Get LogStash node information
curl -s http://localhost:9600/_node/stats | jq '.pipelines'

Monitor Pipeline Activity:

# Watch LogStash logs for data ingestion
docker logs -f logstash --tail 50

Expected Log Patterns:

[INFO] Pipeline started {"pipeline.id"=>"main"}
[INFO] JDBC - Loaded mysql-connector-java
[INFO] JDBC - Connected to mysql successfully
[INFO] Pipeline execution completed
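
Rather than eyeballing the log stream, the expected patterns above can be asserted automatically. A minimal sketch (the `check_log_patterns` function is hypothetical, not a LogStash feature); it reads log text on stdin and fails if any pattern is absent:

```shell
# Hypothetical check: assert that every expected pattern appears in the
# log output piped in on stdin; report each missing pattern.
check_log_patterns() {
    local log pattern missing=0
    log=$(cat)
    for pattern in "$@"; do
        if ! printf '%s\n' "$log" | grep -q "$pattern"; then
            echo "missing: $pattern"
            missing=1
        fi
    done
    return "$missing"
}

# Example against the live container (name assumed from this guide):
# docker logs logstash --tail 200 2>&1 | \
#     check_log_patterns "Pipeline started" "Connected to mysql"
```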

Verify Data Ingestion:

# Check if data is being ingested to ClickHouse
curl -u admin:clickhouse123 http://localhost:8123/ -d "
SELECT 
    COUNT(*) as ingested_records,
    MAX(_ingested_at) as last_ingestion_time
FROM bronze_layer.sales_orders FORMAT PrettyCompact"

Test Incremental Loading:

# Update a record in MySQL to test incremental sync
docker exec mysql mysql -uroot -ppassword adventureworks -e "
UPDATE Sales_SalesOrderHeader 
SET ModifiedDate = NOW() 
WHERE SalesOrderId = 43659;"

# Wait 60 seconds, then check if LogStash picked up the change
sleep 60
curl -u admin:clickhouse123 http://localhost:8123/ -d "
SELECT COUNT(*) as recent_updates 
FROM bronze_layer.sales_orders 
WHERE _ingested_at > NOW() - INTERVAL 2 MINUTE FORMAT PrettyCompact"
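
The fixed `sleep 60` waits the full minute even if LogStash syncs sooner. A polling loop can return as soon as the update lands; this sketch assumes the same curl query is wrapped in a command that prints a bare count (the `poll_for_rows` helper is illustrative, not part of DataSuite):

```shell
# Hypothetical poller: re-run a command that prints a number until the
# number is greater than zero, or give up after N attempts.
poll_for_rows() {
    local attempts=$1 interval=$2; shift 2
    local i count
    for ((i = 1; i <= attempts; i++)); do
        count=$("$@" 2>/dev/null)
        if [ "${count:-0}" -gt 0 ] 2>/dev/null; then
            echo "$count"   # report how many rows arrived
            return 0
        fi
        sleep "$interval"
    done
    return 1
}

# Example: poll every 5 seconds for up to a minute (credentials from this guide)
# poll_for_rows 12 5 curl -s -u admin:clickhouse123 http://localhost:8123/ \
#     -d "SELECT COUNT(*) FROM bronze_layer.sales_orders
#         WHERE _ingested_at > NOW() - INTERVAL 2 MINUTE FORMAT TabSeparated"
```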

4. Network Connectivity Verification

Test Inter-Service Communication:

# Test MySQL → ClickHouse connectivity
docker exec mysql nc -zv clickhouse 8123
# Expected: Connection succeeded

# Test LogStash → MySQL connectivity  
docker exec logstash nc -zv mysql 3306
# Expected: Connection succeeded

# Test LogStash → ClickHouse connectivity
docker exec logstash nc -zv clickhouse 8123
# Expected: Connection succeeded

DNS Resolution Test:

# Verify service discovery within Docker network
docker exec logstash nslookup mysql
docker exec logstash nslookup clickhouse

5. Data Quality Verification

Row Count Reconciliation:

# Compare row counts between MySQL and ClickHouse
echo "MySQL Sales Orders:"
docker exec mysql mysql -uroot -ppassword adventureworks -e "SELECT COUNT(*) FROM Sales_SalesOrderHeader;"

echo "ClickHouse Bronze Layer:"
curl -u admin:clickhouse123 http://localhost:8123/ -d "SELECT COUNT(*) FROM bronze_layer.sales_orders FORMAT TabSeparated"
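
With LogStash syncing on a schedule, the two counts may briefly diverge, so an exact-match comparison can produce false alarms. A tolerance-based check is more forgiving; this is a sketch using integer arithmetic, with the `reconcile` name and the 1% tolerance as illustrative choices:

```shell
# Hypothetical reconciliation check: pass when source and destination row
# counts differ by at most the given percentage of the source count.
reconcile() {
    local src=$1 dst=$2 tolerance_pct=$3
    local diff=$(( src > dst ? src - dst : dst - src ))
    # compare diff/src against tolerance_pct/100 without floating point
    if [ $(( diff * 100 )) -le $(( src * tolerance_pct )) ]; then
        echo "PASS (src=$src dst=$dst diff=$diff)"
        return 0
    fi
    echo "FAIL (src=$src dst=$dst diff=$diff)"
    return 1
}

# Example using counts captured from the two commands above:
# reconcile "$MYSQL_COUNT" "$CH_COUNT" 1
```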

Data Integrity Checks:

# Check for duplicate records
curl -u admin:clickhouse123 http://localhost:8123/ -d "
SELECT 
    salesorderid,
    COUNT(*) as duplicate_count
FROM bronze_layer.sales_orders 
GROUP BY salesorderid 
HAVING COUNT(*) > 1 
ORDER BY duplicate_count DESC 
LIMIT 10 FORMAT PrettyCompact"

Sample Data Verification:

# Compare sample records between source and destination
echo "MySQL Sample (first 3 records):"
docker exec mysql mysql -uroot -ppassword adventureworks -e "
SELECT SalesOrderId, CustomerId, OrderDate, TotalDue 
FROM Sales_SalesOrderHeader 
ORDER BY SalesOrderId 
LIMIT 3;"

echo "ClickHouse Sample (first 3 records):"
curl -u admin:clickhouse123 http://localhost:8123/ -d "
SELECT salesorderid, customerid, orderdate, totaldue 
FROM bronze_layer.sales_orders 
ORDER BY salesorderid 
LIMIT 3 FORMAT PrettyCompact"
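
Comparing the two samples by eye is error-prone because MySQL uppercases column values differently formatted output than ClickHouse's. If both sides are fetched as plain tab-separated rows (MySQL with `-s -N`, ClickHouse with `FORMAT TabSeparated`), a small normalizer lets `diff` do the comparison. The `normalize_sample` helper is a sketch, not part of the toolchain:

```shell
# Hypothetical normalizer: lowercase and collapse runs of spaces/tabs to a
# single tab so samples from the two systems can be diffed directly.
normalize_sample() {
    tr '[:upper:]' '[:lower:]' | tr -s ' \t' '\t'
}

# Example (paths and credentials assumed from this guide):
# diff \
#   <(docker exec mysql mysql -uroot -ppassword adventureworks -s -N -e \
#       "SELECT SalesOrderId, CustomerId FROM Sales_SalesOrderHeader \
#        ORDER BY SalesOrderId LIMIT 3" | normalize_sample) \
#   <(curl -s -u admin:clickhouse123 http://localhost:8123/ -d \
#       "SELECT salesorderid, customerid FROM bronze_layer.sales_orders \
#        ORDER BY salesorderid LIMIT 3 FORMAT TabSeparated" | normalize_sample)
```

Note that date/time columns may still differ in formatting between the two systems, so start with the numeric key columns.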

Performance Verification

Query Response Time Testing

ClickHouse Performance Test:

# Test analytical query performance
time curl -u admin:clickhouse123 http://localhost:8123/ -d "
SELECT 
    toYear(orderdate) as year,
    COUNT(*) as orders,
    ROUND(SUM(totaldue), 2) as revenue,
    ROUND(AVG(totaldue), 2) as avg_order_value
FROM bronze_layer.sales_orders 
GROUP BY toYear(orderdate) 
ORDER BY year FORMAT PrettyCompact"

Expected Response Time: < 1 second for typical analytical queries

Resource Usage Monitoring

Container Resource Usage:

# Monitor CPU and memory usage
docker stats --no-stream --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"

Expected Resource Usage:

  • MySQL: 5-10% CPU, 200-500MB memory

  • ClickHouse: 10-20% CPU, 1-2GB memory

  • LogStash: 5-15% CPU, 512MB-1GB memory
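
The ranges above can be enforced automatically by parsing the `docker stats` output. A sketch, assuming each input line is a container name followed by its memory percentage (the `check_mem_pct` function is illustrative):

```shell
# Hypothetical threshold check: flag any container whose memory percentage
# exceeds a limit; expects lines like "mysql 12.5%".
check_mem_pct() {
    local limit=$1
    awk -v limit="$limit" '
        { pct = $NF; sub(/%/, "", pct) }          # strip trailing % sign
        pct + 0 > limit { print $1 " over " limit "%"; bad = 1 }
        END { exit bad }                           # nonzero if any over limit
    '
}

# Example feeding real stats:
# docker stats --no-stream --format "{{.Container}} {{.MemPerc}}" | check_mem_pct 80
```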

Web Interface Verification

ClickHouse Play Interface

  1. Navigate to http://localhost:8123/play

  2. Login with username: admin, password: clickhouse123

  3. Run test query:

SELECT 
    'ClickHouse is working!' as status,
    version() as version,
    now() as current_time

Airflow Web Interface (if installed)

  1. Navigate to http://localhost:8080

  2. Login with username: admin, password: admin

  3. Verify DAGs are visible and scheduler is running

Automated Verification Script

Create scripts/verify-installation.sh for automated checks:

#!/bin/bash

echo "🔍 DataSuite ETL Installation Verification"
echo "=========================================="

ERRORS=0

# Function to check service
check_service() {
    local service=$1
    local check_command=$2
    local expected_pattern=$3
    
    echo -n "Checking $service... "
    if eval "$check_command" | grep -q "$expected_pattern"; then
        echo "✅ PASS"
    else
        echo "❌ FAIL"
        ((ERRORS++))
    fi
}

# MySQL checks
check_service "MySQL Connection" \
    "docker exec mysql mysqladmin ping -h localhost -u root -ppassword 2>/dev/null" \
    "mysqld is alive"

check_service "AdventureWorks Database" \
    "docker exec mysql mysql -uroot -ppassword -e 'USE adventureworks; SELECT 1;' 2>/dev/null" \
    "1"

# ClickHouse checks  
check_service "ClickHouse HTTP Interface" \
    "curl -s -u admin:clickhouse123 http://localhost:8123/ping" \
    "Ok"

check_service "Bronze Layer Database" \
    "curl -s -u admin:clickhouse123 http://localhost:8123/ -d 'EXISTS bronze_layer.sales_orders'" \
    "1"

# LogStash checks
check_service "LogStash API" \
    "curl -s http://localhost:9600/_node/stats" \
    "pipeline"

# Network connectivity checks
check_service "LogStash → MySQL" \
    "docker exec logstash nc -zv mysql 3306 2>&1" \
    "succeeded"

check_service "LogStash → ClickHouse" \
    "docker exec logstash nc -zv clickhouse 8123 2>&1" \
    "succeeded"

# Data pipeline checks
echo -n "Checking data ingestion... "
MYSQL_COUNT=$(docker exec mysql mysql -uroot -ppassword adventureworks -e "SELECT COUNT(*) FROM Sales_SalesOrderHeader;" -s -N 2>/dev/null)
CH_COUNT=$(curl -s -u admin:clickhouse123 http://localhost:8123/ -d "SELECT COUNT(*) FROM bronze_layer.sales_orders FORMAT TabSeparated")

if [ "${MYSQL_COUNT:-0}" -gt 0 ] && [ "${CH_COUNT:-0}" -gt 0 ]; then
    echo "✅ PASS (MySQL: $MYSQL_COUNT, ClickHouse: $CH_COUNT)"
else
    echo "❌ FAIL (MySQL: $MYSQL_COUNT, ClickHouse: $CH_COUNT)"  
    ((ERRORS++))
fi

echo
echo "=========================================="
if [ $ERRORS -eq 0 ]; then
    echo "🎉 All verification checks passed!"
    echo "Your DataSuite ETL installation is working correctly."
else
    echo "⚠️ $ERRORS verification check(s) failed."
    echo "Please review the output above and check your installation."
fi
echo "=========================================="

exit $ERRORS

Then make the script executable and run it:

chmod +x scripts/verify-installation.sh
./scripts/verify-installation.sh

Troubleshooting Failed Verifications

MySQL Connection Issues

# Check MySQL container logs
docker logs mysql --tail 50

# Verify MySQL is listening on correct port
docker exec mysql netstat -tuln | grep 3306

# Test connection with different approach
docker exec mysql mysql -uroot -ppassword -e "SELECT 'MySQL is working';"

ClickHouse Access Issues

# Check ClickHouse logs
docker logs clickhouse --tail 50

# Test without authentication
curl http://localhost:8123/ping

# Check ClickHouse configuration
docker exec clickhouse cat /etc/clickhouse-server/users.xml

LogStash Pipeline Issues

# Check LogStash pipeline configuration
docker exec logstash cat /usr/share/logstash/pipeline/logstash.conf

# View LogStash pipeline stats
curl -s http://localhost:9600/_node/stats/pipelines | jq

# Verify the JDBC driver jar is present inside the container
docker exec logstash ls -l /usr/share/logstash/drivers/mysql-connector-java.jar

Next Steps

Once all verifications pass:

  1. LogStash Configuration - Customize data ingestion pipelines

  2. DBT Getting Started - Build your first data transformations

  3. Troubleshooting Guide - Reference for resolving issues

Verification Checklist

Core Services

  • MySQL responds to mysqladmin ping with mysqld is alive

  • ClickHouse /ping endpoint returns Ok.

  • LogStash API on port 9600 returns pipeline stats

Data Pipeline

  • Row counts reconcile between MySQL and the ClickHouse bronze layer

  • No duplicate salesorderid values in bronze_layer.sales_orders

  • Incremental updates appear in ClickHouse after the sync interval

Performance

  • Analytical queries return in under 1 second

  • Container CPU and memory usage are within the expected ranges

Access

  • ClickHouse Play interface is reachable at http://localhost:8123/play

  • Airflow web UI is reachable at http://localhost:8080 (if installed)
Your DataSuite ETL system is now verified and ready for development!
