Service Verification
After completing your DataSuite ETL setup, use this guide to verify that all services are running correctly and can communicate with each other.
Quick Verification Commands
Run these commands to quickly check service health:
# 1. Check MySQL
docker exec mysql mysql -uroot -ppassword -e "SHOW DATABASES;"
# 2. Check ClickHouse
curl -u admin:clickhouse123 http://localhost:8123/ping
# 3. Check LogStash
curl http://localhost:9600/_node/stats
# 4. Check Airflow (if installed)
curl http://localhost:8080/health
Expected Results:
MySQL: shows a list of databases including adventureworks
ClickHouse: returns Ok.
LogStash: returns JSON stats
Airflow: returns health status JSON
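The four quick checks above can be wrapped in one small helper so a single run reports PASS/FAIL for each service. This is a minimal sketch; the container names, ports, and credentials are the ones assumed throughout this guide, so adjust them to your setup.

```shell
# Run a named check command and grep its output for an expected pattern.
check() {
  local name="$1" pattern="$2"; shift 2
  if "$@" 2>/dev/null | grep -q "$pattern"; then
    echo "PASS: $name"
  else
    echo "FAIL: $name"
  fi
}

# Service endpoints and credentials assumed from this guide.
check "MySQL"      "Database"  docker exec mysql mysql -uroot -ppassword -e "SHOW DATABASES;"
check "ClickHouse" "Ok"        curl -s -u admin:clickhouse123 http://localhost:8123/ping
check "LogStash"   "pipelines" curl -s http://localhost:9600/_node/stats
check "Airflow"    "healthy"   curl -s http://localhost:8080/health
```

A failed command (service down, container missing) simply prints FAIL rather than aborting the run.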
Detailed Service Verification
1. MySQL Database Verification
Check Database Status:
# Verify MySQL is running and responsive
docker exec mysql mysqladmin ping -h localhost -u root -ppassword
Expected Output: mysqld is alive
Verify AdventureWorks Data:
# Check that sample data is loaded
docker exec mysql mysql -uroot -ppassword adventureworks -e "
SELECT
table_name,
table_rows
FROM information_schema.tables
WHERE table_schema = 'adventureworks'
ORDER BY table_name;"
Expected Output:
+------------------------+------------+
| table_name             | table_rows |
+------------------------+------------+
| Production_Product     |        504 |
| Sales_Customer         |      19820 |
| Sales_SalesOrderDetail |     121317 |
| Sales_SalesOrderHeader |      31465 |
| Sales_SalesTerritory   |         10 |
+------------------------+------------+
(table_rows values are InnoDB estimates and may differ slightly.)
Test Sample Query:
# Run sample analytical query
docker exec mysql mysql -uroot -ppassword adventureworks -e "
SELECT
YEAR(OrderDate) as order_year,
COUNT(*) as order_count,
ROUND(SUM(TotalDue), 2) as total_revenue
FROM Sales_SalesOrderHeader
GROUP BY YEAR(OrderDate)
ORDER BY order_year;"
2. ClickHouse Data Warehouse Verification
Check Service Health:
# Test ClickHouse HTTP interface
curl -u admin:clickhouse123 http://localhost:8123/ping
Expected Output: Ok.
Verify Database Creation:
# List databases
curl -u admin:clickhouse123 http://localhost:8123/ -d "SHOW DATABASES"
Expected Output:
INFORMATION_SCHEMA
bronze_layer
default
gold_layer
information_schema
system
Check Table Structure:
# Describe bronze layer table
curl -u admin:clickhouse123 http://localhost:8123/ -d "DESCRIBE bronze_layer.sales_orders FORMAT PrettyCompact"
Test Query Performance:
# Run performance test query
curl -u admin:clickhouse123 http://localhost:8123/ -d "
SELECT
COUNT(*) as record_count,
MIN(orderdate) as earliest_order,
MAX(orderdate) as latest_order,
SUM(totaldue) as total_revenue
FROM bronze_layer.sales_orders FORMAT PrettyCompact"
3. LogStash Data Pipeline Verification
Check Pipeline Status:
# Get LogStash node information
curl -s http://localhost:9600/_node/stats | jq '.pipelines'
Monitor Pipeline Activity:
# Watch LogStash logs for data ingestion
docker logs -f logstash --tail 50
Expected Log Patterns (representative; exact messages vary by LogStash version):
[INFO] Pipeline started {"pipeline.id"=>"main"}
[INFO] JDBC - Loaded mysql-connector-java
[INFO] JDBC - Connected to mysql successfully
[INFO] Pipeline execution completed
Verify Data Ingestion:
# Check if data is being ingested to ClickHouse
curl -u admin:clickhouse123 http://localhost:8123/ -d "
SELECT
COUNT(*) as ingested_records,
MAX(_ingested_at) as last_ingestion_time
FROM bronze_layer.sales_orders FORMAT PrettyCompact"
Test Incremental Loading:
# Update a record in MySQL to test incremental sync
docker exec mysql mysql -uroot -ppassword adventureworks -e "
UPDATE Sales_SalesOrderHeader
SET ModifiedDate = NOW()
WHERE SalesOrderId = 43659
LIMIT 1;"
# Wait 60 seconds, then check if LogStash picked up the change
sleep 60
curl -u admin:clickhouse123 http://localhost:8123/ -d "
SELECT COUNT(*) as recent_updates
FROM bronze_layer.sales_orders
WHERE _ingested_at > NOW() - INTERVAL 2 MINUTE FORMAT PrettyCompact"
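The fixed `sleep 60` above can race LogStash's schedule: the change may land sooner or later than one minute. A hedged alternative is to poll until recently ingested rows appear. This sketch reuses the credentials, table name, and `_ingested_at` column assumed in this guide:

```shell
# Poll ClickHouse until recently ingested rows appear, up to a timeout.
# Credentials, table, and the _ingested_at column are assumptions from this guide.
wait_for_ingestion() {
  local timeout="${1:-120}" waited=0 n
  while [ "$waited" -lt "$timeout" ]; do
    n=$(curl -s -u admin:clickhouse123 http://localhost:8123/ -d \
      "SELECT COUNT(*) FROM bronze_layer.sales_orders
       WHERE _ingested_at > NOW() - INTERVAL 2 MINUTE FORMAT TabSeparated")
    if [ "${n:-0}" -gt 0 ] 2>/dev/null; then
      echo "ingested after ${waited}s"
      return 0
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "timed out after ${timeout}s"
  return 1
}

# Example: wait up to two minutes for the incremental sync.
# wait_for_ingestion 120
```

The function returns non-zero on timeout, so it can also gate later steps in a script.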
4. Network Connectivity Verification
Test Inter-Service Communication:
# Test MySQL container → ClickHouse connectivity (requires nc in the container)
docker exec mysql nc -zv clickhouse 8123
# Test LogStash → MySQL connectivity
docker exec logstash nc -zv mysql 3306
# Test LogStash → ClickHouse connectivity
docker exec logstash nc -zv clickhouse 8123
# Expected for each: a success message such as "succeeded" or "open",
# depending on the nc variant installed in the container
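The pairwise checks above can be folded into a small helper so adding a new service-to-service test is a one-liner. A sketch, using the container and service names assumed in this guide:

```shell
# Report whether one container can reach a host:port on the Docker network.
check_port() {
  local container="$1" host="$2" port="$3"
  if docker exec "$container" nc -z "$host" "$port" >/dev/null 2>&1; then
    echo "OK: $container -> $host:$port"
  else
    echo "UNREACHABLE: $container -> $host:$port"
  fi
}

# Container/service names assumed from this guide.
check_port logstash mysql 3306
check_port logstash clickhouse 8123
```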
DNS Resolution Test:
# Verify service discovery within Docker network
docker exec logstash nslookup mysql
docker exec logstash nslookup clickhouse
5. Data Quality Verification
Row Count Reconciliation:
# Compare row counts between MySQL and ClickHouse
echo "MySQL Sales Orders:"
docker exec mysql mysql -uroot -ppassword adventureworks -e "SELECT COUNT(*) FROM Sales_SalesOrderHeader;"
echo "ClickHouse Bronze Layer:"
curl -u admin:clickhouse123 http://localhost:8123/ -d "SELECT COUNT(*) FROM bronze_layer.sales_orders FORMAT TabSeparated"
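Rather than eyeballing the two numbers, the counts can be compared programmatically and reported as an explicit pass/fail. A sketch, reusing the commands and credentials assumed in this guide:

```shell
# Compare source and destination row counts; empty inputs default to 0.
compare_counts() {
  local src="${1:-0}" dst="${2:-0}"
  if [ "$src" -eq "$dst" ] 2>/dev/null; then
    echo "MATCH: $src rows in both systems"
  else
    echo "MISMATCH: source=$src destination=$dst"
  fi
}

# Commands and credentials assumed from this guide.
MYSQL_COUNT=$(docker exec mysql mysql -uroot -ppassword adventureworks \
  -s -N -e "SELECT COUNT(*) FROM Sales_SalesOrderHeader;" 2>/dev/null || true)
CH_COUNT=$(curl -s -u admin:clickhouse123 http://localhost:8123/ \
  -d "SELECT COUNT(*) FROM bronze_layer.sales_orders FORMAT TabSeparated" 2>/dev/null || true)
compare_counts "$MYSQL_COUNT" "$CH_COUNT"
```

Note that a small transient mismatch can be normal while LogStash is mid-run; re-check after the next pipeline interval before treating it as a failure.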
Data Integrity Checks:
# Check for duplicate records
curl -u admin:clickhouse123 http://localhost:8123/ -d "
SELECT
salesorderid,
COUNT(*) as duplicate_count
FROM bronze_layer.sales_orders
GROUP BY salesorderid
HAVING COUNT(*) > 1
ORDER BY duplicate_count DESC
LIMIT 10 FORMAT PrettyCompact"
Sample Data Verification:
# Compare sample records between source and destination
echo "MySQL Sample (first 3 records):"
docker exec mysql mysql -uroot -ppassword adventureworks -e "
SELECT SalesOrderId, CustomerId, OrderDate, TotalDue
FROM Sales_SalesOrderHeader
ORDER BY SalesOrderId
LIMIT 3;"
echo "ClickHouse Sample (first 3 records):"
curl -u admin:clickhouse123 http://localhost:8123/ -d "
SELECT salesorderid, customerid, orderdate, totaldue
FROM bronze_layer.sales_orders
ORDER BY salesorderid
LIMIT 3 FORMAT PrettyCompact"
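The two samples can also be compared mechanically instead of by eye. This sketch normalizes trailing whitespace and diffs the row dumps; the column names and credentials are assumptions carried over from this guide, and differing numeric or date formatting between MySQL and ClickHouse can produce false mismatches.

```shell
# Compare two row dumps after trimming trailing whitespace on each line.
samples_match() {
  local a b
  a=$(printf '%s' "$1" | sed 's/[[:space:]]*$//')
  b=$(printf '%s' "$2" | sed 's/[[:space:]]*$//')
  [ "$a" = "$b" ] && echo "samples match" || echo "samples differ"
}

# Commands and credentials assumed from this guide.
mysql_rows=$(docker exec mysql mysql -uroot -ppassword adventureworks -s -N -e "
  SELECT SalesOrderId, CustomerId, DATE(OrderDate), TotalDue
  FROM Sales_SalesOrderHeader ORDER BY SalesOrderId LIMIT 3;" 2>/dev/null || true)
ch_rows=$(curl -s -u admin:clickhouse123 http://localhost:8123/ -d "
  SELECT salesorderid, customerid, toDate(orderdate), totaldue
  FROM bronze_layer.sales_orders ORDER BY salesorderid LIMIT 3
  FORMAT TabSeparated" 2>/dev/null || true)
samples_match "$mysql_rows" "$ch_rows"
```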
Performance Verification
Query Response Time Testing
ClickHouse Performance Test:
# Test analytical query performance
time curl -u admin:clickhouse123 http://localhost:8123/ -d "
SELECT
toYear(orderdate) as year,
COUNT(*) as orders,
ROUND(SUM(totaldue), 2) as revenue,
ROUND(AVG(totaldue), 2) as avg_order_value
FROM bronze_layer.sales_orders
GROUP BY toYear(orderdate)
ORDER BY year FORMAT PrettyCompact"
Expected Response Time: < 1 second for typical analytical queries
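A single `time` sample can be noisy; averaging over several runs gives a steadier number. A sketch (assumes GNU `date` with nanosecond support, as on most Linux images; the query and credentials are the ones assumed above):

```shell
# Average the wall-clock time of a command over N runs, reported in ms.
avg_ms() {
  local runs="$1"; shift
  local start end total=0 i
  for i in $(seq "$runs"); do
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s%N)
    total=$((total + (end - start)))
  done
  echo "$(( total / runs / 1000000 )) ms average over $runs runs"
}

# Query and credentials assumed from this guide.
avg_ms 5 curl -s -u admin:clickhouse123 http://localhost:8123/ \
  -d "SELECT COUNT(*) FROM bronze_layer.sales_orders"
```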
Resource Usage Monitoring
Container Resource Usage:
# Monitor CPU and memory usage
docker stats --no-stream --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"
Expected Resource Usage:
MySQL: 5-10% CPU, 200-500MB memory
ClickHouse: 10-20% CPU, 1-2GB memory
LogStash: 5-15% CPU, 512MB-1GB memory
Web Interface Verification
ClickHouse Play Interface
Navigate to http://localhost:8123/play
Log in with username admin and password clickhouse123
Run test query:
SELECT
'ClickHouse is working!' as status,
version() as version,
now() as current_time
Airflow Web Interface (if installed)
Navigate to http://localhost:8080
Log in with username admin and password admin
Verify DAGs are visible and scheduler is running
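DAG visibility can also be checked from the command line through the Airflow 2.x REST API, assuming its basic-auth backend is enabled with the admin/admin credentials above. A sketch:

```shell
# Pull dag_id values out of an Airflow REST API JSON payload on stdin.
list_dag_ids() {
  grep -o '"dag_id":[[:space:]]*"[^"]*"' | sed 's/.*"\(.*\)"/\1/'
}

# Assumes Airflow 2.x with basic auth enabled (admin/admin, per this guide).
curl -s -u admin:admin "http://localhost:8080/api/v1/dags?limit=100" \
  | list_dag_ids || echo "no DAGs returned (is Airflow running with API auth enabled?)"
```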
Automated Verification Script
Create scripts/verify-installation.sh for automated checks:
#!/bin/bash
echo "🔍 DataSuite ETL Installation Verification"
echo "=========================================="
ERRORS=0
# Function to check service
check_service() {
    local service=$1
    local check_command=$2
    local expected_pattern=$3
    echo -n "Checking $service... "
    if eval "$check_command" | grep -q "$expected_pattern"; then
        echo "✅ PASS"
    else
        echo "❌ FAIL"
        ((ERRORS++))
    fi
}
# MySQL checks
check_service "MySQL Connection" \
"docker exec mysql mysqladmin ping -h localhost -u root -ppassword 2>/dev/null" \
"mysqld is alive"
check_service "AdventureWorks Database" \
"docker exec mysql mysql -uroot -ppassword -e 'USE adventureworks; SELECT 1;' 2>/dev/null" \
"1"
# ClickHouse checks
check_service "ClickHouse HTTP Interface" \
"curl -s -u admin:clickhouse123 http://localhost:8123/ping" \
"Ok"
check_service "Bronze Layer Database" \
"curl -s -u admin:clickhouse123 http://localhost:8123/ -d 'EXISTS bronze_layer.sales_orders'" \
"1"
# LogStash checks
check_service "LogStash API" \
"curl -s http://localhost:9600/_node/stats" \
"pipeline"
# Network connectivity checks
check_service "LogStash → MySQL" \
"docker exec logstash nc -zv mysql 3306 2>&1" \
"succeeded"
check_service "LogStash → ClickHouse" \
"docker exec logstash nc -zv clickhouse 8123 2>&1" \
"succeeded"
# Data pipeline checks
echo -n "Checking data ingestion... "
MYSQL_COUNT=$(docker exec mysql mysql -uroot -ppassword adventureworks -e "SELECT COUNT(*) FROM Sales_SalesOrderHeader;" -s -N 2>/dev/null)
CH_COUNT=$(curl -s -u admin:clickhouse123 http://localhost:8123/ -d "SELECT COUNT(*) FROM bronze_layer.sales_orders FORMAT TabSeparated")
if [ "${MYSQL_COUNT:-0}" -gt 0 ] 2>/dev/null && [ "${CH_COUNT:-0}" -gt 0 ] 2>/dev/null; then
    echo "✅ PASS (MySQL: $MYSQL_COUNT, ClickHouse: $CH_COUNT)"
else
    echo "❌ FAIL (MySQL: ${MYSQL_COUNT:-0}, ClickHouse: ${CH_COUNT:-0})"
    ((ERRORS++))
fi
echo
echo "=========================================="
if [ $ERRORS -eq 0 ]; then
    echo "🎉 All verification checks passed!"
    echo "Your DataSuite ETL installation is working correctly."
else
    echo "⚠️ $ERRORS verification check(s) failed."
    echo "Please review the output above and check your installation."
fi
echo "=========================================="
exit $ERRORS
# Make script executable and run
chmod +x scripts/verify-installation.sh
./scripts/verify-installation.sh
Troubleshooting Failed Verifications
MySQL Connection Issues
# Check MySQL container logs
docker logs mysql --tail 50
# Verify MySQL is listening on correct port
docker exec mysql netstat -tuln | grep 3306
# Test connection with different approach
docker exec mysql mysql -uroot -ppassword -e "SELECT 'MySQL is working';"
ClickHouse Access Issues
# Check ClickHouse logs
docker logs clickhouse --tail 50
# Test without authentication
curl http://localhost:8123/ping
# Check ClickHouse configuration
docker exec clickhouse cat /etc/clickhouse-server/users.xml
LogStash Pipeline Issues
# Check LogStash pipeline configuration
docker exec logstash cat /usr/share/logstash/pipeline/logstash.conf
# View LogStash pipeline stats
curl -s http://localhost:9600/_node/stats/pipelines | jq
# The JDBC driver class has no standalone entry point, so verify the driver
# jar is present instead; JDBC connectivity itself is exercised by the pipeline
docker exec logstash ls -l /usr/share/logstash/drivers/
Next Steps
Once all verifications pass:
LogStash Configuration - Customize data ingestion pipelines
DBT Getting Started - Build your first data transformations
Troubleshooting Guide - Reference for resolving issues
Verification Checklist
Core Services ✅
Data Pipeline ✅
Performance ✅
Access ✅
Your DataSuite ETL system is now verified and ready for development!