# Service Verification

After completing your DataSuite ETL setup, use this guide to verify that all services are running correctly and can communicate with each other.

## Quick Verification Commands

Run these commands to quickly check service health:

```bash
# 1. Check MySQL
docker exec mysql mysql -uroot -ppassword -e "SHOW DATABASES;"

# 2. Check ClickHouse  
curl -u admin:clickhouse123 http://localhost:8123/ping

# 3. Check LogStash
curl http://localhost:9600/_node/stats

# 4. Check Airflow (if installed)
curl http://localhost:8080/health
```

**Expected Results:**

* MySQL: Shows list of databases including `adventureworks`
* ClickHouse: Returns `Ok.`
* LogStash: Returns JSON stats
* Airflow: Returns health status JSON

## Detailed Service Verification

### 1. MySQL Database Verification

**Check Database Status:**

```bash
# Verify MySQL is running and responsive
docker exec mysql mysqladmin ping -h localhost -u root -ppassword
```

**Expected Output:** `mysqld is alive`

**Verify AdventureWorks Data:**

```bash
# Check that sample data is loaded
docker exec mysql mysql -uroot -ppassword adventureworks -e "
SELECT 
    table_name,
    table_rows 
FROM information_schema.tables 
WHERE table_schema = 'adventureworks' 
ORDER BY table_name;"
```

**Expected Output:**

```sql
+------------------------+------------+
| table_name             | table_rows |
+------------------------+------------+
| Sales_Customer         |      19820 |
| Sales_SalesOrderDetail |     121317 |
| Sales_SalesOrderHeader |      31465 |
| Sales_SalesTerritory   |         10 |
| Production_Product     |        504 |
+------------------------+------------+
```

**Test Sample Query:**

```bash
# Run sample analytical query
docker exec mysql mysql -uroot -ppassword adventureworks -e "
SELECT 
    YEAR(OrderDate) as order_year,
    COUNT(*) as order_count,
    ROUND(SUM(TotalDue), 2) as total_revenue
FROM Sales.SalesOrderHeader 
GROUP BY YEAR(OrderDate) 
ORDER BY order_year;"
```

### 2. ClickHouse Data Warehouse Verification

**Check Service Health:**

```bash
# Test ClickHouse HTTP interface
curl -u admin:clickhouse123 http://localhost:8123/ping
```

**Expected Output:** `Ok.`

**Verify Database Creation:**

```bash
# List databases
curl -u admin:clickhouse123 http://localhost:8123/ -d "SHOW DATABASES"
```

**Expected Output:**

```sql
INFORMATION_SCHEMA
bronze_layer
default
gold_layer
information_schema
system
```

**Check Table Structure:**

```bash
# Describe bronze layer table
curl -u admin:clickhouse123 http://localhost:8123/ -d "DESCRIBE bronze_layer.sales_orders FORMAT PrettyCompact"
```

**Test Query Performance:**

```bash
# Run performance test query
curl -u admin:clickhouse123 http://localhost:8123/ -d "
SELECT 
    COUNT(*) as record_count,
    MIN(orderdate) as earliest_order,
    MAX(orderdate) as latest_order,
    SUM(totaldue) as total_revenue
FROM bronze_layer.sales_orders FORMAT PrettyCompact"
```

### 3. LogStash Data Pipeline Verification

**Check Pipeline Status:**

```bash
# Get LogStash node information
curl -s http://localhost:9600/_node/stats | jq '.pipelines'
```

**Monitor Pipeline Activity:**

```bash
# Watch LogStash logs for data ingestion
docker logs -f logstash --tail 50
```

**Expected Log Patterns:**

```bash
[INFO] Pipeline started {"pipeline.id"=>"main"}
[INFO] JDBC - Loaded mysql-connector-java
[INFO] JDBC - Connected to mysql successfully
[INFO] Pipeline execution completed
```

**Verify Data Ingestion:**

```bash
# Check if data is being ingested to ClickHouse
curl -u admin:clickhouse123 http://localhost:8123/ -d "
SELECT 
    COUNT(*) as ingested_records,
    MAX(_ingested_at) as last_ingestion_time
FROM bronze_layer.sales_orders FORMAT PrettyCompact"
```

**Test Incremental Loading:**

```bash
# Update a record in MySQL to test incremental sync
docker exec mysql mysql -uroot -ppassword adventureworks -e "
UPDATE Sales.SalesOrderHeader 
SET ModifiedDate = NOW() 
WHERE SalesOrderId = 43659 
LIMIT 1;"

# Wait 60 seconds, then check if LogStash picked up the change
sleep 60
curl -u admin:clickhouse123 http://localhost:8123/ -d "
SELECT COUNT(*) as recent_updates 
FROM bronze_layer.sales_orders 
WHERE _ingested_at > NOW() - INTERVAL 2 MINUTE FORMAT PrettyCompact"
```

### 4. Network Connectivity Verification

**Test Inter-Service Communication:**

```bash
# Test MySQL → ClickHouse connectivity
docker exec mysql nc -zv clickhouse 8123
# Expected: Connection succeeded

# Test LogStash → MySQL connectivity  
docker exec logstash nc -zv mysql 3306
# Expected: Connection succeeded

# Test LogStash → ClickHouse connectivity
docker exec logstash nc -zv clickhouse 8123
# Expected: Connection succeeded
```

**DNS Resolution Test:**

```bash
# Verify service discovery within Docker network
docker exec logstash nslookup mysql
docker exec logstash nslookup clickhouse
```

### 5. Data Quality Verification

**Row Count Reconciliation:**

```bash
# Compare row counts between MySQL and ClickHouse
echo "MySQL Sales Orders:"
docker exec mysql mysql -uroot -ppassword adventureworks -e "SELECT COUNT(*) FROM Sales.SalesOrderHeader;"

echo "ClickHouse Bronze Layer:"
curl -u admin:clickhouse123 http://localhost:8123/ -d "SELECT COUNT(*) FROM bronze_layer.sales_orders FORMAT TabSeparated"
```

**Data Integrity Checks:**

```bash
# Check for duplicate records
curl -u admin:clickhouse123 http://localhost:8123/ -d "
SELECT 
    salesorderid,
    COUNT(*) as duplicate_count
FROM bronze_layer.sales_orders 
GROUP BY salesorderid 
HAVING COUNT(*) > 1 
ORDER BY duplicate_count DESC 
LIMIT 10 FORMAT PrettyCompact"
```

**Sample Data Verification:**

```bash
# Compare sample records between source and destination
echo "MySQL Sample (first 3 records):"
docker exec mysql mysql -uroot -ppassword adventureworks -e "
SELECT SalesOrderId, CustomerId, OrderDate, TotalDue 
FROM Sales.SalesOrderHeader 
ORDER BY SalesOrderId 
LIMIT 3;"

echo "ClickHouse Sample (first 3 records):"
curl -u admin:clickhouse123 http://localhost:8123/ -d "
SELECT salesorderid, customerid, orderdate, totaldue 
FROM bronze_layer.sales_orders 
ORDER BY salesorderid 
LIMIT 3 FORMAT PrettyCompact"
```

## Performance Verification

### Query Response Time Testing

**ClickHouse Performance Test:**

```bash
# Test analytical query performance
time curl -u admin:clickhouse123 http://localhost:8123/ -d "
SELECT 
    toYear(orderdate) as year,
    COUNT(*) as orders,
    ROUND(SUM(totaldue), 2) as revenue,
    ROUND(AVG(totaldue), 2) as avg_order_value
FROM bronze_layer.sales_orders 
GROUP BY toYear(orderdate) 
ORDER BY year FORMAT PrettyCompact"
```

**Expected Response Time:** < 1 second for typical analytical queries

### Resource Usage Monitoring

**Container Resource Usage:**

```bash
# Monitor CPU and memory usage
docker stats --no-stream --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"
```

**Expected Resource Usage:**

* MySQL: 5-10% CPU, 200-500MB memory
* ClickHouse: 10-20% CPU, 1-2GB memory
* LogStash: 5-15% CPU, 512MB-1GB memory

## Web Interface Verification

### ClickHouse Play Interface

1. Navigate to <http://localhost:8123/play>
2. Login with username: `admin`, password: `clickhouse123`
3. Run test query:

```sql
SELECT 
    'ClickHouse is working!' as status,
    version() as version,
    now() as current_time
```

### Airflow Web Interface (if installed)

1. Navigate to <http://localhost:8080>
2. Login with username: `admin`, password: `admin`
3. Verify DAGs are visible and scheduler is running

## Automated Verification Script

Create `scripts/verify-installation.sh` for automated checks:

```bash
#!/bin/bash

echo "🔍 DataSuite ETL Installation Verification"
echo "=========================================="

ERRORS=0

# Function to check service
check_service() {
    local service=$1
    local check_command=$2
    local expected_pattern=$3
    
    echo -n "Checking $service... "
    if eval "$check_command" | grep -q "$expected_pattern"; then
        echo "✅ PASS"
    else
        echo "❌ FAIL"
        ((ERRORS++))
    fi
}

# MySQL checks
check_service "MySQL Connection" \
    "docker exec mysql mysqladmin ping -h localhost -u root -ppassword 2>/dev/null" \
    "mysqld is alive"

check_service "AdventureWorks Database" \
    "docker exec mysql mysql -uroot -ppassword -e 'USE adventureworks; SELECT 1;' 2>/dev/null" \
    "1"

# ClickHouse checks  
check_service "ClickHouse HTTP Interface" \
    "curl -s -u admin:clickhouse123 http://localhost:8123/ping" \
    "Ok"

check_service "Bronze Layer Database" \
    "curl -s -u admin:clickhouse123 http://localhost:8123/ -d 'EXISTS bronze_layer.sales_orders'" \
    "1"

# LogStash checks
check_service "LogStash API" \
    "curl -s http://localhost:9600/_node/stats" \
    "pipeline"

# Network connectivity checks
check_service "LogStash → MySQL" \
    "docker exec logstash nc -zv mysql 3306 2>&1" \
    "succeeded"

check_service "LogStash → ClickHouse" \
    "docker exec logstash nc -zv clickhouse 8123 2>&1" \
    "succeeded"

# Data pipeline checks
echo -n "Checking data ingestion... "
MYSQL_COUNT=$(docker exec mysql mysql -uroot -ppassword adventureworks -e "SELECT COUNT(*) FROM Sales.SalesOrderHeader;" -s -N 2>/dev/null)
CH_COUNT=$(curl -s -u admin:clickhouse123 http://localhost:8123/ -d "SELECT COUNT(*) FROM bronze_layer.sales_orders FORMAT TabSeparated")

if [ "$MYSQL_COUNT" -gt 0 ] && [ "$CH_COUNT" -gt 0 ]; then
    echo "✅ PASS (MySQL: $MYSQL_COUNT, ClickHouse: $CH_COUNT)"
else
    echo "❌ FAIL (MySQL: $MYSQL_COUNT, ClickHouse: $CH_COUNT)"  
    ((ERRORS++))
fi

echo
echo "=========================================="
if [ $ERRORS -eq 0 ]; then
    echo "🎉 All verification checks passed!"
    echo "Your DataSuite ETL installation is working correctly."
else
    echo "⚠️ $ERRORS verification check(s) failed."
    echo "Please review the output above and check your installation."
fi
echo "=========================================="

exit $ERRORS
```

```bash
# Make script executable and run
chmod +x scripts/verify-installation.sh
./scripts/verify-installation.sh
```

## Troubleshooting Failed Verifications

### MySQL Connection Issues

```bash
# Check MySQL container logs
docker logs mysql --tail 50

# Verify MySQL is listening on correct port
docker exec mysql netstat -tuln | grep 3306

# Test connection with different approach
docker exec mysql mysql -uroot -ppassword -e "SELECT 'MySQL is working';"
```

### ClickHouse Access Issues

```bash
# Check ClickHouse logs
docker logs clickhouse --tail 50

# Test without authentication
curl http://localhost:8123/ping

# Check ClickHouse configuration
docker exec clickhouse cat /etc/clickhouse-server/users.xml
```

### LogStash Pipeline Issues

```bash
# Check LogStash pipeline configuration
docker exec logstash cat /usr/share/logstash/pipeline/logstash.conf

# View LogStash pipeline stats
curl -s http://localhost:9600/_node/stats/pipelines | jq

# Test JDBC connectivity manually
docker exec logstash java -cp /usr/share/logstash/drivers/mysql-connector-java.jar \
  com.mysql.cj.jdbc.Driver "jdbc:mysql://mysql:3306/adventureworks"
```

## Next Steps

Once all verifications pass:

1. [**LogStash Configuration**](/data-platform/index/development/logstash-configuration.md) - Customize data ingestion pipelines
2. [**DBT Getting Started**](/data-platform/index/development/dbt-getting-started.md) - Build your first data transformations
3. [**Troubleshooting Guide**](/data-platform/index/troubleshooting/common-issues.md) - Reference for resolving issues

## Verification Checklist

**Core Services** ✅

* [ ] MySQL database accessible and contains AdventureWorks data
* [ ] ClickHouse responsive with bronze\_layer and gold\_layer databases
* [ ] LogStash pipeline running and ingesting data
* [ ] Network connectivity between all services

**Data Pipeline** ✅

* [ ] Data flowing from MySQL to ClickHouse bronze layer
* [ ] Row counts match between source and destination
* [ ] Incremental updates working correctly
* [ ] No duplicate records in destination

**Performance** ✅

* [ ] Query response times under 1 second for simple queries
* [ ] Resource usage within expected ranges
* [ ] No memory leaks or excessive CPU usage

**Access** ✅

* [ ] Web interfaces accessible (ClickHouse Play, Airflow if installed)
* [ ] Database clients can connect successfully
* [ ] API endpoints responding correctly

Your DataSuite ETL system is now verified and ready for development!


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.datasuite.vn/data-platform/index/setup/verification.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
