Common Issues
This guide covers frequently encountered problems in the DataSuite ETL pipeline and their solutions.
Service Connection Issues
MySQL Connection Problems
Issue: Access denied for user 'root'@'%'
# Solution: Reset MySQL password
docker exec mysql mysql -uroot -e "ALTER USER 'root'@'%' IDENTIFIED BY 'password';"
docker restart mysql
Issue: Can't connect to MySQL server
# Check if MySQL is running
docker ps | grep mysql
# Check MySQL logs
docker logs mysql --tail 50
# Test connection
docker exec mysql mysqladmin ping -uroot -ppassword
ClickHouse Access Issues
Issue: Authentication failed
# Test connection without auth
curl http://localhost:8123/ping
# Check user configuration
curl -u admin:clickhouse123 http://localhost:8123/ -d "SELECT user(), currentDatabase()"
Issue: Connection refused
# Check if ClickHouse is listening
docker exec clickhouse netstat -tuln | grep 8123
# Check ClickHouse logs
docker logs clickhouse --tail 50
LogStash Pipeline Issues
JDBC Connection Errors
Issue: No suitable driver found
# Verify JDBC driver is mounted
docker exec logstash ls -la /usr/share/logstash/drivers/
# Check LogStash can load the driver
docker exec logstash java -cp /usr/share/logstash/drivers/mysql-connector-java.jar com.mysql.cj.jdbc.Driver
Issue: Pipeline not executing
# Check pipeline configuration
curl -s http://localhost:9600/_node/stats/pipelines | jq
# View LogStash logs
docker logs logstash -f --tail 100
Data Quality Issues
Missing Data
Issue: Records not appearing in ClickHouse
# Check LogStash is reading from MySQL
docker logs logstash | grep "SELECT"
# Verify LogStash output
docker logs logstash | grep "INSERT INTO"
# Check ClickHouse for errors
curl -u admin:clickhouse123 http://localhost:8123/ -d "SELECT COUNT(*) FROM system.errors WHERE value > 0"
Duplicate Records
Issue: Duplicate data in bronze layer
-- Find duplicates in ClickHouse
SELECT
salesorderid,
COUNT(*) as duplicate_count
FROM bronze_layer.sales_orders
GROUP BY salesorderid
HAVING COUNT(*) > 1
ORDER BY duplicate_count DESC;
Solution: Implement deduplication in LogStash
filter {
fingerprint {
source => ["salesorderid", "modifieddate"]
target => "[@metadata][fingerprint]"
}
}
output {
http {
# Use fingerprint to prevent duplicates
message => "INSERT INTO bronze_layer.sales_orders (*) VALUES (...) ON DUPLICATE KEY UPDATE ..."
}
}
Performance Issues
Slow Query Performance
Issue: ClickHouse queries taking too long
-- Check query statistics
SELECT
query,
query_duration_ms,
memory_usage
FROM system.query_log
WHERE query_duration_ms > 1000
ORDER BY query_duration_ms DESC
LIMIT 10;
Solutions:
Add appropriate indexes
Optimize table partitioning
Use PREWHERE instead of WHERE
Consider materialized views for frequent queries
Memory Issues
Issue: Containers running out of memory
# Monitor memory usage
docker stats --no-stream
# Check container limits
docker inspect mysql | grep -i memory
Solutions:
Increase Docker memory allocation
Optimize query batch sizes
Add memory limits to prevent OOM
DBT Issues
Model Compilation Errors
Issue: Compilation Error in model
# Debug DBT configuration
dbt debug
# Compile models to check for errors
dbt compile
# Run specific model
dbt run --models stg_sales_orders
Test Failures
Issue: DBT tests failing
# Run tests with verbose output
dbt test --store-failures
# Check failed test results
SELECT * FROM analytics.dbt_test__audit__unique_fact_sale_sale_key;
Network and Docker Issues
Port Conflicts
Issue: Port already in use
# Find process using port
lsof -i :3306
# Kill process
kill -9 <PID>
# Or change port mapping in docker-compose.yml
Container Communication
Issue: Services can't reach each other
# Check Docker network
docker network inspect datasuite-network
# Test connectivity between containers
docker exec logstash nc -zv mysql 3306
Troubleshooting Workflow
Identify the layer where the issue occurs (source, ingestion, transformation, serving)
Check service health using health check commands
Review logs for error messages and patterns
Test connectivity between services
Validate data at each pipeline stage
Monitor resources (CPU, memory, disk, network)
Getting Help
Check service logs first:
docker logs <container-name>
Use health check endpoints:
/ping
,/health
,/_node/stats
Monitor system resources:
docker stats
Test individual components in isolation
Review configuration files for typos or incorrect values
Most issues can be resolved by systematically checking each component and its connections to other services.
Last updated
Was this helpful?