Common Issues

This guide covers frequently encountered problems in the DataSuite ETL pipeline and their solutions.

Service Connection Issues

MySQL Connection Problems

Issue: Access denied for user 'root'@'%'

# Solution: reset the root password (only works if root can still log in from inside the container without a password)
docker exec mysql mysql -uroot -e "ALTER USER 'root'@'%' IDENTIFIED BY 'password';"
docker restart mysql

Issue: Can't connect to MySQL server

# Check if MySQL is running
docker ps | grep mysql

# Check MySQL logs
docker logs mysql --tail 50

# Test connection
docker exec mysql mysqladmin ping -uroot -ppassword
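Right after a container start, the checks above can fail simply because MySQL is still initializing. A minimal sketch of a retry loop around the same `mysqladmin ping` call follows; the function name `wait_for_mysql` and its default values are illustrative and not part of DataSuite.

```shell
# Poll MySQL until it answers a ping or the retry budget runs out.
# Usage: wait_for_mysql [retries] [delay_seconds]
# Container name and credentials are the ones used in the commands above.
wait_for_mysql() {
  retries="${1:-10}"
  delay="${2:-3}"
  attempt=1
  while [ "$attempt" -le "$retries" ]; do
    # mysqladmin ping exits 0 once the server is running
    if docker exec mysql mysqladmin ping -uroot -ppassword --silent >/dev/null 2>&1; then
      echo "MySQL is up (attempt ${attempt})"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "MySQL did not respond after ${retries} attempts" >&2
  return 1
}
```

For example, `wait_for_mysql 20 2` polls for up to roughly 40 seconds. `mysqladmin ping` reports success as soon as the server is running, even when the credentials are rejected, so an "Access denied" problem still has to be checked separately.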

ClickHouse Access Issues

Issue: Authentication failed

Issue: Connection refused
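The two ClickHouse issues above are easy to confuse, so it can help to test them separately. The sketch below uses the ClickHouse HTTP interface: `/ping` needs no credentials, while a `SELECT 1` sent with HTTP basic authentication only succeeds when the user and password are accepted. The function name, the `default` user, and the URL `http://localhost:8123` are assumptions about this setup.

```shell
# Distinguish "server unreachable" from "credentials rejected".
# Usage: check_clickhouse [user] [password] [base_url]
# Assumption: the ClickHouse HTTP port is published on localhost:8123.
check_clickhouse() {
  user="${1:-default}"
  password="${2:-}"
  base_url="${3:-http://localhost:8123}"
  # /ping requires no login, so a failure here is a network or startup problem
  if ! curl -fsS "${base_url}/ping" >/dev/null 2>&1; then
    echo "connection: FAILED (${base_url} not reachable)"
    return 1
  fi
  echo "connection: ok"
  # With -f, curl exits non-zero when the server answers with an HTTP error status
  if curl -fsS -u "${user}:${password}" --data-binary 'SELECT 1' "${base_url}/" >/dev/null 2>&1; then
    echo "authentication: ok"
  else
    echo "authentication: FAILED for user ${user}"
    return 2
  fi
}
```

A return code of 1 points to the "Connection refused" case (container down, port not published), and 2 most likely to the "Authentication failed" case.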

Logstash Pipeline Issues

JDBC Connection Errors

Issue: No suitable driver found
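This error usually means the JDBC input could not load the driver jar, for instance because `jdbc_driver_library` points at a path that does not exist inside the container. A quick check, sketched under assumptions: the container is named `logstash`, and the jar directory shown as the default is only a common location, so pass the directory your pipeline configuration actually uses.

```shell
# Report whether a MySQL JDBC driver jar is visible inside the Logstash container.
# Usage: find_jdbc_driver [jar_directory]
# Assumptions: container name "logstash"; the default directory is a guess.
find_jdbc_driver() {
  jar_dir="${1:-/usr/share/logstash/logstash-core/lib/jars}"
  jars="$(docker exec logstash ls "$jar_dir" 2>/dev/null | grep -i 'mysql-connector' || true)"
  if [ -n "$jars" ]; then
    echo "found: ${jars}"
  else
    echo "no mysql-connector jar in ${jar_dir}"
    return 1
  fi
}
```

If the jar is present, the next thing to compare is the `jdbc_driver_class` setting against the driver version in use.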

Issue: Pipeline not executing
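One way to tell whether a pipeline is running at all is to read its event counters from the Logstash monitoring API. The sketch below assumes the API is published on `localhost:9600` and that the first `"in"` counter in the `/_node/stats/pipelines` response belongs to the pipeline of interest; the function name is illustrative.

```shell
# Print the number of events the (first) pipeline has read so far.
# Usage: logstash_events_in [stats_url]
# Assumption: the Logstash monitoring API is reachable on localhost:9600.
logstash_events_in() {
  url="${1:-http://localhost:9600/_node/stats/pipelines}"
  stats="$(curl -fsS "$url" 2>/dev/null)" || {
    echo "monitoring API not reachable at ${url}" >&2
    return 1
  }
  # Take the first "in" counter; a value of 0 means no input events were read
  printf '%s' "$stats" | grep -o '"in" *: *[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}
```

Running it twice a minute or so apart shows whether the counter is moving. If it stays at 0, `docker logs logstash --tail 50` (container name assumed) is the next place to look for schedule or configuration errors.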

Data Quality Issues

Missing Data

Issue: Records not appearing in ClickHouse
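A simple first step is to compare row counts between the source table and its ClickHouse counterpart. This is a sketch only: the ClickHouse container name `clickhouse` and the table names in the usage example are hypothetical, while the MySQL credentials are the ones used earlier in this guide.

```shell
# Compare row counts between a MySQL table and a ClickHouse table.
# Usage: compare_row_counts <mysql_db.table> <clickhouse_db.table>
# Assumption: the ClickHouse container is named "clickhouse".
compare_row_counts() {
  mysql_table="$1"
  clickhouse_table="$2"
  # -N drops the column header, -B gives plain batch output
  src="$(docker exec mysql mysql -uroot -ppassword -N -B -e "SELECT COUNT(*) FROM ${mysql_table}")" || return 2
  dst="$(docker exec clickhouse clickhouse-client --query "SELECT count() FROM ${clickhouse_table}")" || return 2
  echo "source=${src} target=${dst} missing=$((src - dst))"
  # Succeed only when both sides hold the same number of rows
  [ "$src" -eq "$dst" ]
}
```

For example, `compare_row_counts shop.orders bronze.orders` (both names hypothetical). A positive `missing` value suggests the ingestion step is dropping or lagging behind; a negative one suggests duplicates.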

Duplicate Records

Issue: Duplicate data in bronze layer

Solution: Implement deduplication in Logstash
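Before changing the pipeline, it helps to measure how many keys are actually duplicated. The sketch below counts key values that occur more than once; the container name `clickhouse`, the function name, and the table and key names in the usage example are assumptions.

```shell
# Count key values that appear more than once in a ClickHouse table.
# Usage: count_duplicate_keys <db.table> <key_column>
# Assumption: the ClickHouse container is named "clickhouse".
count_duplicate_keys() {
  table="$1"
  key="$2"
  docker exec clickhouse clickhouse-client --query \
    "SELECT count() FROM (SELECT ${key} FROM ${table} GROUP BY ${key} HAVING count() > 1)"
}
```

For example, `count_duplicate_keys bronze.orders id` (hypothetical names) prints 0 when every `id` is unique. Running it before and after a pipeline change shows whether the deduplication is effective.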

Performance Issues

Slow Query Performance

Issue: ClickHouse queries taking too long

Solutions:

  • Add appropriate data-skipping indexes, or revisit the table's ORDER BY key so that common filters match it

  • Optimize table partitioning

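  • Use PREWHERE instead of WHERE for selective filters on small columns (ClickHouse often applies this automatically on MergeTree tables)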

  • Consider materialized views for frequent queries
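To decide which of these measures to apply, it is useful to know which queries are slow in the first place. A minimal sketch that lists the slowest recently finished queries from ClickHouse's `system.query_log` table follows; it assumes query logging is enabled and that the container is named `clickhouse`.

```shell
# List the slowest queries that finished within the last hour.
# Usage: slowest_queries [limit]
# Assumptions: container name "clickhouse"; system.query_log is enabled.
slowest_queries() {
  limit="${1:-5}"
  docker exec clickhouse clickhouse-client --query "
    SELECT query_duration_ms, read_rows, substring(query, 1, 80) AS query_start
    FROM system.query_log
    WHERE type = 'QueryFinish' AND event_time > now() - INTERVAL 1 HOUR
    ORDER BY query_duration_ms DESC
    LIMIT ${limit}"
}
```

A query with a very high `read_rows` relative to its result size is a typical candidate for a better ORDER BY key, a skipping index, or a materialized view.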

Memory Issues

Issue: Containers running out of memory

Solutions:

  • Increase Docker memory allocation

  • Optimize query batch sizes

  • Add per-container memory limits so that one service cannot exhaust host memory and trigger the OOM killer for the others
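To see which container is under memory pressure, the output of `docker stats` can be filtered by usage percentage. The function name and the 80% default threshold below are illustrative choices.

```shell
# Print containers whose memory usage exceeds a percentage of their limit.
# Usage: containers_over_memory [threshold_percent]
containers_over_memory() {
  threshold="${1:-80}"
  # --no-stream takes a single snapshot instead of refreshing continuously
  docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' |
    awk -v limit="$threshold" '{
      pct = $2
      sub(/%/, "", pct)            # strip the trailing percent sign
      if (pct + 0 > limit + 0) print $1, $2
    }'
}
```

A limit on a running container can then be set with, for example, `docker update --memory 2g --memory-swap 2g <container-name>`; the 2g value is only a placeholder and should be sized to the workload.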

dbt Issues

Model Compilation Errors

Issue: Compilation Error in model

Test Failures

Issue: dbt tests failing
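For both dbt issues, narrowing the problem to a single model usually shortens the search. The sketch below runs `dbt debug`, `dbt compile`, and `dbt test` in order for one model and stops at the first failing step; the function name and the model name in the usage example are hypothetical, and the commands are expected to run from the dbt project directory.

```shell
# Run connection check, compilation, and tests for a single dbt model.
# Usage: diagnose_dbt_model <model_name>
# Return codes: 1 = connection/profile, 2 = compilation, 3 = tests.
diagnose_dbt_model() {
  model="$1"
  dbt debug || { echo "connection or profile problem"; return 1; }
  dbt compile --select "$model" || { echo "compilation failed for ${model}"; return 2; }
  dbt test --select "$model" || { echo "tests failed for ${model}"; return 3; }
  echo "${model}: ok"
}
```

For example, `diagnose_dbt_model stg_orders` (hypothetical model). After a compilation error, the rendered SQL under the project's `target/` directory often shows which Jinja expression or `ref()` did not resolve.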

Network and Docker Issues

Port Conflicts

Issue: Port already in use
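The port may be held by another container or by a process on the host, so both are worth checking. A minimal sketch, assuming `lsof` is installed on the host; the function name is illustrative.

```shell
# Show which containers and host processes occupy a TCP port.
# Usage: who_uses_port <port>
who_uses_port() {
  port="$1"
  echo "containers publishing ${port}:"
  docker ps --filter "publish=${port}" --format '{{.Names}}'
  echo "host processes listening on ${port}:"
  # -nP skips name lookups; -sTCP:LISTEN restricts output to listening sockets
  lsof -nP -iTCP:"${port}" -sTCP:LISTEN 2>/dev/null || echo "(none found, or lsof unavailable)"
}
```

For example, `who_uses_port 3306` shows whether a local MySQL installation is competing with the container. The usual fixes are to stop the conflicting process or to map the container to a different host port.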

Container Communication

Issue: Services can't reach each other
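Containers can resolve each other by name only when they share a user-defined Docker network; on the default bridge network, name resolution is not available. The sketch below lists the networks two containers have in common. The function name is illustrative, and container names other than `mysql` are assumptions.

```shell
# Print the Docker networks that two containers share.
# Usage: shared_networks <container_a> <container_b>
# Returns 0 if at least one network is shared, 1 if none, 2 on inspect errors.
shared_networks() {
  fmt='{{range $name, $cfg := .NetworkSettings.Networks}}{{$name}} {{end}}'
  nets_a="$(docker inspect -f "$fmt" "$1")" || return 2
  nets_b="$(docker inspect -f "$fmt" "$2")" || return 2
  found=1
  for net in $nets_a; do
    # Pad with spaces so only whole network names match
    case " $nets_b " in
      *" $net "*) echo "$net"; found=0 ;;
    esac
  done
  return "$found"
}
```

For example, `shared_networks logstash mysql`. If nothing is printed, attaching one container with `docker network connect <network> <container-name>` puts both on the same network.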

Troubleshooting Workflow

  1. Identify the layer where the issue occurs (source, ingestion, transformation, serving)

  2. Check service health using health check commands

  3. Review logs for error messages and patterns

  4. Test connectivity between services

  5. Validate data at each pipeline stage

  6. Monitor resources (CPU, memory, disk, network)
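Step 2 of this workflow can be sketched as a small loop that reports the state of each container. The function name is illustrative, and any container names other than `mysql` are assumptions about this deployment.

```shell
# Report the state of each named container; fail if any is not running.
# Usage: check_services <container> [<container> ...]
check_services() {
  failed=0
  for name in "$@"; do
    # docker inspect fails for unknown containers, which is reported as "missing"
    status="$(docker inspect -f '{{.State.Status}}' "$name" 2>/dev/null)" || status="missing"
    echo "${name}: ${status}"
    [ "$status" = "running" ] || failed=$((failed + 1))
  done
  [ "$failed" -eq 0 ]
}
```

For example, `check_services mysql clickhouse logstash`. Any container not reported as `running` is the natural starting point for step 3, reviewing its logs.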

Getting Help

  • Check service logs first: docker logs <container-name>

  • Use health check endpoints: /ping (ClickHouse HTTP interface), /health, /_node/stats (Logstash monitoring API)

  • Monitor system resources: docker stats

  • Test individual components in isolation

  • Review configuration files for typos or incorrect values

Most issues can be resolved by systematically checking each component and its connections to other services.
