# Common Issues

This guide covers frequently encountered problems in the DataSuite ETL pipeline and their solutions.

## Service Connection Issues

### MySQL Connection Problems

**Issue**: `Access denied for user 'root'@'%'`

```bash
# Solution: Reset MySQL password
docker exec mysql mysql -uroot -e "ALTER USER 'root'@'%' IDENTIFIED BY 'password';"
docker restart mysql
```

**Issue**: `Can't connect to MySQL server`

```bash
# Check if MySQL is running
docker ps | grep mysql

# Check MySQL logs
docker logs mysql --tail 50

# Test connection
docker exec mysql mysqladmin ping -uroot -ppassword
```

### ClickHouse Access Issues

**Issue**: `Authentication failed`

```bash
# Test connection without auth
curl http://localhost:8123/ping

# Check user configuration
curl -u admin:clickhouse123 http://localhost:8123/ -d "SELECT user(), currentDatabase()"
```

**Issue**: `Connection refused`

```bash
# Check if ClickHouse is listening
docker exec clickhouse netstat -tuln | grep 8123

# Check ClickHouse logs
docker logs clickhouse --tail 50
```

## LogStash Pipeline Issues

### JDBC Connection Errors

**Issue**: `No suitable driver found`

```bash
# Verify JDBC driver is mounted
docker exec logstash ls -la /usr/share/logstash/drivers/

# Check LogStash can load the driver
docker exec logstash java -cp /usr/share/logstash/drivers/mysql-connector-java.jar com.mysql.cj.jdbc.Driver
```

**Issue**: `Pipeline not executing`

```bash
# Check pipeline configuration
curl -s http://localhost:9600/_node/stats/pipelines | jq

# View LogStash logs
docker logs logstash -f --tail 100
```

## Data Quality Issues

### Missing Data

**Issue**: Records not appearing in ClickHouse

```bash
# Check LogStash is reading from MySQL
docker logs logstash | grep "SELECT"

# Verify LogStash output
docker logs logstash | grep "INSERT INTO"

# Check ClickHouse for errors
curl -u admin:clickhouse123 http://localhost:8123/ -d "SELECT COUNT(*) FROM system.errors WHERE value > 0"
```

### Duplicate Records

**Issue**: Duplicate data in bronze layer

```sql
-- Find duplicates in ClickHouse
SELECT 
    salesorderid,
    COUNT(*) as duplicate_count
FROM bronze_layer.sales_orders 
GROUP BY salesorderid 
HAVING COUNT(*) > 1 
ORDER BY duplicate_count DESC;
```

**Solution**: Implement deduplication in LogStash

```ruby
filter {
  fingerprint {
    source => ["salesorderid", "modifieddate"]
    target => "[@metadata][fingerprint]"
  }
}

output {
  http {
    # Use fingerprint to prevent duplicates
    message => "INSERT INTO bronze_layer.sales_orders (*) VALUES (...) ON DUPLICATE KEY UPDATE ..."
  }
}
```

## Performance Issues

### Slow Query Performance

**Issue**: ClickHouse queries taking too long

```sql
-- Check query statistics
SELECT 
    query,
    query_duration_ms,
    memory_usage
FROM system.query_log 
WHERE query_duration_ms > 1000 
ORDER BY query_duration_ms DESC 
LIMIT 10;
```

**Solutions**:

* Add appropriate indexes
* Optimize table partitioning
* Use PREWHERE instead of WHERE
* Consider materialized views for frequent queries

### Memory Issues

**Issue**: Containers running out of memory

```bash
# Monitor memory usage
docker stats --no-stream

# Check container limits
docker inspect mysql | grep -i memory
```

**Solutions**:

* Increase Docker memory allocation
* Optimize query batch sizes
* Add memory limits to prevent OOM

## DBT Issues

### Model Compilation Errors

**Issue**: `Compilation Error in model`

```bash
# Debug DBT configuration
dbt debug

# Compile models to check for errors
dbt compile

# Run specific model
dbt run --models stg_sales_orders
```

### Test Failures

**Issue**: DBT tests failing

```bash
# Run tests with verbose output
dbt test --store-failures

# Check failed test results
SELECT * FROM analytics.dbt_test__audit__unique_fact_sale_sale_key;
```

## Network and Docker Issues

### Port Conflicts

**Issue**: `Port already in use`

```bash
# Find process using port
lsof -i :3306

# Kill process
kill -9 <PID>

# Or change port mapping in docker-compose.yml
```

### Container Communication

**Issue**: Services can't reach each other

```bash
# Check Docker network
docker network inspect datasuite-network

# Test connectivity between containers
docker exec logstash nc -zv mysql 3306
```

## Troubleshooting Workflow

1. **Identify the layer** where the issue occurs (source, ingestion, transformation, serving)
2. **Check service health** using health check commands
3. **Review logs** for error messages and patterns
4. **Test connectivity** between services
5. **Validate data** at each pipeline stage
6. **Monitor resources** (CPU, memory, disk, network)

## Getting Help

* Check service logs first: `docker logs <container-name>`
* Use health check endpoints: `/ping`, `/health`, `/_node/stats`
* Monitor system resources: `docker stats`
* Test individual components in isolation
* Review configuration files for typos or incorrect values

Most issues can be resolved by systematically checking each component and its connections to other services.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.datasuite.vn/data-platform/index/troubleshooting/common-issues.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
