# Prerequisites

Before setting up the DataSuite ETL system, ensure your development environment meets the following requirements.

## System Requirements

### Hardware Requirements

* **RAM**: 8GB minimum, 16GB recommended for full stack
* **Storage**: 20GB free disk space for containers and data
* **CPU**: 4+ cores recommended for optimal performance
* **Network**: Stable internet connection for downloading container images

### Operating System Support

* **Linux** (recommended): Ubuntu 20.04+, CentOS 8+, RHEL 8+
* **macOS**: macOS 10.15+ with Intel or Apple Silicon
* **Windows**: Windows 10/11 with WSL2 enabled

## Required Software

### Docker Platform

**Docker Engine**: Version 20.10 or later

```bash
# Check Docker version
docker --version

# Expected output: Docker version 20.10.x or later
```

**Docker Compose**: Version 2.0 or later

```bash
# Check Docker Compose version (Compose v2 plugin)
docker compose version

# If you installed the standalone binary instead:
# docker-compose --version

# Expected output: Docker Compose version v2.x.x or later
```

**Installation Resources**:

* [Docker Desktop for Mac/Windows](https://www.docker.com/products/docker-desktop)
* [Docker Engine for Linux](https://docs.docker.com/engine/install/)

### Git Version Control

**Git**: Version 2.30 or later

```bash
# Check Git version
git --version

# Expected output: git version 2.30.x or later
```
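
The version checks above can also be scripted. The following is a minimal sketch rather than part of DataSuite: `version_ge` and `check_tool` are hypothetical helper names, the minimum versions are the ones listed on this page, and it assumes Compose v2 is installed as the `docker compose` plugin.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: compare installed versions against the minimums above.

# version_ge HAVE WANT: succeeds when dotted version HAVE is >= WANT.
version_ge() {
  local -a have want
  local i n h w
  IFS=. read -r -a have <<< "$1"
  IFS=. read -r -a want <<< "$2"
  n=${#have[@]}
  if [ "${#want[@]}" -gt "$n" ]; then n=${#want[@]}; fi
  for ((i = 0; i < n; i++)); do
    # Missing fields count as 0; 10# forces base 10 so "03" is not read as octal.
    h=$((10#${have[i]:-0}))
    w=$((10#${want[i]:-0}))
    if ((h > w)); then return 0; fi
    if ((h < w)); then return 1; fi
  done
  return 0
}

# check_tool NAME MINIMUM OUTPUT: extracts the first dotted version from OUTPUT.
check_tool() {
  local name=$1 minimum=$2 output=$3 found
  found=$(printf '%s\n' "$output" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)
  if [ -n "$found" ] && version_ge "$found" "$minimum"; then
    echo "OK   $name $found (>= $minimum)"
  else
    echo "FAIL $name ${found:-not found} (need >= $minimum)"
    return 1
  fi
}

check_tool "Docker Engine"  20.10 "$(docker --version 2>/dev/null)" || true
check_tool "Docker Compose" 2.0   "$(docker compose version 2>/dev/null)" || true
check_tool "Git"            2.30  "$(git --version 2>/dev/null)" || true
```

A tool that is missing from `PATH` is reported as `FAIL ... not found`, so the script is safe to run before anything is installed.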

## Network Configuration

### Port Requirements

The following ports must be available on your system:

| Service           | Port | Protocol | Purpose              |
| ----------------- | ---- | -------- | -------------------- |
| MySQL             | 3306 | TCP      | Database connections |
| ClickHouse HTTP   | 8123 | TCP      | Query interface      |
| ClickHouse Native | 9000 | TCP      | Native protocol      |
| LogStash          | 5044 | TCP      | Data ingestion       |
| Airflow Web UI    | 8080 | TCP      | Workflow management  |

**Port Conflict Check**:

```bash
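# Linux: list TCP listeners on the required ports
sudo netstat -tlnp | grep -E ':(3306|8123|9000|5044|8080)[[:space:]]'

# macOS: netstat takes different flags there, so use lsof instead
sudo lsof -nP -iTCP -sTCP:LISTEN | grep -E ':(3306|8123|9000|5044|8080) '

# Any output means that port is already in use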
```
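
To see the status of each port from the table individually, a loop like the one below may help. It is a sketch under stated assumptions: `port_in_use` and `check_ports` are hypothetical names, it relies on Bash's built-in `/dev/tcp` redirection (available in most Bash builds), and it only detects listeners that accept connections on `127.0.0.1`.

```bash
#!/usr/bin/env bash
# Ports and services copied from the Port Requirements table.
REQUIRED_PORTS=(
  "3306:MySQL"
  "8123:ClickHouse HTTP"
  "9000:ClickHouse Native"
  "5044:LogStash"
  "8080:Airflow Web UI"
)

# port_in_use PORT: succeeds if a TCP connection to localhost:PORT is accepted.
# The subshell keeps the temporary file descriptor from leaking into this shell.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

check_ports() {
  local entry port service
  for entry in "${REQUIRED_PORTS[@]}"; do
    port=${entry%%:*}     # text before the first colon
    service=${entry#*:}   # text after the first colon
    if port_in_use "$port"; then
      echo "IN USE  $port ($service)"
    else
      echo "free    $port ($service)"
    fi
  done
}

check_ports
```

Unlike `netstat` or `lsof`, this does not need `sudo`, but it cannot tell you which process holds the port.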

### Firewall Configuration

* Allow inbound connections on required ports for local development
* Allow the Docker daemon outbound access to external registries such as Docker Hub
* Ensure Docker containers can communicate with each other

## Development Tools (Optional but Recommended)

### Code Editor

* **VS Code** with SQL and Docker extensions
* **IntelliJ IDEA** with database plugins
* **Vim/Emacs** with SQL syntax highlighting

### Database Clients

* **DBeaver** - Universal database client
* **MySQL Workbench** - MySQL-specific client
* **ClickHouse Client** - Native ClickHouse CLI

### Terminal/Shell

* **Bash** or **Zsh** with command completion
* **Windows PowerShell** or **WSL2** on Windows

## Validation Checklist

Run through this checklist to ensure your environment is ready:

### ✅ Docker Validation

```bash
# Test Docker installation
docker run hello-world

# Expected: "Hello from Docker!" message
```

### ✅ Docker Compose Validation

```bash
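# Test Docker Compose (Compose v2 plugin)
docker compose version

# If you installed the standalone binary instead:
# docker-compose --version

# Expected: Docker Compose version v2.x.x or later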
```

### ✅ Resource Validation

```bash
# Check available memory
free -h  # Linux
vm_stat | grep -E 'page size|Pages free'  # macOS (values are in pages, not bytes)

# Check available disk space
df -h
```
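
The commands above print raw numbers. A rough sketch that compares them with the 8GB RAM and 20GB disk minimums from the Hardware Requirements section could look like this. The function names are hypothetical, and because the operating system reserves some memory, a machine sold as 8 GB usually reports slightly less, so treat a `WARN` near the threshold as a prompt to look closer rather than a hard failure.

```bash
#!/usr/bin/env bash
# total_mem_kb: total physical memory in KiB (Linux via /proc, macOS via sysctl).
total_mem_kb() {
  if [ -r /proc/meminfo ]; then
    awk '/^MemTotal:/ {print $2}' /proc/meminfo
  else
    # macOS reports hw.memsize in bytes
    echo $(( $(sysctl -n hw.memsize) / 1024 ))
  fi
}

# free_disk_kb [PATH]: free space in KiB on the filesystem holding PATH.
free_disk_kb() {
  df -Pk "${1:-.}" | awk 'NR == 2 {print $4}'
}

# report LABEL ACTUAL_KIB MINIMUM_GIB: prints OK or WARN (rounds down to whole GiB).
report() {
  local label=$1 actual_kb=$2 min_gb=$3
  local actual_gb=$(( actual_kb / 1024 / 1024 ))
  if [ "$actual_gb" -ge "$min_gb" ]; then
    echo "OK   $label: ${actual_gb} GiB (minimum ${min_gb})"
  else
    echo "WARN $label: ${actual_gb} GiB (minimum ${min_gb})"
    return 1
  fi
}

report "RAM"       "$(total_mem_kb)"   8  || true
report "Free disk" "$(free_disk_kb .)" 20 || true
```

Pass a different path to `free_disk_kb` if Docker stores its data on another volume.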

### ✅ Network Validation

```bash
# Test internet connectivity for image downloads
docker pull hello-world

# List Docker networks (the default "bridge" network should appear)
docker network ls
```

## Common Issues and Solutions

### Issue: Docker Permission Denied

**Symptoms**: `permission denied while trying to connect to the Docker daemon socket`

**Solutions**:

```bash
# Linux: Add your user to the docker group
sudo usermod -aG docker "$USER"
# Apply the new group in the current shell (or log out and back in)
newgrp docker

# Or run with sudo (not recommended for development)
sudo docker --version
```
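
If the error persists after `usermod`, the usual cause is that the current session has not picked up the new group yet. The sketch below, with the hypothetical helper `list_has`, distinguishes the two cases on Linux by comparing the groups recorded for the user with the groups active in the current shell.

```bash
#!/usr/bin/env bash
# list_has NAME "space separated list": succeeds if NAME is one of the entries.
list_has() {
  printf '%s\n' "$2" | tr ' ' '\n' | grep -qxF -- "$1"
}

user=$(id -un)
configured=$(id -nG "$user" 2>/dev/null || true)  # membership in the group database
active=$(id -nG 2>/dev/null || true)              # groups of this shell session

if list_has docker "$active"; then
  echo "docker group is active in this shell"
elif list_has docker "$configured"; then
  echo "docker group is configured but not active: log out and back in, or run 'newgrp docker'"
else
  echo "user $user is not in the docker group: run 'sudo usermod -aG docker $user'"
fi
```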

### Issue: Port Already in Use

**Symptoms**: `port is already allocated` errors during startup

**Solutions**:

```bash
# Find process using port (Linux/macOS)
lsof -i :3306

# Stop the process if it is safe to do so
kill <PID>
# Only if it does not exit: force-kill
kill -9 <PID>

# Or modify docker-compose.yml to use different ports
```

### Issue: Insufficient Memory

**Symptoms**: Containers exit with OOMKilled status

**Solutions**:

* Increase Docker Desktop memory allocation (Mac/Windows)
* Close unnecessary applications
* Reduce number of concurrent containers during development
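
Before changing memory settings, it can help to confirm which containers were actually OOM-killed. This is a minimal sketch using `docker inspect` with the `.State.OOMKilled` field; `oom_status` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# oom_status CONTAINER: reports whether Docker recorded an out-of-memory kill.
oom_status() {
  local container=$1 killed
  killed=$(docker inspect --format '{{.State.OOMKilled}}' "$container" 2>/dev/null) || {
    echo "$container: not found"
    return 2
  }
  if [ "$killed" = "true" ]; then
    echo "$container: OOMKilled"
    return 1
  fi
  echo "$container: not OOM-killed"
}

# Check every container, including exited ones, when Docker is available.
if command -v docker >/dev/null 2>&1; then
  for name in $(docker ps -a --format '{{.Names}}' 2>/dev/null); do
    oom_status "$name" || true
  done
fi
```

For live usage of running containers, `docker stats --no-stream` prints a one-off snapshot of memory consumption per container.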

### Issue: WSL2 Configuration (Windows)

**Symptoms**: Docker commands not found or slow performance

**Solutions**:

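```powershell
# 1. Enable WSL2 integration in Docker Desktop (Settings > Resources > WSL integration)
# 2. Raise the WSL2 memory limit. .wslconfig lives in your Windows user profile
#    (%UserProfile%\.wslconfig), not in the Linux home directory.
#    Run in PowerShell; this overwrites any existing .wslconfig.
Set-Content -Path "$env:USERPROFILE\.wslconfig" -Value "[wsl2]", "memory=8GB"

# 3. Restart WSL so the new limit takes effect
wsl --shutdown
```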

## Performance Optimization

### Docker Configuration

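Suggested resource allocation for Docker Desktop (Mac/Windows): 4 CPUs, 8192 MiB of memory, and a 48 GiB disk image (51539607552 bytes). Treat the keys below as illustrative and apply the equivalent values under **Settings > Resources** rather than copying the block into a settings file.

```json
{
  "cpus": 4,
  "memory": 8192,
  "disk": {
    "size": 51539607552
  }
}
```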

### System Optimization

```bash
# Increase file watchers (Linux)
echo fs.inotify.max_user_watches=524288 | sudo tee -a /etc/sysctl.conf

# Raise the limit on memory-mapped areas per process (Linux)
echo vm.max_map_count=262144 | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```
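
To confirm that both kernel settings took effect, the values can be read back from `/proc/sys`. The helper below is a hypothetical sketch: `sysctl_at_least` is not an existing command, and the optional third argument exists only so the check can be pointed at a different directory for testing.

```bash
#!/usr/bin/env bash
# sysctl_at_least KEY MINIMUM [ROOT]: compares a kernel setting with a minimum.
sysctl_at_least() {
  local key=$1 minimum=$2 root=${3:-/proc/sys}
  local path="$root/${key//.//}"   # vm.max_map_count -> vm/max_map_count
  local current
  if [ ! -r "$path" ]; then
    echo "SKIP $key (cannot read $path; not Linux?)"
    return 2
  fi
  current=$(cat "$path")
  if [ "$current" -ge "$minimum" ]; then
    echo "OK   $key=$current"
  else
    echo "LOW  $key=$current (want >= $minimum)"
    return 1
  fi
}

sysctl_at_least fs.inotify.max_user_watches 524288 || true
sysctl_at_least vm.max_map_count 262144 || true
```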

## Next Steps

Once your prerequisites are met:

1. [**Quick Start Setup**](/data-platform/index/setup/docker-compose-setup.md) - Get running fast with Docker Compose
2. [**Advanced Setup**](/data-platform/index/setup/individual-containers.md) - Custom configuration with individual containers
3. [**Service Verification**](/data-platform/index/setup/verification.md) - Confirm everything is working correctly

## Getting Help

If you encounter issues during prerequisite setup:

* **Docker Issues**: Check [Docker documentation](https://docs.docker.com/get-started/overview/)
* **System Issues**: Consult your operating system documentation
* **Network Issues**: Check firewall and antivirus settings
* **Hardware Issues**: Verify system meets minimum requirements

The setup process becomes much smoother when prerequisites are properly configured, so take time to validate each requirement before proceeding.

