# System Overview

The DataSuite ETL system is a modern data engineering platform that extracts data from operational systems, transforms it using dimensional modeling techniques, and loads it into a high-performance analytics warehouse.

## Architecture Overview

```plant-uml
@startuml
database DataSource
component LogStash
component Airflow {
    component EtlJob
}
database Clickhouse {
    component BronzeLayer
    component GoldLayer
}
DataSource <- LogStash: read and query data from data source
LogStash -> BronzeLayer: push data to clickhouse
EtlJob --> BronzeLayer: read raw data from clickhouse
EtlJob --> GoldLayer: persist data aftere ETL to clickhouse 


@enduml
```

## Data Flow Pattern

The system follows a medallion architecture pattern with three distinct layers:

1. **Source Layer** → Raw operational data from MySQL (AdventureWorks database)
2. **Bronze Layer** → Raw data ingested via LogStash into ClickHouse
3. **Gold Layer** → Transformed dimensional models created by DBT

## Key Design Principles

### 🔄 Event-Driven Processing

* LogStash monitors source systems for changes using timestamp-based incremental loading
* Changes are immediately propagated to the bronze layer
* DBT transformations can run on demand or scheduled basis

### 📊 Dimensional Modeling

* Star schema design optimized for analytical queries
* Fact tables store measurable events (sales transactions)
* Dimension tables provide context (customers, products, dates, territories)

### 🏗️ Layered Architecture

* **Bronze Layer**: Raw, unprocessed data for auditing and reprocessing
* **Gold Layer**: Business-ready dimensional models for analytics
* **Serving Layer**: Optimized views and aggregations for specific use cases

### ⚡ High Performance

* ClickHouse columnar storage for fast analytical queries
* Incremental processing to minimize compute resources
* Parallel processing capabilities across all components

## Technology Stack

| Component               | Technology            | Purpose                                              |
| ----------------------- | --------------------- | ---------------------------------------------------- |
| **Data Source**         | MySQL 8.0             | Operational database with AdventureWorks sample data |
| **Data Ingestion**      | LogStash 8.11         | Real-time data extraction and loading                |
| **Data Warehouse**      | ClickHouse            | High-performance columnar analytics database         |
| **Data Transformation** | DBT Core              | SQL-based transformation and modeling                |
| **Orchestration**       | Apache Airflow        | Workflow scheduling and monitoring                   |
| **Containerization**    | Docker/Docker Compose | Development environment and deployment               |

## System Benefits

### For Data Engineers

* **Rapid Development**: Pre-configured stack reduces setup time
* **Best Practices**: Implements proven patterns and architectures
* **Scalability**: Components can scale independently based on load
* **Maintainability**: Clear separation of concerns and standardized tooling

### For Data Analysts

* **Fast Queries**: ClickHouse delivers sub-second response times
* **Familiar Interface**: Standard SQL queries work across all layers
* **Data Quality**: Built-in testing ensures reliable, accurate data
* **Self-Service**: Well-documented dimensional models enable independent analysis

### for Business Users

* **Real-Time Insights**: Near real-time data availability for decision making
* **Consistent Metrics**: Single source of truth for business definitions
* **Historical Analysis**: Complete data lineage and change tracking
* **Reliable Reporting**: Automated quality checks prevent data issues

## Next Steps

Now that you understand the high-level architecture, explore the detailed component information:

* [**Component Details**](/data-platform/index/architecture/components.md) - Deep dive into each system component
* [**Data Flow**](/data-platform/index/architecture/data-flow.md) - Detailed data movement and transformation process

Once you're comfortable with the architecture, proceed to:

* [**Environment Setup**](/data-platform/index/setup/prerequisites.md) - Get your development environment running


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.datasuite.vn/data-platform/index/architecture/overview.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
