Final Year Project

Data Warehouse Design
& ETL Optimization

Building a high-performance Star Schema Data Warehouse in SQL Server with an automated Python ETL pipeline and interactive Power BI analytics.

Project Overview

A comprehensive approach to solving enterprise analytical challenges using modern data engineering practices.

The Problem

Traditional OLTP systems are not optimized for complex analysis. Direct reporting leads to performance bottlenecks, data inconsistency, and lack of historical context.

The Solution

Implemented a centralized Data Warehouse with a robust Star Schema and an optimized ETL framework for efficient data processing and visualization.

Optimization

Utilized incremental loading, composite indexing, and query pre-aggregation to achieve up to 70% faster reporting performance.

Project Deliverables

Access the core documentation including research papers, review presentations, and abstracts.

Base Paper

Secure On-Premises ETL Pipeline

View PDF
Review 0

Workflow & Methodology

View Workflow
Review 1

Project Progress PPT

Download PPTX
Abstract

Summary & Results

View Abstract

Architecture & Workflow

Detailed technical breakdown of the data journey from source to dashboard.

1

Extraction Phase

Gathering heterogeneous data from CSV files and SQL Server operational sources using specialized Python connectors.

2

Staging & Transformation

Data cleansing, deduplication, and normalization in a dedicated staging layer. Applying business logic and referential integrity checks.

3

Warehouse Loading

Loading dimensions first, followed by fact tables. Implementing incremental loads to minimize processing time for large datasets.

4

BI Visualization

Connecting Power BI to the optimized Star Schema to generate dynamic, real-time dashboards for KPI tracking.

Technology Stack

The tools and frameworks powering this project.

SQL Server
Python (Pandas)
Power BI
Git / GitHub
T-SQL
On-Premise DWH

Ready to see the code?

Complete source code, SQL implementation scripts, and the Python ETL pipeline are available on the official GitHub repository.

Repository Main Branch