This project illustrates creating a multidemensional data store from which we can query to illustrate the concept of dimensions and metrics.
Cubing concepts (such as slicing, dicing, and drilldowns) are still widely used, although pre-computation of cubes may not be required anymore. Snowflake, Power Bi, Tableau and more can compute as needed using the most up-to-date information sources.
Ensure that the OLAP scripts you run are compatible with the schema of your data warehouse. This example assumes a specific schema (detailed below).
- Contains information about customers.
- Columns:
- customer_id: Unique identifier for each customer.
- name: Name of the customer.
- region: Customer's region (e.g., North, East, West, South).
- join_date: Date the customer joined.
Example Rows:
customer_id,name,region,join_date
1001,William White,East,2021-11-11
1002,Wylie Coyote,East,2023-02-14
1003,Dan Brown,West,2023-10-19
- Contains information about products sold.
- Columns:
- product_id: Unique identifier for each product.
- name: Name of the product.
- category: Product category (e.g., Electronics, Clothing).
- unit_price_usd: Price of a single unit (in USD).
Example Rows:
product_id,name,category,unit_price_usd
101,laptop,Electronics,793.12
102,hoodie,Clothing,39.10
103,cable,Electronics,22.76
- Contains transactional data for each sale.
- Columns:
- sale_id: Unique identifier for each transaction.
- customer_id: ID of the customer who made the purchase.
- product_id: ID of the product sold.
- store_id: ID of the store (additional data example)
- campaign_id: ID of the active marketing campaign (addl data)
- sale_date: Date of the sale.
- sale_amount_usd: Total revenue generated by the transaction (in USD).
Example Rows:
sale_id,customer_id,product_id,sale_date,sale_amount_usd
550,1001,101,2024-01-06,6344.96
551,1002,102,2024-01-06,312.80
552,1003,103,2024-01-16,431.00
This example outputs a multidimensional data set with the following column names (yours will differ).
DayOfWeek,product_id,customer_id,sale_amount_usd_sum,sale_amount_usd_mean,sale_id_count,sale_ids
Friday,101,1001,6344.96,6344.96,1,[582]
Friday,102,1009,312.8,312.8,1,[583]
Friday,104,1008,431.0,431.0,1,[593]
In a VS Code terminal:
- Create & activate .venv
- Install dependencies with requirements.txt
Mac/Linux
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install --upgrade -r requirements.txt
Windows
py -m venv .venv
.\.venv\Scripts\activate
py -m pip install --upgrade -r requirements.txt
This script:
- Connects to the DW.
- Aggregates data into an OLAP cube based on specified dimensions and metrics.
- Saves the multidimensional dataset (cube) to an intermediate CSV file.
This script:
- Loads the precomputed OLAP cube.
- Analyzes sales data to identify patterns, such as total sales by day of the week.
- Outputs actionable insights and visualizations.
In a VS Code terminal:
- Activate .venv
- Run the script
Mac/Linux
source .venv/bin/activate
python3 scripts/olap_cubing.py
python3 scripts/olap_goal_sales_by_day.py
Windows
.\.venv\Scripts\activate
py scripts\olap_cubing.py
py scripts\olap_goal_sales_by_day.py
Goal: Sales - Low Revenue - by Day of Week
Which day of the week has the least sales: Friday