Dimensional modelling organises data warehouses into fact tables (measurable business events — sales, clicks, orders) and dimension tables (descriptive context — who, what, when, where). The Star Schema: one central fact table surrounded by denormalised dimension tables. The Snowflake Schema: normalises dimension tables into sub-dimensions. Fact Constellation: multiple fact tables sharing dimension tables. These schemas are the foundation of OLAP and business intelligence.
Real-life analogy: The sales receipt
Every receipt captures: WHO bought it (customer), WHAT they bought (product), WHEN (date), WHERE (store), HOW MUCH (revenue, quantity). The receipt is the fact. Customer, Product, Date, Store are dimensions. A fact table = millions of receipts; dimension tables = master data describing each entity on the receipt.
Star Schema — the standard dimensional model
Complete star schema for retail sales analytics
-- DIMENSION TABLES (descriptive context — wide, denormalised)
CREATE TABLE dim_date (
DateKey INT PRIMARY KEY, -- Surrogate key e.g. 20241225
FullDate DATE NOT NULL,
Day SMALLINT, Month SMALLINT,
MonthName VARCHAR(10), Quarter SMALLINT,
Year SMALLINT, DayOfWeek SMALLINT,
DayName VARCHAR(10), IsWeekend BOOLEAN,
IsHoliday BOOLEAN, FiscalYear SMALLINT
);
CREATE TABLE dim_product (
ProductKey INT PRIMARY KEY, -- Surrogate key (stable, compact)
ProductID VARCHAR(20), -- Business/natural key
ProductName VARCHAR(200) NOT NULL,
SubCategory VARCHAR(100),
Category VARCHAR(100),
Brand VARCHAR(100),
UnitCost DECIMAL(10,2),
IsActive BOOLEAN DEFAULT TRUE,
-- SCD Type 2: track historical changes
EffectiveFrom DATE, EffectiveTo DATE, IsCurrent BOOLEAN DEFAULT TRUE
);
CREATE TABLE dim_customer (
CustomerKey INT PRIMARY KEY,
CustomerID VARCHAR(20),
Name VARCHAR(100),
City VARCHAR(100), State VARCHAR(50),
Country VARCHAR(50), Region VARCHAR(50),
Segment VARCHAR(50) -- Premium, Standard, Economy
);
CREATE TABLE dim_store (
StoreKey INT PRIMARY KEY,
StoreID VARCHAR(20),
StoreName VARCHAR(100),
StoreType VARCHAR(50), City VARCHAR(100),
State VARCHAR(50), Region VARCHAR(50)
);
-- FACT TABLE: measurable events (FKs to all dimensions + numeric measures)
CREATE TABLE fact_sales (
SalesID BIGSERIAL PRIMARY KEY,
DateKey INT NOT NULL REFERENCES dim_date(DateKey),
ProductKey INT NOT NULL REFERENCES dim_product(ProductKey),
CustomerKey INT NOT NULL REFERENCES dim_customer(CustomerKey),
StoreKey INT NOT NULL REFERENCES dim_store(StoreKey),
-- MEASURES (the numbers you analyse)
UnitsSold INT NOT NULL,
Revenue DECIMAL(12,2) NOT NULL,
Cost DECIMAL(12,2) NOT NULL,
Profit DECIMAL(12,2) GENERATED ALWAYS AS (Revenue - Cost) STORED,
DiscountAmt DECIMAL(10,2) DEFAULT 0
);
-- Star schema query: revenue by category and quarter
SELECT
p.Category, d.Year, d.Quarter,
SUM(f.Revenue) AS Revenue,
SUM(f.Profit) AS Profit,
SUM(f.UnitsSold) AS Units
FROM fact_sales f
JOIN dim_product p ON f.ProductKey = p.ProductKey
JOIN dim_date d ON f.DateKey = d.DateKey
WHERE p.IsActive = TRUE
GROUP BY p.Category, d.Year, d.Quarter
ORDER BY d.Year, d.Quarter, Revenue DESC;Snowflake Schema vs Star Schema
| Property | Star Schema | Snowflake Schema |
|---|---|---|
| Dimension tables | Denormalised (flat, wide) | Normalised (split into sub-tables) |
| Storage | More (redundancy in dims) | Less (normalised) |
| Query performance | Faster (fewer JOINs) | Slightly slower (more JOINs) |
| ETL complexity | Simpler to load | More complex transformation |
| Recommended for | Most OLAP use cases | Very large dimensions where storage matters |
Snowflake schema — normalised product dimension
-- SNOWFLAKE: dim_product split into sub-dimensions
CREATE TABLE dim_category (CategoryKey INT PK, CategoryName VARCHAR(100), Department VARCHAR(100));
CREATE TABLE dim_subcategory (SubcatKey INT PK, SubcatName VARCHAR(100), CategoryKey INT REFERENCES dim_category);
CREATE TABLE dim_brand (BrandKey INT PK, BrandName VARCHAR(100), Country VARCHAR(50));
-- Product dimension references sub-tables (normalised)
CREATE TABLE dim_product_sf (
ProductKey INT PRIMARY KEY,
ProductName VARCHAR(200),
BrandKey INT REFERENCES dim_brand(BrandKey),
SubcatKey INT REFERENCES dim_subcategory(SubcatKey)
);
-- Snowflake query needs more JOINs
SELECT cat.CategoryName, b.BrandName, SUM(f.Revenue)
FROM fact_sales f
JOIN dim_product_sf p ON f.ProductKey = p.ProductKey
JOIN dim_subcategory s ON p.SubcatKey = s.SubcatKey
JOIN dim_category cat ON s.CategoryKey = cat.CategoryKey
JOIN dim_brand b ON p.BrandKey = b.BrandKey
GROUP BY cat.CategoryName, b.BrandName;SCD — Slowly Changing Dimensions
Dimension data changes over time (customer address, product price, employee department). SCD types: Type 1: Overwrite (no history). Type 2: Add new row with new surrogate key, set EffectiveTo on old row — preserves full history. Type 3: Add Previous_Value column — limited history. SCD Type 2 is most common — preserves dimension state at time each fact was recorded.
Practice questions
- Difference between fact table and dimension table? (Answer: Fact: measurable numeric events (revenue, quantity). Dimension: descriptive context (name, category, date attributes). Facts are long (millions of rows); dimensions are wide (many descriptive columns, fewer rows).)
- Why do dimension tables use surrogate keys instead of natural/business keys? (Answer: Natural keys can change (email changes). Surrogate keys are stable, compact, fast to JOIN, and support SCD Type 2 — add new row for same entity while preserving history.)
- What is a degenerate dimension? (Answer: A dimension attribute stored directly in the fact table with no corresponding dimension table. Examples: OrderNumber, InvoiceID, TransactionID. No descriptive attributes — just the key itself.)
- Star schema has 4 dimension tables connected to 1 fact table. This is different from snowflake schema because: (Answer: In star schema, dimensions are directly connected (denormalised, flat). In snowflake, dimensions reference sub-dimension tables (normalised hierarchy).)
- Why are fact tables long (many rows) and dimension tables wide (many columns)? (Answer: Facts record every business event (every sale, click) — millions per day. Dimensions describe entities (10,000 products, 1M customers) — fewer rows but many descriptive attributes for rich filtering.)
On LumiChats
LumiChats can design a complete star schema for any business domain, generate the full DDL for fact and dimension tables, and write analytical SQL queries. Describe your business process and LumiChats builds the data model.
Try it free