Data Warehousing

Data Warehousing — Architecture, OLTP vs OLAP

Storing and analysing historical data at scale for business intelligence.


Definition

A Data Warehouse (DW) is a central repository of integrated, subject-oriented, time-variant, non-volatile data collected from multiple operational systems. Unlike OLTP databases optimised for transactions, data warehouses are optimised for analytical queries (OLAP) over large historical datasets. Modern examples include Amazon Redshift, Google BigQuery, Snowflake, and Databricks. Data warehousing is the foundation of business intelligence, reporting, and machine learning pipelines.

Real-life analogy: The company memory bank

Your operational databases (CRM, ERP, e-commerce) are like the busy front office — constantly processing today's transactions. At day's end, key information is copied into a secure archive — the data warehouse. The warehouse never changes past records, combines data from all systems in one place, and is optimised for questions like 'what was our revenue by region over the past 5 years?' — impossible to answer efficiently from live transactional systems.

OLTP vs OLAP — the fundamental distinction

| Property | OLTP (Transactional) | OLAP (Analytical) |
| --- | --- | --- |
| Purpose | Day-to-day business operations | Analysis, reporting, decision-making |
| Operations | INSERT, UPDATE, DELETE (many writes) | SELECT with complex aggregations (read-only) |
| Data volume | Current data only (GB scale) | Historical data — years (TB to PB scale) |
| Query type | Simple, fast, predefined (ms) | Complex, long-running (seconds to minutes) |
| Tables | Normalised (3NF/BCNF) — avoid redundancy | Denormalised (star/snowflake) — optimise reads |
| Users | Many concurrent users (1000s) | Few analytical users (10s) |
| Examples | MySQL, PostgreSQL, Oracle | BigQuery, Redshift, Snowflake, Synapse |
| Update frequency | Real-time (every transaction) | Batch (nightly/weekly ETL loads) |
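The contrast can be sketched at toy scale in Python with SQLite: a single-row update by primary key stands in for the OLTP workload, and a full-scan aggregation stands in for the OLAP workload. The table and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders (region, amount) VALUES (?, ?)",
                 [("EU", 100.0), ("EU", 250.0), ("US", 400.0)])

# OLTP-style: touch one row, located by primary key (fast point operation)
conn.execute("UPDATE orders SET amount = 120.0 WHERE id = 1")

# OLAP-style: scan and aggregate every row (the warehouse workload)
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('EU', 370.0), ('US', 400.0)]
```

At three rows both queries are instant; the table above is about what happens when the scan covers billions of rows, which is where row-store OLTP engines fall over and columnar warehouses take over.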

Data Warehouse architecture — 6-layer pipeline

  1. Source systems: Operational databases (CRM, ERP, web analytics, IoT sensors, flat files). Data is raw, inconsistent, in different formats.
  2. ETL Layer (Extract, Transform, Load): Extract from sources, clean/transform (standardise formats, handle NULLs, deduplicate), load into warehouse. Modern variant: ELT — extract and load raw first, transform inside the warehouse using SQL.
  3. Staging Area: Temporary holding area where data is cleaned before loading into the warehouse. Not exposed to end users.
  4. Data Warehouse: Central repository organised in dimensional model (star/snowflake schema). Contains historical, integrated, subject-oriented data.
  5. Data Marts: Subject-specific subsets (Sales Mart, Finance Mart, HR Mart). Faster queries for specific departments.
  6. Presentation Layer: BI tools (Tableau, Power BI, Looker), SQL clients, ML pipelines, analytics notebooks.
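Steps 2 and 3 can be sketched in miniature. The snippet below assumes a hypothetical raw extract and shows the transform work (standardise formats, default NULLs, deduplicate) followed by a load into a staging table, using SQLite via Python's sqlite3:

```python
import sqlite3

# Hypothetical raw extract: inconsistent casing, a NULL, and a duplicate row
raw = [
    {"id": 1, "region": " eu ", "revenue": 100.0},
    {"id": 2, "region": "US",   "revenue": None},   # NULL to handle
    {"id": 1, "region": " eu ", "revenue": 100.0},  # duplicate to drop
]

# Transform: standardise region codes, default NULL revenue, dedupe on the key
seen, clean = set(), []
for row in raw:
    if row["id"] in seen:
        continue
    seen.add(row["id"])
    clean.append((row["id"], row["region"].strip().upper(), row["revenue"] or 0.0))

# Load into a staging table (not exposed to end users)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_sales (id INTEGER PRIMARY KEY, region TEXT, revenue REAL)")
conn.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)", clean)
print(conn.execute("SELECT * FROM stg_sales ORDER BY id").fetchall())
```

In a real pipeline this role is played by orchestration and transformation tools (or, in ELT, by SQL inside the warehouse), but the shape of the work is the same.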

Analytical SQL query — the kind only a data warehouse can handle efficiently

```sql
-- Revenue performance by region and product — multi-year analysis
-- This query joins 4 tables and aggregates millions of rows in seconds on Redshift/BigQuery
SELECT
    d_time.Year,
    d_time.Quarter,
    d_region.RegionName,
    d_product.ProductCategory,
    SUM(f.Revenue)                   AS TotalRevenue,
    SUM(f.UnitsSold)                 AS UnitsSold,
    ROUND(AVG(f.Revenue), 2)         AS AvgRevenue,
    SUM(f.Revenue)
      / SUM(SUM(f.Revenue)) OVER (PARTITION BY d_time.Year)
                                     AS YearlyRevenueShare,
    SUM(f.Revenue)
      - LAG(SUM(f.Revenue)) OVER (PARTITION BY d_region.RegionName,
                                               d_product.ProductCategory
                                  ORDER BY d_time.Year, d_time.Quarter)
                                     AS RevenueGrowthVsPrevQuarter
FROM fact_sales f
JOIN dim_time    d_time    ON f.DateKey    = d_time.DateKey
JOIN dim_region  d_region  ON f.RegionKey  = d_region.RegionKey
JOIN dim_product d_product ON f.ProductKey = d_product.ProductKey
WHERE d_time.Year BETWEEN 2020 AND 2024
GROUP BY d_time.Year, d_time.Quarter, d_region.RegionName, d_product.ProductCategory
HAVING SUM(f.Revenue) > 50000
ORDER BY d_time.Year, d_time.Quarter, TotalRevenue DESC;
```
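The same window-function pattern can be tried locally at toy scale. This is a sketch using SQLite via Python's sqlite3, with a hypothetical, simplified fact table (window functions require SQLite 3.25+); the grouped sums are computed in a subquery, then the share-of-year and quarter-over-quarter deltas are windowed over the grouped rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (year INT, quarter INT, region TEXT, revenue REAL);
INSERT INTO fact_sales VALUES
  (2023, 1, 'EU', 100), (2023, 2, 'EU', 300),
  (2023, 1, 'US', 200), (2023, 2, 'US', 400);
""")

# Aggregate first, then apply window functions over the grouped rows
rows = conn.execute("""
    SELECT year, quarter, region, total,
           total / SUM(total) OVER (PARTITION BY year)      AS yearly_share,
           total - LAG(total) OVER (PARTITION BY region
                                    ORDER BY year, quarter) AS growth
    FROM (SELECT year, quarter, region, SUM(revenue) AS total
          FROM fact_sales
          GROUP BY year, quarter, region) AS g
    ORDER BY region, quarter
""").fetchall()
for r in rows:
    print(r)  # e.g. (2023, 2, 'EU', 300.0, 0.3, 200.0)
```

The first row per region has `growth = NULL`, since LAG finds no previous quarter inside its partition; the big-warehouse version behaves the same way.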

Kimball vs Inmon architecture

Inmon (top-down): Build one enterprise-wide normalised DW first, then derive data marts from it. A single source of truth, but complex and expensive to build initially.

Kimball (bottom-up): Build dimensional data marts (star schemas) first, then integrate them via a data warehouse bus. Faster to deliver value.

Modern data platforms (Lakehouse) blend both: store raw data in a data lake, and model it with dimensional schemas on top.

Practice questions

  1. Why is an OLTP database not suitable for analytical queries? (Answer: OLTP schemas are normalised into many tables, so analytical questions need many JOINs, and the engine is optimised for single-row operations. Aggregating millions of rows across many tables forces slow full scans on a row-store OLTP system.)
  2. Four characteristics of a data warehouse per Bill Inmon: (Answer: Subject-oriented, Integrated, Time-variant, Non-volatile. Data is organised around business subjects, unified from multiple sources, preserves historical snapshots, and never deleted or updated once loaded.)
  3. Difference between data warehouse and data lake: (Answer: Warehouse: structured, processed, schema-on-write, SQL analytics. Lake: raw, unprocessed, schema-on-read, any format. Lakehouse combines both.)
  4. What is a data mart? (Answer: A subject-specific subset of the warehouse serving one department — Sales Mart, Finance Mart. Faster for specific queries; warehouse provides enterprise-wide single source of truth.)
  5. Purpose of the staging area in DW architecture: (Answer: Temporary ETL work area — data extracted here, cleaned (handle NULLs, fix formats, deduplicate), validated before loading into the actual warehouse. Protects warehouse from bad data.)

On LumiChats

LumiChats can design a data warehouse schema for any business, write ETL pipeline code in Python/SQL, and generate analytical SQL for BigQuery, Redshift, or Snowflake. Describe your data sources and reporting needs in plain English.
