
Temporary Tables, Bulk Operations & Data Import/Export

Working with large datasets efficiently — staging, loading, and exporting data.


Definition

Temporary tables exist only for the duration of a session or transaction, storing intermediate results for complex multi-step queries. Bulk operations — COPY, BULK INSERT, INSERT ... SELECT — load or extract millions of rows efficiently, bypassing row-by-row processing overhead. Understanding these techniques is essential for ETL pipelines, data migration, reporting systems, and any workflow involving large volumes of data.

Temporary tables

Temporary tables and table variables for staged processing

-- TEMPORARY TABLE: exists for the current session (auto-dropped on disconnect)
CREATE TEMPORARY TABLE temp_high_earners AS
SELECT EmpID, Name, Salary, DeptID
FROM Employee
WHERE Salary > 80000;

-- Use the temp table in subsequent queries
SELECT d.DeptName, COUNT(*) AS HighEarnerCount, AVG(t.Salary) AS AvgSalary
FROM temp_high_earners t
JOIN Department d ON t.DeptID = d.DeptID
GROUP BY d.DeptName;

-- Add indexes to temp tables for better performance
CREATE INDEX idx_temp_dept ON temp_high_earners(DeptID);

-- GLOBAL TEMPORARY TABLE (Oracle): the table definition is permanent and shared,
-- but each session sees only its own rows
CREATE GLOBAL TEMPORARY TABLE temp_session_data (
    UserID  INT,
    Data    CLOB
) ON COMMIT DELETE ROWS;   -- Oracle: rows are deleted at COMMIT (transaction-scoped)

-- SQL Server: # prefix = session temp, ## prefix = global temp
CREATE TABLE #temp_results (ID INT, Name VARCHAR(100));
SELECT * INTO #temp_from_select FROM Employee WHERE DeptID = 1;

-- CTE vs Temp Table — when to use each:
-- CTE: single-use, inline, no persistence needed, part of one query
-- Temp table: reused multiple times, need indexes, complex multi-step process

-- DROP TEMPORARY TABLE
DROP TABLE IF EXISTS temp_high_earners;

-- Multi-step ETL using temp tables
CREATE TEMP TABLE staging_orders AS SELECT * FROM raw_orders_import;
-- Step 1: Clean
UPDATE staging_orders SET Amount = ABS(Amount) WHERE Amount < 0;
DELETE FROM staging_orders WHERE CustomerID IS NULL OR Amount = 0;
-- Step 2: Enrich
ALTER TABLE staging_orders ADD COLUMN CustomerName VARCHAR(100);
UPDATE staging_orders s SET CustomerName = c.Name
FROM Customer c WHERE c.CustomerID = s.CustomerID;
-- Step 3: Load into final table (column order must match Orders,
-- including the CustomerName column added in step 2)
INSERT INTO Orders SELECT * FROM staging_orders WHERE Amount > 0;
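The session-scoped behaviour above can be sketched with Python's built-in sqlite3 module (a stand-in for a real database server; the throwaway file and table names are illustrative): a TEMP table created on one connection is invisible to a second connection to the same database.

```python
import os
import sqlite3
import tempfile

# Throwaway on-disk database so two connections share the same file
# (with :memory: each connection would get its own database anyway)
fd, path = tempfile.mkstemp(suffix=".db")
os.close(fd)

conn_a = sqlite3.connect(path)
conn_b = sqlite3.connect(path)

# TEMP table: visible only to the connection (session) that created it
conn_a.execute("CREATE TEMP TABLE temp_high_earners (EmpID INTEGER, Salary INTEGER)")
conn_a.execute("INSERT INTO temp_high_earners VALUES (1, 90000)")

print(conn_a.execute("SELECT COUNT(*) FROM temp_high_earners").fetchone()[0])  # 1

try:
    conn_b.execute("SELECT * FROM temp_high_earners")
except sqlite3.OperationalError as exc:
    print("other session:", exc)   # no such table: temp_high_earners

conn_a.close()
conn_b.close()
os.remove(path)
```

On disconnect the table is dropped automatically, mirroring the session-scoped semantics described above.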

Bulk INSERT and COPY

Efficient bulk data loading from files and queries

-- PostgreSQL COPY: fastest way to bulk load data
-- Load from CSV file (server-side file path)
COPY Employee (EmpID, Name, Email, Salary, DeptID)
FROM '/data/employees.csv'
WITH (FORMAT csv, HEADER true, DELIMITER ',', QUOTE '"', NULL '\N');

-- Load from stdin (for psql or application code)
COPY Employee FROM stdin WITH (FORMAT csv, HEADER true);

-- Client-side COPY with \copy (psql meta-command; reads the file on the client)
\copy Employee FROM 'local_employees.csv' CSV HEADER

-- Export to CSV
COPY (SELECT * FROM Employee WHERE Status = 'Active')
TO '/exports/active_employees.csv'
WITH (FORMAT csv, HEADER true, DELIMITER ',');

-- MySQL LOAD DATA (bulk load)
LOAD DATA INFILE '/data/employees.csv'
INTO TABLE Employee
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(EmpID, Name, @Email, Salary, DeptID)
SET Email = LOWER(@Email);   -- Transform during load

-- SQL Server BULK INSERT
BULK INSERT Employee
FROM 'C:\data\employees.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n',
      MAXERRORS = 100, BATCHSIZE = 1000);

-- INSERT ... SELECT (bulk copy between tables)
INSERT INTO Employee_Archive
SELECT * FROM Employee WHERE Status = 'Terminated' AND LeaveDate < '2020-01-01';

-- INSERT ... ON CONFLICT (PostgreSQL UPSERT — bulk safe)
INSERT INTO Employee (EmpID, Name, Salary, DeptID)
SELECT EmpID, Name, Salary, DeptID FROM staging_employees
ON CONFLICT (EmpID) DO UPDATE SET
    Name   = EXCLUDED.Name,
    Salary = EXCLUDED.Salary,
    DeptID = EXCLUDED.DeptID;

-- Performance tips for bulk operations:
-- 1. Drop non-unique indexes before the load, rebuild them after
-- 2. Disable triggers (and with them FK checks) temporarily — superuser only:
ALTER TABLE Employee DISABLE TRIGGER ALL;
COPY Employee FROM '/data/big_file.csv' CSV HEADER;
ALTER TABLE Employee ENABLE TRIGGER ALL;
ANALYZE Employee;   -- Update planner statistics after bulk load

-- 3. Use transactions for large inserts (commit in batches)
-- 4. Increase work_mem and maintenance_work_mem for index builds
-- 5. Consider an UNLOGGED table during load (no WAL → faster, but no crash recovery)
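The batch-commit tip can be sketched with Python's sqlite3 module (a stand-in for a production driver; the table and batch size are illustrative): executemany inserts each batch in a single transaction instead of paying a commit per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER)"
)

# Fake source data standing in for a CSV file or staging table
rows = [(i, f"emp_{i}", 50000 + i) for i in range(10_000)]

BATCH = 1_000
for start in range(0, len(rows), BATCH):
    # `with conn:` wraps each batch in one transaction:
    # commit on success, rollback on error
    with conn:
        conn.executemany(
            "INSERT INTO Employee VALUES (?, ?, ?)",
            rows[start:start + BATCH],
        )

print(conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0])  # 10000
```

Batching keeps transaction size (and therefore lock time and rollback cost) bounded while still avoiding per-row commit overhead.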

Exporting data — JSON, CSV, and reports

Generating JSON and formatted output from SQL

-- Generate JSON output directly from SQL (for REST APIs)
SELECT json_agg(row_to_json(emp))
FROM (
    SELECT EmpID AS id, Name AS name, Salary AS salary,
           d.DeptName AS department
    FROM Employee emp
    JOIN Department d ON emp.DeptID = d.DeptID
    WHERE emp.Status = 'Active'
    ORDER BY emp.Salary DESC
    LIMIT 100
) emp;

-- Nested JSON (employee with their projects)
SELECT json_build_object(
    'employee', json_build_object('id', e.EmpID, 'name', e.Name),
    'projects', (
        SELECT json_agg(json_build_object('id', p.ProjID, 'name', p.ProjName))
        FROM Project p
        JOIN WorksOn w ON p.ProjID = w.ProjID AND w.EmpID = e.EmpID
    )
) AS employee_with_projects
FROM Employee e
WHERE e.DeptID = 1;

-- Export query result to CSV using Python + psycopg2
-- import psycopg2, csv
-- conn = psycopg2.connect("dbname=mydb user=postgres")
-- cur  = conn.cursor()
-- cur.execute("SELECT * FROM Employee")
-- with open('employees.csv', 'w', newline='') as f:
--     writer = csv.writer(f)
--     writer.writerow([desc[0] for desc in cur.description])  # Header
--     writer.writerows(cur.fetchall())
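The same pattern as the commented psycopg2 snippet can also emit JSON from the application side using only the stdlib (sqlite3 stands in for PostgreSQL here; the table and data are made up): column names come from cursor.description, and each row becomes a dict.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpID INTEGER, Name TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "Asha", 95000), (2, "Ben", 72000)],
)

cur = conn.execute(
    "SELECT EmpID AS id, Name AS name, Salary AS salary "
    "FROM Employee ORDER BY Salary DESC"
)
cols = [desc[0] for desc in cur.description]   # column names from the cursor
payload = json.dumps([dict(zip(cols, row)) for row in cur.fetchall()])
print(payload)
# [{"id": 1, "name": "Asha", "salary": 95000}, {"id": 2, "name": "Ben", "salary": 72000}]
```

This mirrors the json_agg query above but does the serialisation in the client, which is handy when the database lacks JSON functions.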

Practice questions

  1. When would you use a temporary table instead of a CTE? (Answer: Use temp table when: (1) the result is used multiple times in subsequent queries (CTE re-executes each time). (2) You need to add indexes for performance. (3) The intermediate result is very large and materialising it saves repeated computation. (4) Multi-step ETL processing where each step builds on the previous.)
  2. Why is COPY faster than INSERT for bulk loading? (Answer: COPY is parsed and planned once and then streams rows in a compact format, avoiding per-statement parsing, planning, and client round-trip overhead; a single-row INSERT pays that full cost for every row. COPY can load millions of rows per second; per-row INSERT typically manages thousands.)
  3. What is an UPSERT and when would you use it? (Answer: UPSERT = INSERT + UPDATE on conflict. Inserts new rows; if a row with the same key already exists, updates it instead. Used in ETL (merge new data with existing), syncing external data sources, idempotent data loading where running the same load twice gives the same result.)
  4. You need to load 100 million rows into a table. What optimisations would you apply? (Answer: (1) Use COPY/BULK INSERT, not INSERT. (2) Disable FK checks and triggers temporarily. (3) Drop non-PK indexes before load, rebuild after. (4) Use an UNLOGGED table if crash recovery is not needed. (5) Load in sorted order matching the clustered index. (6) ANALYZE after load.)
  5. Difference between session-level and transaction-level temporary tables: (Answer: Session-level: table exists until the session ends (disconnect). Data persists across multiple transactions. Transaction-level (ON COMMIT DELETE ROWS in Oracle): data is automatically deleted at COMMIT but the table structure remains. Use transaction-level for per-transaction staging data.)
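The idempotent-load property from the UPSERT answer can be demonstrated with sqlite3 (needs SQLite ≥ 3.24 for ON CONFLICT ... DO UPDATE; the schema is illustrative): running the same load twice updates existing keys instead of duplicating them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER)"
)

def load(rows):
    # UPSERT: insert new keys, update existing ones from the incoming row
    with conn:
        conn.executemany(
            """INSERT INTO Employee (EmpID, Name, Salary) VALUES (?, ?, ?)
               ON CONFLICT (EmpID) DO UPDATE SET
                   Name = excluded.Name, Salary = excluded.Salary""",
            rows,
        )

load([(1, "Asha", 90000), (2, "Ben", 70000)])
load([(1, "Asha", 95000), (2, "Ben", 70000)])   # rerun: updates, no duplicates

print(conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0])               # 2
print(conn.execute("SELECT Salary FROM Employee WHERE EmpID = 1").fetchone()[0]) # 95000
```

Repeating the load leaves exactly one row per key, which is what makes UPSERT-based pipelines safe to re-run.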

On LumiChats

LumiChats can write complete ETL pipelines using temp tables and COPY commands, generate the Python code to export SQL results to CSV or JSON, and design bulk loading strategies for large datasets. Describe your data flow and LumiChats builds the pipeline.
