Technical Explaination Made Simple: Performance Optimization 01: SQL Optimizaiton without materializing data

1️ Use Proper Indexing (Most Important)

Indexes allow the database to avoid full table scans.

Create indexes on:

JOIN keys
WHERE clause columns
GROUP BY columns
ORDER BY columns

Example:

CREATE INDEX idx_orders_customer_id ON orders(customer_id);

CREATE INDEX idx_customer_country ON customer(country);

Key idea:

Indexes let the database filter and join data efficiently without copying or storing anything.

2️.Rewrite Subqueries as JOINs or EXISTS

Avoid IN (subquery) and correlated subqueries when possible.

❌ Slow

SELECT * FROM orders

WHERE customer_id IN (SELECT id FROM customers WHERE country='US');

✅ Faster (JOIN)

SELECT o.*

FROM orders o

JOIN customers c ON o.customer_id = c.id

WHERE c.country='US';

✅ Also fast (EXISTS)

SELECT *

FROM orders o

WHERE EXISTS (

SELECT 1 FROM customers c

WHERE c.id = o.customer_id AND c.country='US'

);

3️.Choose the Right JOIN Type

Unnecessary join types can degrade performance.

Replace:

LEFT JOIN → INNER JOIN (if possible)
FULL OUTER JOIN → split logic into UNION ALL
CROSS JOIN → avoid unless needed

Fewer rows processed = faster queries.

4️.Push Filters Down Early (Predicate Pushdown)

Apply filters on the smallest dataset first.

❌ Slow

SELECT ...

FROM big_table b

JOIN small_table s ON ...

WHERE s.type = 'X';

✅ Fast

Move predicate to small table before join:

SELECT ...

FROM big_table b

JOIN (SELECT * FROM small_table WHERE type='X') s ON ...

This reduces join workload without materializing the data.

5️.Avoid Functions on Indexed Columns

This blocks index usage.

❌ Bad

WHERE DATE(created_at) = '2024-01-01'

✅ Good

WHERE created_at >= '2024-01-01'

AND created_at < '2024-01-02'

6️.Use Covering Indexes

A covering index contains all columns needed, so the DB doesn't fetch the table.

Example query:

SELECT amount, created_at

FROM orders

WHERE customer_id = 100;

Create covering index:

CREATE INDEX idx_orders_cover ON orders(customer_id, created_at, amount);

The DB can serve the entire query from the index only
→ faster, no temp storing.

7️⃣ **Avoid SELECT ***

Only select columns you need.

❌ Bad

SELECT *

FROM orders o

JOIN customers c ON ...

✅ Good

SELECT o.id, o.amount, c.name

FROM orders o

JOIN customers c ON ...

Less data scanned + less data transferred.

8️.Use LIMIT, WINDOWING, and Pagination

Avoid scanning large datasets.

Example Pagination:

SELECT * FROM orders

ORDER BY id

LIMIT 50 OFFSET 0;

Avoid OFFSET for large pages:

Use keyset pagination:

SELECT *

FROM orders

WHERE id > last_seen_id

ORDER BY id

LIMIT 50;

9️⃣ Normalize Query Logic (No Redundant Operations)

Avoid repeating the same subquery multiple times.

❌ Bad

SELECT (SELECT price FROM products WHERE id = o.product_id),

(SELECT category FROM products WHERE id = o.product_id)

FROM orders o;

✅ Good

SELECT p.price, p.category

FROM orders o

JOIN products p ON p.id = o.product_id;

10. Use Database-specific Optimizer Hints (When Needed)

These do not materialize data; they influence execution plan.

Examples:

MySQL: STRAIGHT_JOIN
Oracle: USE_NL, NO_MERGE
SQL Server: OPTION (HASH JOIN)
Postgres: SET enable_seqscan=off (temporary)

Only use when the optimizer chooses a poor plan.

1️1.Partitioning (Logical, Not Materializing)

Partitioning does not materialize data; it splits tables for faster scanning.

Use partitioning on:

date columns
high-cardinality keys

Improves:

scanning
filtering
aggregation

Without storing extra copies of data.

1️2.Use Window Functions Instead of Self-Joins

Window functions compute aggregates without extra joins.

❌ Slow

SELECT o.*,

(SELECT SUM(amount) FROM orders WHERE customer_id=o.customer_id)

FROM orders o;

✅ Fast (window)

SELECT o.*,

SUM(amount) OVER (PARTITION BY customer_id) AS customer_total

FROM orders o;

🧠 Summary: Optimization Without Materializing Data

Technique	Benefit
Indexes	Fast filtering and joining
Rewriting subqueries	Reduce scans + better execution plans
Join optimization	Process fewer rows
Predicate pushdown	Filter early
Covering indexes	Avoid table lookups
Avoid functions on indexed columns	Enable index usage
Keyset pagination	Avoid large offsets
Window functions	Avoid redundant joins
Partitioning	Faster scans on large datasets

Technical Explaination Made Simple

Labels

Sunday, 16 November 2025

Performance Optimization 01: SQL Optimizaiton without materializing data

No comments:

Post a Comment

Data Engineering - Client Interview question regarding data collection.

Search This Blog