Optimizing SQL Queries: Best Practices for Performance
Slow database queries are a primary bottleneck in application performance. As datasets grow, inefficient SQL that previously executed in milliseconds can suddenly take seconds or minutes. Optimizing SQL requires an understanding of how the database engine executes queries and accesses data on disk.
Here are core best practices for writing high-performance SQL.
1. Avoid SELECT *
Fetching all columns using SELECT * forces the database to read and transmit data you may not need. This wastes disk I/O, memory, and network bandwidth. Furthermore, it prevents the database engine from utilizing "Index-Only Scans." If you specify exactly the columns you need, and those columns are entirely contained within an index, the database can fulfill the query without ever touching the actual table data.
Bad: SELECT * FROM users WHERE status = 'active';
Good: SELECT id, email, created_at FROM users WHERE status = 'active';
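The difference is visible in the query plan. As a minimal sketch using Python's built-in sqlite3 module (table and index names are illustrative): when the selected columns all live in the index, SQLite reports a "covering index" search, which is its version of an index-only scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT,"
             " status TEXT, created_at TEXT, bio TEXT)")
# Index containing the filter column plus the columns we actually select
conn.execute("CREATE INDEX idx_users_status ON users (status, email, created_at)")

# SELECT * must still visit the table row for columns not in the index (bio)
plan_star = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE status = 'active'"
).fetchall()

# Selecting only indexed columns lets SQLite answer from the index alone
plan_cols = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id, email, created_at FROM users"
    " WHERE status = 'active'"
).fetchall()

print(plan_star[0][3])  # e.g. SEARCH users USING INDEX idx_users_status (status=?)
print(plan_cols[0][3])  # e.g. SEARCH users USING COVERING INDEX idx_users_status (status=?)
```

The same principle applies in PostgreSQL and MySQL, where EXPLAIN reports "Index Only Scan" and "Using index" respectively.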
2. Use the EXPLAIN Command
Before optimizing a query, you must understand how the database executes it. Prefixing your query with EXPLAIN (or EXPLAIN ANALYZE in PostgreSQL) provides the execution plan. It reveals whether the database is using indexes, performing full table scans, or executing costly operations like hash aggregates or nested loop joins.
If you see a "Seq Scan" (Sequential Scan) on a table with millions of rows, that is your primary target for indexing or query refactoring.
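The same inspection works in any engine, though the vocabulary differs. A minimal sketch with sqlite3 (SQLite says "SCAN" where PostgreSQL says "Seq Scan", and "SEARCH" for an index lookup); the table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY,"
             " customer_id INTEGER, total REAL)")

# No index on customer_id yet: the plan reports a full table scan
before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall()
print(before[0][3])  # e.g. SCAN orders

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# With the index in place, the same query becomes an index lookup
after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall()
print(after[0][3])   # e.g. SEARCH orders USING INDEX idx_orders_customer (customer_id=?)
```

Running EXPLAIN before and after adding an index, as above, is the quickest way to confirm the optimizer actually uses it.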
3. Beware of the N+1 Query Problem
This is a common issue introduced by ORMs (Object-Relational Mappers). It occurs when the application executes one query to fetch N parent records, and then executes N additional queries to fetch the related child records for each parent.
The N+1 approach:
SELECT * FROM authors; (Returns 100 authors)
SELECT * FROM books WHERE author_id = 1;
... (Repeated 100 times, once per author)
The optimized approach: Use a JOIN or a WHERE IN clause to fetch all data in a single round trip.
SELECT authors.name, books.title FROM authors JOIN books ON authors.id = books.author_id;
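The cost difference is easy to demonstrate by counting round trips. A minimal sketch with sqlite3 (data is illustrative): the N+1 pattern issues 101 queries for 100 authors, while the JOIN issues one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
""")
conn.executemany("INSERT INTO authors (id, name) VALUES (?, ?)",
                 [(i, f"Author {i}") for i in range(1, 101)])
conn.executemany("INSERT INTO books (author_id, title) VALUES (?, ?)",
                 [(i, f"Book by author {i}") for i in range(1, 101)])

# N+1: one query for the parents, then one per parent for the children
queries = 0
authors = conn.execute("SELECT id, name FROM authors").fetchall()
queries += 1
for author_id, _name in authors:
    conn.execute("SELECT title FROM books WHERE author_id = ?",
                 (author_id,)).fetchall()
    queries += 1
print(queries)    # 101

# Optimized: a single JOIN replaces all 101 round trips
rows = conn.execute(
    "SELECT authors.name, books.title FROM authors"
    " JOIN books ON authors.id = books.author_id"
).fetchall()
print(len(rows))  # 100
```

Most ORMs offer an eager-loading option (often called something like "prefetch" or "includes") that generates this JOIN or WHERE IN form for you.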
4. Optimize LIKE Queries
Using the LIKE operator with leading wildcards (%term) forces a full table scan because the database cannot utilize a standard B-Tree index to find matches that start with unknown characters.
Slow: SELECT * FROM products WHERE name LIKE '%phone';
Fast: SELECT * FROM products WHERE name LIKE 'phone%'; (Can use an index)
If you need robust full-text search capabilities, do not rely on LIKE. Use native full-text search features (like PostgreSQL's tsvector), or dedicated search engines like Elasticsearch.
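The plan difference between the two LIKE forms can be observed directly. A minimal sketch with sqlite3, assuming illustrative table names (note one SQLite-specific detail: its LIKE is case-insensitive by default, so a plain B-Tree index is only eligible for prefix patterns once case_sensitive_like is enabled):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Required for SQLite's LIKE prefix optimization on a default (BINARY) index
conn.execute("PRAGMA case_sensitive_like = ON")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_products_name ON products (name)")

# Leading wildcard: the index cannot help, so the plan is a full scan
slow = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM products WHERE name LIKE '%phone'"
).fetchall()

# Prefix pattern: SQLite rewrites it into a range search on the index
fast = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM products WHERE name LIKE 'phone%'"
).fetchall()

print(slow[0][3])  # e.g. SCAN products
print(fast[0][3])  # e.g. SEARCH products USING INDEX idx_products_name (name>? AND name<?)
```

The range rewrite in the second plan is exactly why only prefix patterns are index-friendly: 'phone%' becomes "name >= 'phone' AND name < 'phonf'", a contiguous slice of the B-Tree.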
5. Be Careful with Functions in WHERE Clauses
Applying a function to a column in a WHERE clause often prevents the database from using an index on that column. This is known as making the predicate "non-sargable" (sargable being short for "Search ARGument ABLE").
Non-sargable (Ignores index): SELECT * FROM events WHERE YEAR(created_at) = 2024;
Sargable (Uses index): SELECT * FROM events WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
If you frequently need to filter by a function's result, consider creating an expression index (or function-based index) if your RDBMS supports it.