My 42,000x Slower Postgres Query: A Production Postmortem
How a single Postgres query ran 42,000x slower than it needed to, and the steps taken to fix it. A deep dive into EXPLAIN ANALYZE, indexing, and query optimization.
Alex Miller
Senior Software Engineer specializing in backend systems and database performance optimization.
The Slowdown That Ground Everything to a Halt
It started, as most disasters do, with a simple feature request: a new user activity feed. The goal was to show a user's 20 most recent posts and the number of likes for each. It worked perfectly on my local machine. The queries were snappy, the page loaded instantly. I merged the pull request, deployed to production, and went to grab a coffee, feeling accomplished.
That feeling didn't last long. Within minutes, alerts started firing. PagerDuty was screaming. Our application monitoring showed response times skyrocketing, and the user activity page was timing out for everyone. A quick check of our database monitoring dashboard revealed the horror: a single query was consuming massive amounts of CPU and taking over 80 seconds to execute. My simple feature had brought our production database to its knees.
The Culprit: A Deceptively Simple Query
The problem stemmed from what I thought was a straightforward SQL query. Our database schema was nothing unusual. We had a `users` table, a `posts` table with a `user_id` foreign key, and a `likes` table with a `post_id` foreign key.
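For reference, here is a minimal sketch of that schema. Only the tables, the foreign keys, `content`, and `created_at` are taken from the description above; the remaining column definitions are illustrative assumptions, not our actual DDL.

-- Minimal sketch of the schema described above (column details beyond the
-- ones mentioned in the text are illustrative assumptions).
CREATE TABLE users (
    id         bigserial PRIMARY KEY,
    username   text NOT NULL
);

CREATE TABLE posts (
    id         bigserial PRIMARY KEY,
    user_id    bigint NOT NULL REFERENCES users (id),  -- filtered on in the feed query
    content    text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()      -- used to order the feed
);

CREATE TABLE likes (
    id         bigserial PRIMARY KEY,
    post_id    bigint NOT NULL REFERENCES posts (id),  -- joined on in the feed query
    user_id    bigint NOT NULL REFERENCES users (id)   -- who liked the post (assumption)
);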
My initial, naive query looked something like this:
SELECT
    p.id,
    p.content,
    p.created_at,
    COUNT(l.id) AS like_count
FROM posts p
LEFT JOIN likes l ON p.id = l.post_id
WHERE p.user_id = 12345
GROUP BY p.id, p.content, p.created_at
ORDER BY p.created_at DESC
LIMIT 20;
Logically, it seemed correct. Get the posts for a specific user, join the likes to count them, group everything, and then sort by the most recent, taking only the top 20. On my development database with a few hundred posts, it was instantaneous. In production, with millions of posts and tens of millions of likes, it was a catastrophe.
Peeking Under the Hood with EXPLAIN ANALYZE
When a Postgres query is slow, your first and best tool is `EXPLAIN ANALYZE`. It doesn't just tell you what the query planner thinks it will do; it actually executes the query and gives you the real-world plan and timing information.
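If you haven't used it before, you simply prefix the statement; the optional `BUFFERS` flag adds I/O detail. One caution: because `ANALYZE` really executes the query, wrap it in a transaction and roll back if you're profiling an INSERT, UPDATE, or DELETE. For the query above it looks like this:

EXPLAIN (ANALYZE, BUFFERS)
SELECT p.id, p.content, p.created_at, COUNT(l.id) AS like_count
FROM posts p
LEFT JOIN likes l ON p.id = l.post_id
WHERE p.user_id = 12345
GROUP BY p.id, p.content, p.created_at
ORDER BY p.created_at DESC
LIMIT 20;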
I ran `EXPLAIN ANALYZE` on the offending query, and the output was a nightmare. Here’s a simplified version of what I saw:
Limit  (cost=123456.78..123457.12) (actual time=84123.456..84123.457 rows=20)
  ->  Sort  (cost=123456.78..123458.90) (actual time=84123.455..84123.456 rows=20)
        Sort Key: p.created_at DESC
        ->  HashAggregate  (cost=112345.67..112346.78)
              Group Key: p.id, p.content, p.created_at
              ->  Nested Loop Left Join  (cost=0.42..101234.56)
                    ->  Seq Scan on posts p  (cost=0.00..54321.00) (rows=15000)
                          Filter: (user_id = 12345)
                    ->  Index Scan using likes_post_id_idx on likes l
Planning Time: 0.876 ms
Execution Time: 84125.123 ms
The two lines that made my blood run cold were:
- `Seq Scan on posts p`: Postgres was reading the entire `posts` table from disk (a Sequential Scan) and then filtering out the rows for the user I wanted. It wasn't using an index to find the user's posts directly.
- `Execution Time: 84125.123 ms`: Over 84 seconds. For a single page load.
The query planner was making a terrible decision. It chose to scan the entire posts table first, then for each of that user's posts, it would look up likes. But the initial scan was the killer. Why wasn't it using the index on `user_id`?
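Before going further, it's worth checking what indexes the table actually has. In psql, `\d posts` lists them; the plain-SQL equivalent is a query against the `pg_indexes` view:

SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'posts';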
The Two-Part Fix: Indexing and Query Logic
The problem was twofold. First, our indexing strategy was incomplete. Second, the query structure could be improved to give the planner better hints.
Part 1: The Missing Composite Index
We had an index on `posts(user_id)`. So why wasn't it helping? The problem was the `ORDER BY p.created_at DESC`. The planner expected thousands of matching posts for this user (the plan shows rows=15000 on the posts scan), so even with the single-column index it would have to fetch every one of those posts, sort them all by date, and only then take the top 20. With that much work required either way, it fell back to a sequential scan, and the sorting step was incredibly expensive without an appropriate index.
The solution was a composite index. This is an index that covers multiple columns. Since our query filters by `user_id` and then sorts by `created_at`, the perfect index would cover both columns, in that order.
I created the following index:
CREATE INDEX idx_posts_on_user_id_and_created_at_desc ON posts (user_id, created_at DESC);
This index does two things beautifully:
- `user_id`: It allows Postgres to instantly find the block of posts for the target user.
- `created_at DESC`: Crucially, the index entries within that block are already stored in the exact order the query needs. Postgres can just read the first 20 entries from the index and stop. No massive sorting operation required.
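A practical aside for anyone repeating this on a busy table: a plain `CREATE INDEX` blocks writes while the index builds. `CREATE INDEX CONCURRENTLY` avoids the write lock at the cost of a slower build (and it cannot run inside a transaction block), and running `ANALYZE` afterwards refreshes the planner's statistics. A sketch of how I would do it today:

-- Build the index without blocking writes (slower; cannot run inside a transaction block)
CREATE INDEX CONCURRENTLY idx_posts_on_user_id_and_created_at_desc
    ON posts (user_id, created_at DESC);

-- Refresh planner statistics so the new index is costed accurately
ANALYZE posts;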
Part 2: A Smarter Join Strategy
While the index was the main fix, I also refactored the query to be more explicit. By finding the 20 posts first and then joining the likes, we drastically reduce the amount of work the `GROUP BY` and `JOIN` have to do. Using a Common Table Expression (CTE) makes this clean and readable.
The new, improved query:
WITH recent_posts AS (
    SELECT id, content, created_at
    FROM posts
    WHERE user_id = 12345
    ORDER BY created_at DESC
    LIMIT 20
)
SELECT
    p.id,
    p.content,
    p.created_at,
    COUNT(l.id) AS like_count
FROM recent_posts p
LEFT JOIN likes l ON p.id = l.post_id
GROUP BY p.id, p.content, p.created_at
ORDER BY p.created_at DESC;
This version tells Postgres: "First, find me the 20 posts I care about using the new, super-efficient index. Then, and only then, go and get the like counts for just those 20 posts."
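For completeness, the same "top 20 first, likes second" idea can also be written without the CTE or the outer `GROUP BY`, using a lateral subquery to count likes per post. Whether it plans better than the CTE depends on your data and Postgres version, so treat it as an alternative to benchmark rather than a guaranteed improvement:

SELECT
    p.id,
    p.content,
    p.created_at,
    lc.like_count
FROM posts p
LEFT JOIN LATERAL (
    -- Count likes for this one post; the aggregate always returns exactly one row,
    -- with like_count = 0 when the post has no likes
    SELECT COUNT(*) AS like_count
    FROM likes l
    WHERE l.post_id = p.id
) lc ON true
WHERE p.user_id = 12345
ORDER BY p.created_at DESC
LIMIT 20;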
The Blazing Fast Result: From 84 Seconds to 2 Milliseconds
With the new index in place and the refactored query, I ran `EXPLAIN ANALYZE` again. The result was breathtaking.
Limit  (cost=25.80..25.85) (actual time=1.980..1.995 rows=20)
  ->  ...
      ->  HashAggregate  (cost=24.50..24.70) (actual time=1.950..1.975 rows=20)
            Group Key: p.id, p.content, p.created_at
            ->  Left Join
                  ->  Index Scan using idx_posts_on_user_id_and_created_at_desc on posts  (cost=0.42..8.44) (actual time=0.045..0.060 rows=20)
                        Index Cond: (user_id = 12345)
                  ->  ...
Planning Time: 0.543 ms
Execution Time: 2.015 ms
Look at those numbers! The `Seq Scan` was replaced by an `Index Scan` on the new composite index, which jumps straight to the 20 rows we need instead of reading the whole table. And the execution time? 2 milliseconds.
Let's do the math: 84,125 ms / 2 ms = 42,062.5. The query was now over 42,000 times faster. The production alerts fell silent, and the activity feed loaded instantly.
| Metric | Before (Slow Query) | After (Fast Query) |
|---|---|---|
| Execution Time | ~84,000 ms | ~2 ms |
| Improvement | - | ~42,000x faster |
| Primary Scan on `posts` | `Seq Scan` (full table read) | `Index Scan` on the composite index |
| Index Used on `posts` | None (only `likes_post_id_idx` was probed during the join) | `idx_posts_on_user_id_and_created_at_desc` |
| Sorting Strategy | Expensive sort after fetching all matching rows | No sort needed (index is already in the right order) |
| Production Impact | Application timeouts, high CPU | Negligible load, instant page loads |
Key Lessons from My 42,000x Performance Win
This painful experience taught me several invaluable lessons that I now apply religiously:
- `EXPLAIN ANALYZE` is your best friend. Don't guess, know. It's the single most important tool for understanding and debugging Postgres query performance.
- Your local dataset lies. Performance on a small scale means nothing. Always test queries against a production-sized (or staging) database.
- Index for your access patterns. Don't just index foreign keys. Index for your common `WHERE` clauses, `JOIN` conditions, and especially your `ORDER BY` clauses.
- Embrace composite indexes. A multi-column index can be exponentially more powerful than several single-column indexes if your queries filter and sort on those columns together.
- The order of columns in a composite index matters. Place columns with equality checks (like `user_id = ?`) before columns with range or sorting checks (like `created_at`). A small sketch follows this list.
- Reduce the working data set early. Use CTEs or subqueries to filter down to the small number of rows you need before performing expensive joins or aggregations.
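To make the column-order rule concrete, here is a small hypothetical example; the `orders` table and its columns are made up for illustration and are not part of the incident above:

-- Equality column first, then the ordering/range column
CREATE INDEX idx_orders_status_created_at ON orders (status, created_at DESC);

-- This index serves queries shaped like the following: Postgres jumps to the
-- 'pending' block, reads it in created_at order, and stops after 50 rows.
SELECT id, customer_id, created_at
FROM orders
WHERE status = 'pending'
ORDER BY created_at DESC
LIMIT 50;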
Conclusion: From Performance Nightmare to Optimization Pro
Bringing a production database to its knees is a humbling experience for any developer. But it's also one of the fastest ways to learn. That 42,000x slowdown wasn't just a bug; it was a masterclass in how databases actually work. By understanding tools like `EXPLAIN ANALYZE` and the power of proper indexing, you can move from writing queries that simply work to writing queries that perform at scale. Don't wait for the pager to go off; start analyzing your critical queries today.