TL;DR: Black Friday on a 2M-product platform. Before adding a single server, EXPLAIN ANALYZE found three queries doing sequential scans — fixing the indexes dropped query times from 800ms to 12ms. PgBouncer multiplexed 8,000 connections down to 150 and gave us 4x throughput. Redis cached the right three things and nothing else. Here's the full picture.

Black Friday at 12,000 concurrent users on a platform you migrated from Magento six months ago is a different kind of stress test. The team that ran it before me had a sensible instinct when load spiked: add app servers. That instinct is almost always wrong, and I spent the three weeks before the event proving it.

Start with the database, not the app servers

Every scaling conversation jumps to horizontal app server scaling. The database is almost always the first bottleneck. Before touching anything else, run:

-- Find slow queries in production (pg_stat_statements must be enabled)