PostgreSQL Partitioning: Querying Billions of Rows Efficiently
When tables grow to billions of rows, even indexed queries become slow. Table partitioning can fix this, but only when it is implemented correctly.
When Partitioning Becomes Necessary
Beyond a certain table size, even B-tree indexes become inefficient: VACUUM takes hours, backups become unwieldy, and queries scan far more data than they need. Partitioning divides a large table into smaller, independently manageable parts.
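As a minimal sketch of what this looks like in PostgreSQL's declarative partitioning (table and column names here are hypothetical, not from the project described below):

```sql
-- Parent table holds no rows itself; it only routes to partitions.
CREATE TABLE orders (
    order_id    bigint      NOT NULL,
    created_at  timestamptz NOT NULL,
    total_cents bigint      NOT NULL
) PARTITION BY RANGE (created_at);

-- One partition per month; each partition is a regular table under the hood.
CREATE TABLE orders_2024_01 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Note: any PRIMARY KEY or UNIQUE constraint on a partitioned table
-- must include the partition key column(s).
```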
Partitioning Strategies
- Range Partitioning: Ideal for time-series data. Partition by month or quarter — queries that target a time range scan only the relevant partitions.
- List Partitioning: For categorical data such as region, tenant, or status. Each partition contains rows with specific values.
- Hash Partitioning: Even distribution across a fixed number of partitions — well suited for write-heavy workloads without a natural partition key.
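The list and hash strategies follow the same pattern; a sketch with hypothetical schemas:

```sql
-- List partitioning: each partition holds an explicit set of key values.
CREATE TABLE customers (
    customer_id bigint NOT NULL,
    region      text   NOT NULL
) PARTITION BY LIST (region);

CREATE TABLE customers_emea PARTITION OF customers
    FOR VALUES IN ('DE', 'FR', 'UK');

-- Hash partitioning: rows spread evenly across a fixed number of partitions.
CREATE TABLE events (
    event_id bigint NOT NULL,
    payload  jsonb
) PARTITION BY HASH (event_id);

CREATE TABLE events_p0 PARTITION OF events
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
-- ...and likewise for REMAINDER 1, 2, 3.
```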
Best Practices
- Partition Key in Query: Queries must include the partition key in the WHERE clause for partition pruning to take effect.
- Not Too Many Partitions: Hundreds of partitions slow down the query planner. 12-36 active partitions is a good guideline.
- Automatic Partition Creation: Use pg_partman for automatic creation and management of time-series partitions.
- Indexes per Partition: Each partition has its own indexes; the smaller per-partition indexes stay shallow, which keeps lookups and index maintenance fast.
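Whether pruning actually happens can be checked with EXPLAIN. Assuming a hypothetical orders table range-partitioned by created_at month, as above:

```sql
-- Filter on the partition key: the planner prunes to a single partition.
EXPLAIN (COSTS OFF)
SELECT count(*)
FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';
-- The plan should reference only the January partition.

-- Filter that omits the partition key: every partition is scanned.
EXPLAIN (COSTS OFF)
SELECT count(*)
FROM orders
WHERE total_cents > 10000;
```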
Results
In an e-commerce project, we split a 2.4-billion-row order table into monthly range partitions. Average query time dropped from 12 seconds to 80 milliseconds, a 150x improvement.