PostgreSQL Partitioning: Querying Billions of Rows Efficiently
When tables grow to billions of rows, even indexed queries become slow. Table partitioning is the solution, but only when implemented correctly.
When Partitioning Becomes Necessary
Beyond a certain table size, even B-Tree indexes become inefficient. VACUUM takes hours, backups become problematic, and queries hit unnecessarily large amounts of data. Partitioning divides a large table into smaller, manageable parts.
Partitioning Strategies
- Range Partitioning: Ideal for time-series data. Partition by month or quarter, queries that target a time range scan only the relevant partitions.
- List Partitioning: For categorical data such as region, tenant, or status. Each partition contains rows with specific values.
- Hash Partitioning: Even distribution across a fixed number of partitions, well suited for write-heavy workloads without a natural partition key.
Best Practices
- Partition Key in Query: Queries must include the partition key in the WHERE clause for partition pruning to take effect.
- Not Too Many Partitions: Hundreds of partitions slow down the query planner. 12-36 active partitions is a good guideline.
- Automatic Partition Creation: Use pg_partman for automatic creation and management of time-series partitions.
- Indexes per Partition: Each partition has its own indexes, this keeps index maintenance fast.
Results
In an e-commerce project, we split a 2.4 billion row order table using range partitioning by month. The average query time dropped from 12 seconds to 80 milliseconds, a factor of 150x.
Questions About This Topic?
We are happy to advise you on the technologies and solutions described in this article.
Get in TouchSeit über 25 Jahren realisieren wir Engineering-Projekte für Mittelstand und Enterprise.