Обсуждение: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6
Hi Team,
We are currently experiencing performance issues related to partition scanning on a heavily used table in our PostgreSQL v17.6 database.
The table is partitioned monthly (e.g., transactions_jan25, transactions_feb25, …, transactions_sept25). We’ve observed that PostgreSQL scans these partitions in the order they were attached (January through September).
This behavior is leading to inefficient query performance when accessing recent data (e.g., August or September), as older partitions are being scanned first — adding unnecessary overhead. Since PostgreSQL does not provide a built-in setting to prioritize scanning partitions in descending order, we’re exploring ways to address this.
Could you please advise on:
How to optimize partition scanning so that recent partitions are scanned first, and
Any recommended best practices or workarounds to improve performance in such scenarios.
Looking forward to your guidance.
Thank you
Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6
От
Ashutosh Bapat
Дата:
On Mon, Sep 8, 2025 at 4:01 AM Vivek Gadge <vvkgadge56@gmail.com> wrote: > > Hi Team, > > We are currently experiencing performance issues related to partition scanning on a heavily used table in our PostgreSQLv17.6 database. > > The table is partitioned monthly (e.g., transactions_jan25, transactions_feb25, …, transactions_sept25). We’ve observedthat PostgreSQL scans these partitions in the order they were attached (January through September). > > This behavior is leading to inefficient query performance when accessing recent data (e.g., August or September), as olderpartitions are being scanned first — adding unnecessary overhead. Since PostgreSQL does not provide a built-in settingto prioritize scanning partitions in descending order, we’re exploring ways to address this. > > Could you please advise on: > > How to optimize partition scanning so that recent partitions are scanned first, and > > Any recommended best practices or workarounds to improve performance in such scenarios. > > Looking forward to your guidance. > > Thank you > Can you please describe how the query performance is affected because of the order in which partitions are scanned? -- Best Wishes, Ashutosh Bapat
Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6
От
Andrei Lepikhov
Дата:
On 8/9/2025 11:47, Ashutosh Bapat wrote: > On Mon, Sep 8, 2025 at 4:01 AM Vivek Gadge <vvkgadge56@gmail.com> wrote: >> Looking forward to your guidance. >> >> Thank you >> > > Can you please describe how the query performance is affected because > of the order in which partitions are scanned? I guess they mentioned that the Postgres optimiser doesn't care about the order of Append's subplans. It is a little sad in some cases. The most critical case is when we have a limitation on the number of tuples returned. In this case, the optimiser could consider the following strategies: 1. Prefer scanning local partitions to foreign ones. 2. Pick first partitions with less startup costs and 'high probability' to obtain all necessary tuples from a minimum set of partitions. Postgres arranges clauses inside a long expression according to evaluation cost (see order_qual_clauses). So, why not do similar stuff for subplans? Also, I wonder if it would make sense to shuffle partitions a little and let backends scan partitions one-by-one in different orders just to reduce any sort of contention in case the queries don't fit the partitioning expression. -- regards, Andrei Lepikhov
Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6
От
Adrien Nayrat
Дата:
On 9/8/25 12:05 PM, Andrei Lepikhov wrote: > On 8/9/2025 11:47, Ashutosh Bapat wrote: >> On Mon, Sep 8, 2025 at 4:01 AM Vivek Gadge <vvkgadge56@gmail.com> wrote: >>> Looking forward to your guidance. >>> >>> Thank you >>> >> >> Can you please describe how the query performance is affected because >> of the order in which partitions are scanned? > I guess they mentioned that the Postgres optimiser doesn't care about > the order of Append's subplans. It is a little sad in some cases. The > most critical case is when we have a limitation on the number of tuples > returned. In this case, the optimiser could consider the following > strategies: > 1. Prefer scanning local partitions to foreign ones. > 2. Pick first partitions with less startup costs and 'high probability' > to obtain all necessary tuples from a minimum set of partitions. > > Postgres arranges clauses inside a long expression according to > evaluation cost (see order_qual_clauses). So, why not do similar stuff > for subplans? > > Also, I wonder if it would make sense to shuffle partitions a little and > let backends scan partitions one-by-one in different orders just to > reduce any sort of contention in case the queries don't fit the > partitioning expression. > It reminds me of these threads : Make the optimiser aware of partitions ordering : https://www.postgresql.org/message-id/2401607.SfZhPQhbS4@ronan_laptop Allow ordered partition scans in more cases : https://www.postgresql.org/message-id/CAApHDvojKdBR3MR59JXmaCYbyHB6Q_5qPRU+dy93En8wm+XiDA@mail.gmail.com Ordered Partitioned Table Scans : https://www.postgresql.org/message-id/CAKJS1f-hAqhPLRk_RaSFTgYxd=Tz5hA7kQ2h4-DhJufQk8TGuw@mail.gmail.com Regards -- Adrien NAYRAT
Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6
От
Vivek Gadge
Дата:
For example, when a query runs on a partitioned table, PostgreSQL scans partitions in the order they were created or attached to the parent table. In our case (monthly partitions from January through September), this means that queries looking for recent data (e.g., September) may experience additional overhead. PostgreSQL evaluates the older partitions first, checking their constraints and in some cases probing their indexes, before reaching the later partitions that actually contain the needed data.
As a result, while the query results are correct, the execution time increases due to unnecessary work on irrelevant partitions. This performance impact is more noticeable when the target partition is at the end of the scan order and pruning cannot fully eliminate the earlier partitions.
Thanks.
On Mon, 8 Sept, 2025, 3:17 pm Ashutosh Bapat, <ashutosh.bapat.oss@gmail.com> wrote:
On Mon, Sep 8, 2025 at 4:01 AM Vivek Gadge <vvkgadge56@gmail.com> wrote:
>
> Hi Team,
>
> We are currently experiencing performance issues related to partition scanning on a heavily used table in our PostgreSQL v17.6 database.
>
> The table is partitioned monthly (e.g., transactions_jan25, transactions_feb25, …, transactions_sept25). We’ve observed that PostgreSQL scans these partitions in the order they were attached (January through September).
>
> This behavior is leading to inefficient query performance when accessing recent data (e.g., August or September), as older partitions are being scanned first — adding unnecessary overhead. Since PostgreSQL does not provide a built-in setting to prioritize scanning partitions in descending order, we’re exploring ways to address this.
>
> Could you please advise on:
>
> How to optimize partition scanning so that recent partitions are scanned first, and
>
> Any recommended best practices or workarounds to improve performance in such scenarios.
>
> Looking forward to your guidance.
>
> Thank you
>
Can you please describe how the query performance is affected because
of the order in which partitions are scanned?
--
Best Wishes,
Ashutosh Bapat
Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6
От
Andrei Lepikhov
Дата:
On 8/9/2025 13:39, Vivek Gadge wrote: > > For example, when a query runs on a partitioned table, PostgreSQL scans > partitions in the order they were created or attached to the parent > table. In our case (monthly partitions from January through September), > this means that queries looking for recent data (e.g., September) may > experience additional overhead. PostgreSQL evaluates the older > partitions first, checking their constraints and in some cases probing > their indexes, before reaching the later partitions that actually > contain the needed data. > > As a result, while the query results are correct, the execution time > increases due to unnecessary work on irrelevant partitions. This > performance impact is more noticeable when the target partition is at > the end of the scan order and pruning cannot fully eliminate the earlier > partitions. The case looks straightforward. But just to be sure that we are on the same page, could you provide a synthetic DB example and a query representing the exact problem you are going to resolve? -- regards, Andrei Lepikhov
Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6
От
Ashutosh Bapat
Дата:
On Mon, Sep 8, 2025 at 5:09 PM Vivek Gadge <vvkgadge56@gmail.com> wrote: > > > For example, when a query runs on a partitioned table, PostgreSQL scans partitions in the order they were created or attachedto the parent table. In our case (monthly partitions from January through September), this means that queries lookingfor recent data (e.g., September) may experience additional overhead. PostgreSQL evaluates the older partitions first,checking their constraints and in some cases probing their indexes, before reaching the later partitions that actuallycontain the needed data. > > As a result, while the query results are correct, the execution time increases due to unnecessary work on irrelevant partitions.This performance impact is more noticeable when the target partition is at the end of the scan order and pruningcannot fully eliminate the earlier partitions. > If you don't want data from certain partitions maybe you should add a clause that will help partition pruning. If you need data from all partitions, the order in which they are scanned doesn't matter, those will be scanned either way. If partitioning pruning isn't working for you, please report the exact query. Please provide example queries anyway. -- Best Wishes, Ashutosh Bapat
Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6
От
Andrei Lepikhov
Дата:
On 8/9/2025 12:32, Adrien Nayrat wrote: > On 9/8/25 12:05 PM, Andrei Lepikhov wrote: >> On 8/9/2025 11:47, Ashutosh Bapat wrote: >>> On Mon, Sep 8, 2025 at 4:01 AM Vivek Gadge <vvkgadge56@gmail.com> wrote: > It reminds me of these threads : Yes, partially. Actively using foreign tables and touching cheaper partitions first is crucial in queries with limits. Also, some cases may provide an implicit limit, like a merge join. -- regards, Andrei Lepikhov
Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6
От
Vivek Gadge
Дата:
Hi Ashutosh,
Thank you for your feedback regarding this matter.
To provide more context, here is the exact query I am running:
EXPLAIN ANALYZE VERBOSE
SELECT m.txn_date, d.bank_ref
FROM app.main m
JOIN app.detail d ON m.txn_id = d.main_txn_id
WHERE m.txn_id = 9999999999999999999;
Both app.main and app.detail are range partitioned by month (e.g., main_202502, detail_202502, etc.) on a date column,
Attached below is the EXPLAIN ANALYZE VERBOSE output for the query. It shows that PostgreSQL is scanning all partitions in ascending order, even though the matching record is present in the main_202509 partition.
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------
Nested Loop (cost=1.00..46.19 rows=156 width=346) (actual time=17.656..18.173 rows=1 loops=1)
Output: m.txn_date, d.bank_ref, m.processor_id, m.txn_code, m.pos_data, d.issuer_txn_id, m.trans_ref, d.processor_uid, m.sub_txn_type, d.processor_token
-> Append (cost=0.43..22.05 rows=13 width=238) (actual time=8.749..9.027 rows=1 loops=1)
-> Index Scan using detail_202502_main_txn_id_key on app.detail_202502 d_1 (cost=0.43..2.65 rows=1 width=74) (actual time=1.634..1.634 rows=0 loops=1)
Output: d_1.bank_ref, d_1.issuer_txn_id, d_1.processor_uid, d_1.processor_token, d_1.main_txn_id
Index Cond: (d_1.main_txn_id = '9999999999999999999'::bigint)
-> Index Scan using detail_202503_main_txn_id_key on app.detail_202503 d_2 (cost=0.56..2.78 rows=1 width=74) (actual time=1.158..1.158 rows=0 loops=1)
Output: d_2.bank_ref, d_2.issuer_txn_id, d_2.processor_uid, d_2.processor_token, d_2.main_txn_id
Index Cond: (d_2.main_txn_id = '9999999999999999999'::bigint)
-> Index Scan using detail_202504_main_txn_id_key on app.detail_202504 d_3 (cost=0.56..2.78 rows=1 width=74) (actual time=1.105..1.106 rows=0 loops=1)
Output: d_3.bank_ref, d_3.issuer_txn_id, d_3.processor_uid, d_3.processor_token, d_3.main_txn_id
Index Cond: (d_3.main_txn_id = '9999999999999999999'::bigint)
-> Index Scan using detail_202505_main_txn_id_key on app.detail_202505 d_4 (cost=0.56..2.78 rows=1 width=73) (actual time=1.097..1.097 rows=0 loops=1)
Output: d_4.bank_ref, d_4.issuer_txn_id, d_4.processor_uid, d_4.processor_token, d_4.main_txn_id
Index Cond: (d_4.main_txn_id = '9999999999999999999'::bigint)
-> Index Scan using detail_202506_main_txn_id_key on app.detail_202506 d_5 (cost=0.56..2.78 rows=1 width=74) (actual time=1.002..1.002 rows=0 loops=1)
Output: d_5.bank_ref, d_5.issuer_txn_id, d_5.processor_uid, d_5.processor_token, d_5.main_txn_id
Index Cond: (d_5.main_txn_id = '9999999999999999999'::bigint)
-> Index Scan using detail_202507_main_txn_id_key on app.detail_202507 d_6 (cost=0.56..2.78 rows=1 width=74) (actual time=0.913..0.913 rows=0 loops=1)
Output: d_6.bank_ref, d_6.issuer_txn_id, d_6.processor_uid, d_6.processor_token, d_6.main_txn_id
Index Cond: (d_6.main_txn_id = '9999999999999999999'::bigint)
-> Index Scan using detail_202508_main_txn_id_key on app.detail_202508 d_7 (cost=0.56..2.78 rows=1 width=73) (actual time=0.833..0.833 rows=0 loops=1)
Output: d_7.bank_ref, d_7.issuer_txn_id, d_7.processor_uid, d_7.processor_token, d_7.main_txn_id
Index Cond: (d_7.main_txn_id = '9999999999999999999'::bigint)
-> Index Scan using detail_202509_main_txn_id_key on app.detail_202509 d_8 (cost=0.43..2.65 rows=1 width=74) (actual time=1.001..1.004 rows=1 loops=1)
Output: d_8.bank_ref, d_8.issuer_txn_id, d_8.processor_uid, d_8.processor_token, d_8.main_txn_id
Index Cond: (d_8.main_txn_id = '9999999999999999999'::bigint)
-> Seq Scan on app.detail_202510 d_9 (cost=0.00..0.00 rows=1 width=500) (actual time=0.041..0.042 rows=0 loops=1)
Output: d_9.bank_ref, d_9.issuer_txn_id, d_9.processor_uid, d_9.processor_token, d_9.main_txn_id
Filter: (d_9.main_txn_id = '9999999999999999999'::bigint)
-> Seq Scan on app.detail_202511 d_10 (cost=0.00..0.00 rows=1 width=500) (actual time=0.065..0.066 rows=0 loops=1)
Output: d_10.bank_ref, d_10.issuer_txn_id, d_10.processor_uid, d_10.processor_token, d_10.main_txn_id
Filter: (d_10.main_txn_id = '9999999999999999999'::bigint)
-> Seq Scan on app.detail_202512 d_11 (cost=0.00..0.00 rows=1 width=500) (actual time=0.085..0.085 rows=0 loops=1)
Output: d_11.bank_ref, d_11.issuer_txn_id, d_11.processor_uid, d_11.processor_token, d_11.main_txn_id
Filter: (d_11.main_txn_id = '9999999999999999999'::bigint)
-> Seq Scan on app.detail_202601 d_12 (cost=0.00..0.00 rows=1 width=500) (actual time=0.039..0.039 rows=0 loops=1)
Output: d_12.bank_ref, d_12.issuer_txn_id, d_12.processor_uid, d_12.processor_token, d_12.main_txn_id
Filter: (d_12.main_txn_id = '9999999999999999999'::bigint)
-> Seq Scan on app.detail_default d_13 (cost=0.00..0.00 rows=1 width=500) (actual time=0.037..0.037 rows=0 loops=1)
Output: d_13.bank_ref, d_13.issuer_txn_id, d_13.processor_uid, d_13.processor_token, d_13.main_txn_id
Filter: (d_13.main_txn_id = '9999999999999999999'::bigint)
-> Materialize (cost=0.56..22.23 rows=12 width=125) (actual time=8.901..9.137 rows=1 loops=1)
Output: m.txn_date, m.processor_id, m.txn_code, m.pos_data, m.trans_ref, m.sub_txn_type, m.txn_id
-> Append (cost=0.56..22.17 rows=12 width=125) (actual time=8.892..9.127 rows=1 loops=1)
-> Index Scan using main_202502_pkey on app.main_202502 m_1 (cost=0.56..2.78 rows=1 width=36) (actual time=0.897..0.897 rows=0 loops=1)
Output: m_1.txn_date, m_1.processor_id, m_1.txn_code, m_1.pos_data, m_1.trans_ref, m_1.sub_txn_type, m_1.txn_id
Index Cond: (m_1.txn_id = '9999999999999999999'::bigint)
-> Index Scan using main_202503_pkey on app.main_202503 m_2 (cost=0.56..2.78 rows=1 width=37) (actual time=1.105..1.105 rows=0 loops=1)
Output: m_2.txn_date, m_2.processor_id, m_2.txn_code, m_2.pos_data, m_2.trans_ref, m_2.sub_txn_type, m_2.txn_id
Index Cond: (m_2.txn_id = '9999999999999999999'::bigint)
-> Index Scan using main_202504_pkey on app.main_202504 m_3 (cost=0.56..2.78 rows=1 width=37) (actual time=1.114..1.114 rows=0 loops=1)
Output: m_3.txn_date, m_3.processor_id, m_3.txn_code, m_3.pos_data, m_3.trans_ref, m_3.sub_txn_type, m_3.txn_id
Index Cond: (m_3.txn_id = '9999999999999999999'::bigint)
-> Index Scan using main_202505_pkey on app.main_202505 m_4 (cost=0.56..2.78 rows=1 width=37) (actual time=1.135..1.135 rows=0 loops=1)
Output: m_4.txn_date, m_4.processor_id, m_4.txn_code, m_4.pos_data, m_4.trans_ref, m_4.sub_txn_type, m_4.txn_id
Index Cond: (m_4.txn_id = '9999999999999999999'::bigint)
-> Index Scan using main_202506_pkey on app.main_202506 m_5 (cost=0.56..2.78 rows=1 width=38) (actual time=1.096..1.096 rows=0 loops=1)
Output: m_5.txn_date, m_5.processor_id, m_5.txn_code, m_5.pos_data, m_5.trans_ref, m_5.sub_txn_type, m_5.txn_id
Index Cond: (m_5.txn_id = '9999999999999999999'::bigint)
-> Index Scan using main_202507_pkey on app.main_202507 m_6 (cost=0.56..2.78 rows=1 width=38) (actual time=1.285..1.285 rows=0 loops=1)
Output: m_6.txn_date, m_6.processor_id, m_6.txn_code, m_6.pos_data, m_6.trans_ref, m_6.sub_txn_type, m_6.txn_id
Index Cond: (m_6.txn_id = '9999999999999999999'::bigint)
-> Index Scan using main_202508_pkey on app.main_202508 m_7 (cost=0.56..2.78 rows=1 width=38) (actual time=1.010..1.010 rows=0 loops=1)
Output: m_7.txn_date, m_7.processor_id, m_7.txn_code, m_7.pos_data, m_7.trans_ref, m_7.sub_txn_type, m_7.txn_id
Index Cond: (m_7.txn_id = '9999999999999999999'::bigint)
-> Index Scan using main_202509_pkey on app.main_202509 m_8 (cost=0.43..2.65 rows=1 width=37) (actual time=1.243..1.245 rows=1 loops=1)
Output: m_8.txn_date, m_8.processor_id, m_8.txn_code, m_8.pos_data, m_8.trans_ref, m_8.sub_txn_type, m_8.txn_id
Index Cond: (m_8.txn_id = '9999999999999999999'::bigint)
-> Seq Scan on app.main_202510 m_9 (cost=0.00..0.00 rows=1 width=300) (actual time=0.057..0.057 rows=0 loops=1)
Output: m_9.txn_date, m_9.processor_id, m_9.txn_code, m_9.pos_data, m_9.trans_ref, m_9.sub_txn_type, m_9.txn_id
Filter: (m_9.txn_id = '9999999999999999999'::bigint)
-> Seq Scan on app.main_202511 m_10 (cost=0.00..0.00 rows=1 width=300) (actual time=0.045..0.045 rows=0 loops=1)
Output: m_10.txn_date, m_10.processor_id, m_10.txn_code, m_10.pos_data, m_10.trans_ref, m_10.sub_txn_type, m_10.txn_id
Filter: (m_10.txn_id = '9999999999999999999'::bigint)
-> Seq Scan on app.main_202512 m_11 (cost=0.00..0.00 rows=1 width=300) (actual time=0.076..0.076 rows=0 loops=1)
Output: m_11.txn_date, m_11.processor_id, m_11.txn_code, m_11.pos_data, m_11.trans_ref, m_11.sub_txn_type, m_11.txn_id
Filter: (m_11.txn_id = '9999999999999999999'::bigint)
-> Seq Scan on app.main_default m_12 (cost=0.00..0.00 rows=1 width=300) (actual time=0.047..0.047 rows=0 loops=1)
Output: m_12.txn_date, m_12.processor_id, m_12.txn_code, m_12.pos_data, m_12.trans_ref, m_12.sub_txn_type, m_12.txn_id
Filter: (m_12.txn_id = '9999999999999999999'::bigint)
On Tue, 9 Sept, 2025, 8:42 am Ashutosh Bapat, <ashutosh.bapat.oss@gmail.com> wrote:
On Mon, Sep 8, 2025 at 5:09 PM Vivek Gadge <vvkgadge56@gmail.com> wrote:
>
>
> For example, when a query runs on a partitioned table, PostgreSQL scans partitions in the order they were created or attached to the parent table. In our case (monthly partitions from January through September), this means that queries looking for recent data (e.g., September) may experience additional overhead. PostgreSQL evaluates the older partitions first, checking their constraints and in some cases probing their indexes, before reaching the later partitions that actually contain the needed data.
>
> As a result, while the query results are correct, the execution time increases due to unnecessary work on irrelevant partitions. This performance impact is more noticeable when the target partition is at the end of the scan order and pruning cannot fully eliminate the earlier partitions.
>
If you don't want data from certain partitions maybe you should add a
clause that will help partition pruning. If you need data from all
partitions, the order in which they are scanned doesn't matter, those
will be scanned either way.
If partitioning pruning isn't working for you, please report the exact
query. Please provide example queries anyway.
--
Best Wishes,
Ashutosh Bapat
Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6
От
David Rowley
Дата:
On Wed, 10 Sept 2025 at 09:32, Vivek Gadge <vvkgadge56@gmail.com> wrote: > EXPLAIN ANALYZE VERBOSE > SELECT m.txn_date, d.bank_ref > FROM app.main m > JOIN app.detail d ON m.txn_id = d.main_txn_id > WHERE m.txn_id = 9999999999999999999; > > > Both app.main and app.detail are range partitioned by month (e.g., main_202502, detail_202502, etc.) on a date column, > > Attached below is the EXPLAIN ANALYZE VERBOSE output for the query. It shows that PostgreSQL is scanning all partitionsin ascending order, even though the matching record is present in the main_202509 partition. I think you might be under the false impression that the executor is able to stop searching for the next row to join once it finds the first matching row. If that were the case, then yes, it might be faster to somehow scan the partition where the matching row exists first as then you could forego scanning the remainder. PostgreSQL does have some ability to do some of this with the "Unique Join" functionality, but it does not apply to this case since there's no proof that only a single row can match (this would require some sort of Unique Index, which cannot exist in your case since we don't have global indexes that exist over all partitions). If you're able to patch PostgreSQL and recompile, try the attached patch away from your production environment to check if there are any gains to scanning the partitions in another order. I didn't bother to change all the append paths that were generated, but I think I've reversed the order of at least the one the planner is using in your case. It's probably true that there are some cases (when the row order does not matter) where scanning partitions that are more present in shared buffers first would be better as that means using buffers perhaps before you evict them to make way for the buffers of some other (normally unused) partition, but I don't believe buffer eviction comes into effect for your case since you're basically just scanning an index which doesn't contain any value for the search key. That requires very few buffers. A secondary thought here is that perhaps your partitioning strategy needs revision. What's the reason you opted to partition by date? Does that benefit some other queries better? Or is it just a case that it works well for some data retention policy? IMO, partitioning by range of the transaction ID would fit much better for this particular query. David
Вложения
Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6
От
David Rowley
Дата:
On Mon, 8 Sept 2025 at 22:05, Andrei Lepikhov <lepihov@gmail.com> wrote: > I guess they mentioned that the Postgres optimiser doesn't care about > the order of Append's subplans. It is a little sad in some cases. The > most critical case is when we have a limitation on the number of tuples > returned. In this case, the optimiser could consider the following > strategies: > 1. Prefer scanning local partitions to foreign ones. > 2. Pick first partitions with less startup costs and 'high probability' > to obtain all necessary tuples from a minimum set of partitions. > > Postgres arranges clauses inside a long expression according to > evaluation cost (see order_qual_clauses). So, why not do similar stuff > for subplans? This seems quite separate from what's being complained about here. It might be beneficial to reconsider whether we should do some sort of sorting on startup_subpaths inside add_paths_to_append_rel(). I imagine that it might make some sense to sort that list so the path with the cheapest startup cost is first, then put the remainder of the list in order of cheapest total cost per tuple. I suspect that would result in Foreign partitions being scanned last... ... However, it's not all that clear to me how often someone would have a LIMIT without an ORDER BY, as effectively there's nothing there to determine which rows your query returns, and there's no flexibility to change which subpaths are first in Append/MergeAppend paths created in generate_orderedappend_paths(). > Also, I wonder if it would make sense to shuffle partitions a little and > let backends scan partitions one-by-one in different orders just to > reduce any sort of contention in case the queries don't fit the > partitioning expression. I don't follow this part. Are you proposing we randomise subpath list order? What contention do you aim to fix? David
Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6
От
Ashutosh Bapat
Дата:
On Wed, Sep 10, 2025 at 4:27 AM David Rowley <dgrowleyml@gmail.com> wrote: > > On Mon, 8 Sept 2025 at 22:05, Andrei Lepikhov <lepihov@gmail.com> wrote: > > I guess they mentioned that the Postgres optimiser doesn't care about > > the order of Append's subplans. It is a little sad in some cases. The > > most critical case is when we have a limitation on the number of tuples > > returned. In this case, the optimiser could consider the following > > strategies: > > 1. Prefer scanning local partitions to foreign ones. > > 2. Pick first partitions with less startup costs and 'high probability' > > to obtain all necessary tuples from a minimum set of partitions. > > > > Postgres arranges clauses inside a long expression according to > > evaluation cost (see order_qual_clauses). So, why not do similar stuff > > for subplans? > > This seems quite separate from what's being complained about here. It > might be beneficial to reconsider whether we should do some sort of > sorting on startup_subpaths inside add_paths_to_append_rel(). I > imagine that it might make some sense to sort that list so the path > with the cheapest startup cost is first, then put the remainder of the > list in order of cheapest total cost per tuple. I suspect that would > result in Foreign partitions being scanned last... If there's LIMIT without ORDER BY, we could order the list of subpaths by the number of rows in descending order or cost per row in ascending order. That way there are more chances of scanning fewer partitions quicker. -- Best Wishes, Ashutosh Bapat
Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6
От
David Rowley
Дата:
On Wed, 10 Sept 2025 at 16:26, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Wed, Sep 10, 2025 at 4:27 AM David Rowley <dgrowleyml@gmail.com> wrote: > > This seems quite separate from what's being complained about here. It > > might be beneficial to reconsider whether we should do some sort of > > sorting on startup_subpaths inside add_paths_to_append_rel(). I > > imagine that it might make some sense to sort that list so the path > > with the cheapest startup cost is first, then put the remainder of the > > list in order of cheapest total cost per tuple. I suspect that would > > result in Foreign partitions being scanned last... > > If there's LIMIT without ORDER BY, we could order the list of subpaths > by the number of rows in descending order or cost per row in ascending > order. That way there are more chances of scanning fewer partitions > quicker. Wouldn't that amount to favouring scanning some large foreign partition over a smaller local partition? My interpretation of Andrei's "Prefer scanning local partitions to foreign ones" statement is that was what we shouldn't be doing! David
Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6
От
Andrei Lepikhov
Дата:
On 10/9/2025 00:57, David Rowley wrote: >> 1. Prefer scanning local partitions to foreign ones. >> 2. Pick first partitions with less startup costs and 'high probability' >> to obtain all necessary tuples from a minimum set of partitions. >> >> Postgres arranges clauses inside a long expression according to >> evaluation cost (see order_qual_clauses). So, why not do similar stuff >> for subplans? > > This seems quite separate from what's being complained about here. Maybe. I didn't see the reproduction script, which made it hard for me to understand the origin of the problem clearly.> > ... However, it's not all that clear to me how often someone would > have a LIMIT without an ORDER BY, as effectively there's nothing there > to determine which rows your query returns, and there's no flexibility > to change which subpaths are first in Append/MergeAppend paths created > in generate_orderedappend_paths(). Of course, it should be applied to an Append without pathkeys. However, I still recall user cases where the subtree scan is stopped, even without any limit, simply because MergeJoin has reached the end of the inner/outer subtree or in the case of semi- or anti-joins. I wonder if other cases may exist.> >> Also, I wonder if it would make sense to shuffle partitions a little and >> let backends scan partitions one-by-one in different orders just to >> reduce any sort of contention in case the queries don't fit the >> partitioning expression. > > I don't follow this part. Are you proposing we randomise subpath list > order? What contention do you aim to fix?I have no specific case for now. From the top of my mind, it may help to prevent flushing out some partitions from the shared buffers... -- regards, Andrei Lepikhov
Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6
От
Ashutosh Bapat
Дата:
On Wed, Sep 10, 2025 at 11:50 AM David Rowley <dgrowleyml@gmail.com> wrote: > > On Wed, 10 Sept 2025 at 16:26, Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > > > On Wed, Sep 10, 2025 at 4:27 AM David Rowley <dgrowleyml@gmail.com> wrote: > > > This seems quite separate from what's being complained about here. It > > > might be beneficial to reconsider whether we should do some sort of > > > sorting on startup_subpaths inside add_paths_to_append_rel(). I > > > imagine that it might make some sense to sort that list so the path > > > with the cheapest startup cost is first, then put the remainder of the > > > list in order of cheapest total cost per tuple. I suspect that would > > > result in Foreign partitions being scanned last... > > > > If there's LIMIT without ORDER BY, we could order the list of subpaths > > by the number of rows in descending order or cost per row in ascending > > order. That way there are more chances of scanning fewer partitions > > quicker. > > Wouldn't that amount to favouring scanning some large foreign > partition over a smaller local partition? My interpretation of > Andrei's "Prefer scanning local partitions to foreign ones" statement > is that was what we shouldn't be doing! Generally foreign scans will have a higher cost, including startup cost. So subpaths with local scans will be preferred. But in case there's a foreign subpath with a lower cost than local subpath, I think foreign scan should be preferred. -- Best Wishes, Ashutosh Bapat
Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6
От
David Rowley
Дата:
On Wed, 10 Sept 2025 at 19:41, Andrei Lepikhov <lepihov@gmail.com> wrote: > > On 10/9/2025 00:57, David Rowley wrote: > > ... However, it's not all that clear to me how often someone would > > have a LIMIT without an ORDER BY, as effectively there's nothing there > > to determine which rows your query returns, and there's no flexibility > > to change which subpaths are first in Append/MergeAppend paths created > > in generate_orderedappend_paths(). > Of course, it should be applied to an Append without pathkeys. > However, I still recall user cases where the subtree scan is stopped, > even without any limit, simply because MergeJoin has reached the end of > the inner/outer subtree or in the case of semi- or anti-joins. I wonder > if other cases may exist.> Cursors and nested loop semi joins are the only thing that come to mind for me. I don't see how there could be a Merge Join on an unordered Append path. Merge Join inputs need to be ordered (and have PathKeys). David
Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6
От
David Rowley
Дата:
On Wed, 10 Sept 2025 at 21:18, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > > If there's LIMIT without ORDER BY, we could order the list of subpaths > > > by the number of rows in descending order or cost per row in ascending > > > order. That way there are more chances of scanning fewer partitions > > > quicker. > > > > Wouldn't that amount to favouring scanning some large foreign > > partition over a smaller local partition? My interpretation of > > Andrei's "Prefer scanning local partitions to foreign ones" statement > > is that was what we shouldn't be doing! > > Generally foreign scans will have a higher cost, including startup > cost. So subpaths with local scans will be preferred. But in case > there's a foreign subpath with a lower cost than local subpath, I > think foreign scan should be preferred. I'm still stuck on why to you'd want to sort on the number of rows in descending order. What does that have to do with foreign scans? Otherwise, if the foreign scan comes out cheaper per row, then doing those first sounds ok to me. It's all about cost per row in my view. Foreign or local scan has no relevance to what I was proposing. David
Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6
От
Ashutosh Bapat
Дата:
On Mon, Sep 15, 2025 at 3:19 AM David Rowley <dgrowleyml@gmail.com> wrote: > > On Wed, 10 Sept 2025 at 21:18, Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > > > If there's LIMIT without ORDER BY, we could order the list of subpaths > > > > by the number of rows in descending order or cost per row in ascending > > > > order. That way there are more chances of scanning fewer partitions > > > > quicker. > > > > > > Wouldn't that amount to favouring scanning some large foreign > > > partition over a smaller local partition? My interpretation of > > > Andrei's "Prefer scanning local partitions to foreign ones" statement > > > is that was what we shouldn't be doing! > > > > Generally foreign scans will have a higher cost, including startup > > cost. So subpaths with local scans will be preferred. But in case > > there's a foreign subpath with a lower cost than local subpath, I > > think foreign scan should be preferred. > > I'm still stuck on why to you'd want to sort on the number of rows in > descending order. What does that have to do with foreign scans? > > Otherwise, if the foreign scan comes out cheaper per row, then doing > those first sounds ok to me. It's all about cost per row in my view. > Foreign or local scan has no relevance to what I was proposing. I agree that cost per row is a better parameter to order the subpaths. I am fine if we go ahead with that. Descending ordering by number rows definitely assures there are fewer subplans to execute, so saving any startup and initial work on rest of the subpaths, if there are many. If we order by cost per row, we may end up starting and cleaning many cheaper subpaths which may take longer in practice compared to starting and cleaning up fewer subpaths. -- Best Wishes, Ashutosh Bapat