Обсуждение: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6

Поиск
Список
Период
Сортировка

Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6

От
Vivek Gadge
Дата:
Hi Team,

We are currently experiencing performance issues related to partition scanning on a heavily used table in our PostgreSQL v17.6 database.

The table is partitioned monthly (e.g., transactions_jan25, transactions_feb25, …, transactions_sept25). We’ve observed that PostgreSQL scans these partitions in the order they were attached (January through September).

This behavior is leading to inefficient query performance when accessing recent data (e.g., August or September), as older partitions are being scanned first — adding unnecessary overhead. Since PostgreSQL does not provide a built-in setting to prioritize scanning partitions in descending order, we’re exploring ways to address this.

Could you please advise on:

How to optimize partition scanning so that recent partitions are scanned first, and

Any recommended best practices or workarounds to improve performance in such scenarios.

Looking forward to your guidance.

Thank you

Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6

От
Ashutosh Bapat
Дата:
On Mon, Sep 8, 2025 at 4:01 AM Vivek Gadge <vvkgadge56@gmail.com> wrote:
>
> Hi Team,
>
> We are currently experiencing performance issues related to partition scanning on a heavily used table in our
PostgreSQLv17.6 database. 
>
> The table is partitioned monthly (e.g., transactions_jan25, transactions_feb25, …, transactions_sept25). We’ve
observedthat PostgreSQL scans these partitions in the order they were attached (January through September). 
>
> This behavior is leading to inefficient query performance when accessing recent data (e.g., August or September), as
olderpartitions are being scanned first — adding unnecessary overhead. Since PostgreSQL does not provide a built-in
settingto prioritize scanning partitions in descending order, we’re exploring ways to address this. 
>
> Could you please advise on:
>
> How to optimize partition scanning so that recent partitions are scanned first, and
>
> Any recommended best practices or workarounds to improve performance in such scenarios.
>
> Looking forward to your guidance.
>
> Thank you
>

Can you please describe how the query performance is affected because
of the order in which partitions are scanned?

--
Best Wishes,
Ashutosh Bapat



Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6

От
Andrei Lepikhov
Дата:
On 8/9/2025 11:47, Ashutosh Bapat wrote:
> On Mon, Sep 8, 2025 at 4:01 AM Vivek Gadge <vvkgadge56@gmail.com> wrote:
>> Looking forward to your guidance.
>>
>> Thank you
>>
>
> Can you please describe how the query performance is affected because
> of the order in which partitions are scanned?
I guess they mentioned that the Postgres optimiser doesn't care about
the order of Append's subplans. It is a little sad in some cases. The
most critical case is when we have a limitation on the number of tuples
returned. In this case, the optimiser could consider the following
strategies:
1. Prefer scanning local partitions to foreign ones.
2. Pick first partitions with less startup costs and 'high probability'
to obtain all necessary tuples from a minimum set of partitions.

Postgres arranges clauses inside a long expression according to
evaluation cost (see order_qual_clauses). So, why not do similar stuff
for subplans?

Also, I wonder if it would make sense to shuffle partitions a little and
let backends scan partitions one-by-one in different orders just to
reduce any sort of contention in case the queries don't fit the
partitioning expression.

--
regards, Andrei Lepikhov



Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6

От
Adrien Nayrat
Дата:
On 9/8/25 12:05 PM, Andrei Lepikhov wrote:
> On 8/9/2025 11:47, Ashutosh Bapat wrote:
>> On Mon, Sep 8, 2025 at 4:01 AM Vivek Gadge <vvkgadge56@gmail.com> wrote:
>>> Looking forward to your guidance.
>>>
>>> Thank you
>>>
>>
>> Can you please describe how the query performance is affected because
>> of the order in which partitions are scanned?
> I guess they mentioned that the Postgres optimiser doesn't care about 
> the order of Append's subplans. It is a little sad in some cases. The 
> most critical case is when we have a limitation on the number of tuples 
> returned. In this case, the optimiser could consider the following 
> strategies:
> 1. Prefer scanning local partitions to foreign ones.
> 2. Pick first partitions with less startup costs and 'high probability' 
> to obtain all necessary tuples from a minimum set of partitions.
> 
> Postgres arranges clauses inside a long expression according to 
> evaluation cost (see order_qual_clauses). So, why not do similar stuff 
> for subplans?
> 
> Also, I wonder if it would make sense to shuffle partitions a little and 
> let backends scan partitions one-by-one in different orders just to 
> reduce any sort of contention in case the queries don't fit the 
> partitioning expression.
> 

It reminds me of these threads :


Make the optimiser aware of partitions ordering : 
https://www.postgresql.org/message-id/2401607.SfZhPQhbS4@ronan_laptop

Allow ordered partition scans in more cases :
https://www.postgresql.org/message-id/CAApHDvojKdBR3MR59JXmaCYbyHB6Q_5qPRU+dy93En8wm+XiDA@mail.gmail.com

Ordered Partitioned Table Scans :
https://www.postgresql.org/message-id/CAKJS1f-hAqhPLRk_RaSFTgYxd=Tz5hA7kQ2h4-DhJufQk8TGuw@mail.gmail.com

Regards

-- 
Adrien NAYRAT




Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6

От
Vivek Gadge
Дата:

For example, when a query runs on a partitioned table, PostgreSQL scans partitions in the order they were created or attached to the parent table. In our case (monthly partitions from January through September), this means that queries looking for recent data (e.g., September) may experience additional overhead. PostgreSQL evaluates the older partitions first, checking their constraints and in some cases probing their indexes, before reaching the later partitions that actually contain the needed data.

As a result, while the query results are correct, the execution time increases due to unnecessary work on irrelevant partitions. This performance impact is more noticeable when the target partition is at the end of the scan order and pruning cannot fully eliminate the earlier partitions.

Thanks. 

On Mon, 8 Sept, 2025, 3:17 pm Ashutosh Bapat, <ashutosh.bapat.oss@gmail.com> wrote:
On Mon, Sep 8, 2025 at 4:01 AM Vivek Gadge <vvkgadge56@gmail.com> wrote:
>
> Hi Team,
>
> We are currently experiencing performance issues related to partition scanning on a heavily used table in our PostgreSQL v17.6 database.
>
> The table is partitioned monthly (e.g., transactions_jan25, transactions_feb25, …, transactions_sept25). We’ve observed that PostgreSQL scans these partitions in the order they were attached (January through September).
>
> This behavior is leading to inefficient query performance when accessing recent data (e.g., August or September), as older partitions are being scanned first — adding unnecessary overhead. Since PostgreSQL does not provide a built-in setting to prioritize scanning partitions in descending order, we’re exploring ways to address this.
>
> Could you please advise on:
>
> How to optimize partition scanning so that recent partitions are scanned first, and
>
> Any recommended best practices or workarounds to improve performance in such scenarios.
>
> Looking forward to your guidance.
>
> Thank you
>

Can you please describe how the query performance is affected because
of the order in which partitions are scanned?

--
Best Wishes,
Ashutosh Bapat

Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6

От
Andrei Lepikhov
Дата:
On 8/9/2025 13:39, Vivek Gadge wrote:
> 
> For example, when a query runs on a partitioned table, PostgreSQL scans 
> partitions in the order they were created or attached to the parent 
> table. In our case (monthly partitions from January through September), 
> this means that queries looking for recent data (e.g., September) may 
> experience additional overhead. PostgreSQL evaluates the older 
> partitions first, checking their constraints and in some cases probing 
> their indexes, before reaching the later partitions that actually 
> contain the needed data.
> 
> As a result, while the query results are correct, the execution time 
> increases due to unnecessary work on irrelevant partitions. This 
> performance impact is more noticeable when the target partition is at 
> the end of the scan order and pruning cannot fully eliminate the earlier 
> partitions.
The case looks straightforward. But just to be sure that we are on the 
same page, could you provide a synthetic DB example and a query 
representing the exact problem you are going to resolve?

-- 
regards, Andrei Lepikhov



Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6

От
Ashutosh Bapat
Дата:
On Mon, Sep 8, 2025 at 5:09 PM Vivek Gadge <vvkgadge56@gmail.com> wrote:
>
>
> For example, when a query runs on a partitioned table, PostgreSQL scans partitions in the order they were created or
attachedto the parent table. In our case (monthly partitions from January through September), this means that queries
lookingfor recent data (e.g., September) may experience additional overhead. PostgreSQL evaluates the older partitions
first,checking their constraints and in some cases probing their indexes, before reaching the later partitions that
actuallycontain the needed data. 
>
> As a result, while the query results are correct, the execution time increases due to unnecessary work on irrelevant
partitions.This performance impact is more noticeable when the target partition is at the end of the scan order and
pruningcannot fully eliminate the earlier partitions. 
>

If you don't want data from certain partitions maybe you should add a
clause that will help partition pruning. If you need data from all
partitions, the order in which they are scanned doesn't matter, those
will be scanned either way.

If partitioning pruning isn't working for you, please report the exact
query. Please provide example queries anyway.

--
Best Wishes,
Ashutosh Bapat



Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6

От
Andrei Lepikhov
Дата:
On 8/9/2025 12:32, Adrien Nayrat wrote:
> On 9/8/25 12:05 PM, Andrei Lepikhov wrote:
>> On 8/9/2025 11:47, Ashutosh Bapat wrote:
>>> On Mon, Sep 8, 2025 at 4:01 AM Vivek Gadge <vvkgadge56@gmail.com> wrote:
> It reminds me of these threads :
Yes, partially. Actively using foreign tables and touching cheaper 
partitions first is crucial in queries with limits. Also, some cases may 
provide an implicit limit, like a merge join.

-- 
regards, Andrei Lepikhov



Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6

От
Vivek Gadge
Дата:
Hi Ashutosh, 

Thank you for your feedback regarding this matter. 

To provide more context, here is the exact query I am running:

EXPLAIN ANALYZE VERBOSE
SELECT m.txn_date, d.bank_ref
FROM app.main m
JOIN app.detail d ON m.txn_id = d.main_txn_id
WHERE m.txn_id = 9999999999999999999;


Both app.main and app.detail are range partitioned by month (e.g., main_202502, detail_202502, etc.) on a date column,

Attached below is the EXPLAIN ANALYZE VERBOSE output for the query. It shows that PostgreSQL is scanning all partitions in ascending order, even though the matching record is present in the main_202509 partition.


QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------
 Nested Loop  (cost=1.00..46.19 rows=156 width=346) (actual time=17.656..18.173 rows=1 loops=1)
   Output: m.txn_date, d.bank_ref, m.processor_id, m.txn_code, m.pos_data, d.issuer_txn_id, m.trans_ref, d.processor_uid, m.sub_txn_type, d.processor_token
   ->  Append  (cost=0.43..22.05 rows=13 width=238) (actual time=8.749..9.027 rows=1 loops=1)
         ->  Index Scan using detail_202502_main_txn_id_key on app.detail_202502 d_1  (cost=0.43..2.65 rows=1 width=74) (actual time=1.634..1.634 rows=0 loops=1)
               Output: d_1.bank_ref, d_1.issuer_txn_id, d_1.processor_uid, d_1.processor_token, d_1.main_txn_id
               Index Cond: (d_1.main_txn_id = '9999999999999999999'::bigint)
         ->  Index Scan using detail_202503_main_txn_id_key on app.detail_202503 d_2  (cost=0.56..2.78 rows=1 width=74) (actual time=1.158..1.158 rows=0 loops=1)
               Output: d_2.bank_ref, d_2.issuer_txn_id, d_2.processor_uid, d_2.processor_token, d_2.main_txn_id
               Index Cond: (d_2.main_txn_id = '9999999999999999999'::bigint)
         ->  Index Scan using detail_202504_main_txn_id_key on app.detail_202504 d_3  (cost=0.56..2.78 rows=1 width=74) (actual time=1.105..1.106 rows=0 loops=1)
               Output: d_3.bank_ref, d_3.issuer_txn_id, d_3.processor_uid, d_3.processor_token, d_3.main_txn_id
               Index Cond: (d_3.main_txn_id = '9999999999999999999'::bigint)
         ->  Index Scan using detail_202505_main_txn_id_key on app.detail_202505 d_4  (cost=0.56..2.78 rows=1 width=73) (actual time=1.097..1.097 rows=0 loops=1)
               Output: d_4.bank_ref, d_4.issuer_txn_id, d_4.processor_uid, d_4.processor_token, d_4.main_txn_id
               Index Cond: (d_4.main_txn_id = '9999999999999999999'::bigint)
         ->  Index Scan using detail_202506_main_txn_id_key on app.detail_202506 d_5  (cost=0.56..2.78 rows=1 width=74) (actual time=1.002..1.002 rows=0 loops=1)
               Output: d_5.bank_ref, d_5.issuer_txn_id, d_5.processor_uid, d_5.processor_token, d_5.main_txn_id
               Index Cond: (d_5.main_txn_id = '9999999999999999999'::bigint)
         ->  Index Scan using detail_202507_main_txn_id_key on app.detail_202507 d_6  (cost=0.56..2.78 rows=1 width=74) (actual time=0.913..0.913 rows=0 loops=1)
               Output: d_6.bank_ref, d_6.issuer_txn_id, d_6.processor_uid, d_6.processor_token, d_6.main_txn_id
               Index Cond: (d_6.main_txn_id = '9999999999999999999'::bigint)
         ->  Index Scan using detail_202508_main_txn_id_key on app.detail_202508 d_7  (cost=0.56..2.78 rows=1 width=73) (actual time=0.833..0.833 rows=0 loops=1)
               Output: d_7.bank_ref, d_7.issuer_txn_id, d_7.processor_uid, d_7.processor_token, d_7.main_txn_id
               Index Cond: (d_7.main_txn_id = '9999999999999999999'::bigint)
         ->  Index Scan using detail_202509_main_txn_id_key on app.detail_202509 d_8  (cost=0.43..2.65 rows=1 width=74) (actual time=1.001..1.004 rows=1 loops=1)
               Output: d_8.bank_ref, d_8.issuer_txn_id, d_8.processor_uid, d_8.processor_token, d_8.main_txn_id
               Index Cond: (d_8.main_txn_id = '9999999999999999999'::bigint)
         ->  Seq Scan on app.detail_202510 d_9  (cost=0.00..0.00 rows=1 width=500) (actual time=0.041..0.042 rows=0 loops=1)
               Output: d_9.bank_ref, d_9.issuer_txn_id, d_9.processor_uid, d_9.processor_token, d_9.main_txn_id
               Filter: (d_9.main_txn_id = '9999999999999999999'::bigint)
         ->  Seq Scan on app.detail_202511 d_10  (cost=0.00..0.00 rows=1 width=500) (actual time=0.065..0.066 rows=0 loops=1)
               Output: d_10.bank_ref, d_10.issuer_txn_id, d_10.processor_uid, d_10.processor_token, d_10.main_txn_id
               Filter: (d_10.main_txn_id = '9999999999999999999'::bigint)
         ->  Seq Scan on app.detail_202512 d_11  (cost=0.00..0.00 rows=1 width=500) (actual time=0.085..0.085 rows=0 loops=1)
               Output: d_11.bank_ref, d_11.issuer_txn_id, d_11.processor_uid, d_11.processor_token, d_11.main_txn_id
               Filter: (d_11.main_txn_id = '9999999999999999999'::bigint)
         ->  Seq Scan on app.detail_202601 d_12  (cost=0.00..0.00 rows=1 width=500) (actual time=0.039..0.039 rows=0 loops=1)
               Output: d_12.bank_ref, d_12.issuer_txn_id, d_12.processor_uid, d_12.processor_token, d_12.main_txn_id
               Filter: (d_12.main_txn_id = '9999999999999999999'::bigint)
         ->  Seq Scan on app.detail_default d_13  (cost=0.00..0.00 rows=1 width=500) (actual time=0.037..0.037 rows=0 loops=1)
               Output: d_13.bank_ref, d_13.issuer_txn_id, d_13.processor_uid, d_13.processor_token, d_13.main_txn_id
               Filter: (d_13.main_txn_id = '9999999999999999999'::bigint)
   ->  Materialize  (cost=0.56..22.23 rows=12 width=125) (actual time=8.901..9.137 rows=1 loops=1)
         Output: m.txn_date, m.processor_id, m.txn_code, m.pos_data, m.trans_ref, m.sub_txn_type, m.txn_id
         ->  Append  (cost=0.56..22.17 rows=12 width=125) (actual time=8.892..9.127 rows=1 loops=1)
               ->  Index Scan using main_202502_pkey on app.main_202502 m_1  (cost=0.56..2.78 rows=1 width=36) (actual time=0.897..0.897 rows=0 loops=1)
                     Output: m_1.txn_date, m_1.processor_id, m_1.txn_code, m_1.pos_data, m_1.trans_ref, m_1.sub_txn_type, m_1.txn_id
                     Index Cond: (m_1.txn_id = '9999999999999999999'::bigint)
               ->  Index Scan using main_202503_pkey on app.main_202503 m_2  (cost=0.56..2.78 rows=1 width=37) (actual time=1.105..1.105 rows=0 loops=1)
                     Output: m_2.txn_date, m_2.processor_id, m_2.txn_code, m_2.pos_data, m_2.trans_ref, m_2.sub_txn_type, m_2.txn_id
                     Index Cond: (m_2.txn_id = '9999999999999999999'::bigint)
               ->  Index Scan using main_202504_pkey on app.main_202504 m_3  (cost=0.56..2.78 rows=1 width=37) (actual time=1.114..1.114 rows=0 loops=1)
                     Output: m_3.txn_date, m_3.processor_id, m_3.txn_code, m_3.pos_data, m_3.trans_ref, m_3.sub_txn_type, m_3.txn_id
                     Index Cond: (m_3.txn_id = '9999999999999999999'::bigint)
               ->  Index Scan using main_202505_pkey on app.main_202505 m_4  (cost=0.56..2.78 rows=1 width=37) (actual time=1.135..1.135 rows=0 loops=1)
                     Output: m_4.txn_date, m_4.processor_id, m_4.txn_code, m_4.pos_data, m_4.trans_ref, m_4.sub_txn_type, m_4.txn_id
                     Index Cond: (m_4.txn_id = '9999999999999999999'::bigint)
               ->  Index Scan using main_202506_pkey on app.main_202506 m_5  (cost=0.56..2.78 rows=1 width=38) (actual time=1.096..1.096 rows=0 loops=1)
                     Output: m_5.txn_date, m_5.processor_id, m_5.txn_code, m_5.pos_data, m_5.trans_ref, m_5.sub_txn_type, m_5.txn_id
                     Index Cond: (m_5.txn_id = '9999999999999999999'::bigint)
               ->  Index Scan using main_202507_pkey on app.main_202507 m_6  (cost=0.56..2.78 rows=1 width=38) (actual time=1.285..1.285 rows=0 loops=1)
                     Output: m_6.txn_date, m_6.processor_id, m_6.txn_code, m_6.pos_data, m_6.trans_ref, m_6.sub_txn_type, m_6.txn_id
                     Index Cond: (m_6.txn_id = '9999999999999999999'::bigint)
               ->  Index Scan using main_202508_pkey on app.main_202508 m_7  (cost=0.56..2.78 rows=1 width=38) (actual time=1.010..1.010 rows=0 loops=1)
                     Output: m_7.txn_date, m_7.processor_id, m_7.txn_code, m_7.pos_data, m_7.trans_ref, m_7.sub_txn_type, m_7.txn_id
                     Index Cond: (m_7.txn_id = '9999999999999999999'::bigint)
               ->  Index Scan using main_202509_pkey on app.main_202509 m_8  (cost=0.43..2.65 rows=1 width=37) (actual time=1.243..1.245 rows=1 loops=1)
                     Output: m_8.txn_date, m_8.processor_id, m_8.txn_code, m_8.pos_data, m_8.trans_ref, m_8.sub_txn_type, m_8.txn_id
                     Index Cond: (m_8.txn_id = '9999999999999999999'::bigint)
               ->  Seq Scan on app.main_202510 m_9  (cost=0.00..0.00 rows=1 width=300) (actual time=0.057..0.057 rows=0 loops=1)
                     Output: m_9.txn_date, m_9.processor_id, m_9.txn_code, m_9.pos_data, m_9.trans_ref, m_9.sub_txn_type, m_9.txn_id
                     Filter: (m_9.txn_id = '9999999999999999999'::bigint)
               ->  Seq Scan on app.main_202511 m_10  (cost=0.00..0.00 rows=1 width=300) (actual time=0.045..0.045 rows=0 loops=1)
                     Output: m_10.txn_date, m_10.processor_id, m_10.txn_code, m_10.pos_data, m_10.trans_ref, m_10.sub_txn_type, m_10.txn_id
                     Filter: (m_10.txn_id = '9999999999999999999'::bigint)
               ->  Seq Scan on app.main_202512 m_11  (cost=0.00..0.00 rows=1 width=300) (actual time=0.076..0.076 rows=0 loops=1)
                     Output: m_11.txn_date, m_11.processor_id, m_11.txn_code, m_11.pos_data, m_11.trans_ref, m_11.sub_txn_type, m_11.txn_id
                     Filter: (m_11.txn_id = '9999999999999999999'::bigint)
               ->  Seq Scan on app.main_default m_12  (cost=0.00..0.00 rows=1 width=300) (actual time=0.047..0.047 rows=0 loops=1)
                     Output: m_12.txn_date, m_12.processor_id, m_12.txn_code, m_12.pos_data, m_12.trans_ref, m_12.sub_txn_type, m_12.txn_id
                     Filter: (m_12.txn_id = '9999999999999999999'::bigint)

On Tue, 9 Sept, 2025, 8:42 am Ashutosh Bapat, <ashutosh.bapat.oss@gmail.com> wrote:
On Mon, Sep 8, 2025 at 5:09 PM Vivek Gadge <vvkgadge56@gmail.com> wrote:
>
>
> For example, when a query runs on a partitioned table, PostgreSQL scans partitions in the order they were created or attached to the parent table. In our case (monthly partitions from January through September), this means that queries looking for recent data (e.g., September) may experience additional overhead. PostgreSQL evaluates the older partitions first, checking their constraints and in some cases probing their indexes, before reaching the later partitions that actually contain the needed data.
>
> As a result, while the query results are correct, the execution time increases due to unnecessary work on irrelevant partitions. This performance impact is more noticeable when the target partition is at the end of the scan order and pruning cannot fully eliminate the earlier partitions.
>

If you don't want data from certain partitions maybe you should add a
clause that will help partition pruning. If you need data from all
partitions, the order in which they are scanned doesn't matter, those
will be scanned either way.

If partitioning pruning isn't working for you, please report the exact
query. Please provide example queries anyway.

--
Best Wishes,
Ashutosh Bapat

Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6

От
David Rowley
Дата:
On Wed, 10 Sept 2025 at 09:32, Vivek Gadge <vvkgadge56@gmail.com> wrote:
> EXPLAIN ANALYZE VERBOSE
> SELECT m.txn_date, d.bank_ref
> FROM app.main m
> JOIN app.detail d ON m.txn_id = d.main_txn_id
> WHERE m.txn_id = 9999999999999999999;
>
>
> Both app.main and app.detail are range partitioned by month (e.g., main_202502, detail_202502, etc.) on a date
column,
>
> Attached below is the EXPLAIN ANALYZE VERBOSE output for the query. It shows that PostgreSQL is scanning all
partitionsin ascending order, even though the matching record is present in the main_202509 partition.
 

I think you might be under the false impression that the executor is
able to stop searching for the next row to join once it finds the
first matching row.  If that were the case, then yes, it might be
faster to somehow scan the partition where the matching row exists
first as then you could forego scanning the remainder.   PostgreSQL
does have some ability to do some of this with the "Unique Join"
functionality, but it does not apply to this case since there's no
proof that only a single row can match (this would require some sort
of Unique Index, which cannot exist in your case since we don't have
global indexes that exist over all partitions).

If you're able to patch PostgreSQL and recompile, try the attached
patch away from your production environment to check if there are any
gains to scanning the partitions in another order. I didn't bother to
change all the append paths that were generated, but I think I've
reversed the order of at least the one the planner is using in your
case.

It's probably true that there are some cases (when the row order does
not matter) where scanning partitions that are more present in shared
buffers first would be better as that means using buffers perhaps
before you evict them to make way for the buffers of some other
(normally unused) partition, but I don't believe buffer eviction comes
into effect for your case since you're basically just scanning an
index which doesn't contain any value for the search key. That
requires very few buffers.

A secondary thought here is that perhaps your partitioning strategy
needs revision. What's the reason you opted to partition by date? Does
that benefit some other queries better? Or is it just a case that it
works well for some data retention policy? IMO, partitioning by range
of the transaction ID would fit much better for this particular query.

David

Вложения

Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6

От
David Rowley
Дата:
On Mon, 8 Sept 2025 at 22:05, Andrei Lepikhov <lepihov@gmail.com> wrote:
> I guess they mentioned that the Postgres optimiser doesn't care about
> the order of Append's subplans. It is a little sad in some cases. The
> most critical case is when we have a limitation on the number of tuples
> returned. In this case, the optimiser could consider the following
> strategies:
> 1. Prefer scanning local partitions to foreign ones.
> 2. Pick first partitions with less startup costs and 'high probability'
> to obtain all necessary tuples from a minimum set of partitions.
>
> Postgres arranges clauses inside a long expression according to
> evaluation cost (see order_qual_clauses). So, why not do similar stuff
> for subplans?

This seems quite separate from what's being complained about here. It
might be beneficial to reconsider whether we should do some sort of
sorting on startup_subpaths inside add_paths_to_append_rel(). I
imagine that it might make some sense to sort that list so the path
with the cheapest startup cost is first, then put the remainder of the
list in order of cheapest total cost per tuple. I suspect that would
result in Foreign partitions being scanned last...

... However, it's not all that clear to me how often someone would
have a LIMIT without an ORDER BY, as effectively there's nothing there
to determine which rows your query returns, and there's no flexibility
to change which subpaths are first in Append/MergeAppend paths created
in generate_orderedappend_paths().

> Also, I wonder if it would make sense to shuffle partitions a little and
> let backends scan partitions one-by-one in different orders just to
> reduce any sort of contention in case the queries don't fit the
> partitioning expression.

I don't follow this part. Are you proposing we randomise subpath list
order? What contention do you aim to fix?

David



Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6

От
Ashutosh Bapat
Дата:
On Wed, Sep 10, 2025 at 4:27 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Mon, 8 Sept 2025 at 22:05, Andrei Lepikhov <lepihov@gmail.com> wrote:
> > I guess they mentioned that the Postgres optimiser doesn't care about
> > the order of Append's subplans. It is a little sad in some cases. The
> > most critical case is when we have a limitation on the number of tuples
> > returned. In this case, the optimiser could consider the following
> > strategies:
> > 1. Prefer scanning local partitions to foreign ones.
> > 2. Pick first partitions with less startup costs and 'high probability'
> > to obtain all necessary tuples from a minimum set of partitions.
> >
> > Postgres arranges clauses inside a long expression according to
> > evaluation cost (see order_qual_clauses). So, why not do similar stuff
> > for subplans?
>
> This seems quite separate from what's being complained about here. It
> might be beneficial to reconsider whether we should do some sort of
> sorting on startup_subpaths inside add_paths_to_append_rel(). I
> imagine that it might make some sense to sort that list so the path
> with the cheapest startup cost is first, then put the remainder of the
> list in order of cheapest total cost per tuple. I suspect that would
> result in Foreign partitions being scanned last...

If there's LIMIT without ORDER BY, we could order the list of subpaths
by the number of rows in descending order or cost per row in ascending
order. That way there are more chances of scanning fewer partitions
quicker.

--
Best Wishes,
Ashutosh Bapat



Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6

От
David Rowley
Дата:
On Wed, 10 Sept 2025 at 16:26, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Wed, Sep 10, 2025 at 4:27 AM David Rowley <dgrowleyml@gmail.com> wrote:
> > This seems quite separate from what's being complained about here. It
> > might be beneficial to reconsider whether we should do some sort of
> > sorting on startup_subpaths inside add_paths_to_append_rel(). I
> > imagine that it might make some sense to sort that list so the path
> > with the cheapest startup cost is first, then put the remainder of the
> > list in order of cheapest total cost per tuple. I suspect that would
> > result in Foreign partitions being scanned last...
>
> If there's LIMIT without ORDER BY, we could order the list of subpaths
> by the number of rows in descending order or cost per row in ascending
> order. That way there are more chances of scanning fewer partitions
> quicker.

Wouldn't that amount to favouring scanning some large foreign
partition over a smaller local partition? My interpretation of
Andrei's "Prefer scanning local partitions to foreign ones" statement
is that was what we shouldn't be doing!

David



Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6

От
Andrei Lepikhov
Дата:
On 10/9/2025 00:57, David Rowley wrote:
>> 1. Prefer scanning local partitions to foreign ones.
>> 2. Pick first partitions with less startup costs and 'high probability'
>> to obtain all necessary tuples from a minimum set of partitions.
>>
>> Postgres arranges clauses inside a long expression according to
>> evaluation cost (see order_qual_clauses). So, why not do similar stuff
>> for subplans?
> 
> This seems quite separate from what's being complained about here.
Maybe. I didn't see the reproduction script, which made it hard for me 
to understand the origin of the problem clearly.>
> ... However, it's not all that clear to me how often someone would
> have a LIMIT without an ORDER BY, as effectively there's nothing there
> to determine which rows your query returns, and there's no flexibility
> to change which subpaths are first in Append/MergeAppend paths created
> in generate_orderedappend_paths().
Of course, it should be applied to an Append without pathkeys.
However, I still recall user cases where the subtree scan is stopped, 
even without any limit, simply because MergeJoin has reached the end of 
the inner/outer subtree or in the case of semi- or anti-joins. I wonder 
if other cases may exist.>
>> Also, I wonder if it would make sense to shuffle partitions a little and
>> let backends scan partitions one-by-one in different orders just to
>> reduce any sort of contention in case the queries don't fit the
>> partitioning expression.
> 
> I don't follow this part. Are you proposing we randomise subpath list
> order? What contention do you aim to fix?I have no specific case for now. From the top of my mind, it may help to 
prevent flushing out some partitions from the shared buffers...

-- 
regards, Andrei Lepikhov



Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6

От
Ashutosh Bapat
Дата:
On Wed, Sep 10, 2025 at 11:50 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Wed, 10 Sept 2025 at 16:26, Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Wed, Sep 10, 2025 at 4:27 AM David Rowley <dgrowleyml@gmail.com> wrote:
> > > This seems quite separate from what's being complained about here. It
> > > might be beneficial to reconsider whether we should do some sort of
> > > sorting on startup_subpaths inside add_paths_to_append_rel(). I
> > > imagine that it might make some sense to sort that list so the path
> > > with the cheapest startup cost is first, then put the remainder of the
> > > list in order of cheapest total cost per tuple. I suspect that would
> > > result in Foreign partitions being scanned last...
> >
> > If there's LIMIT without ORDER BY, we could order the list of subpaths
> > by the number of rows in descending order or cost per row in ascending
> > order. That way there are more chances of scanning fewer partitions
> > quicker.
>
> Wouldn't that amount to favouring scanning some large foreign
> partition over a smaller local partition? My interpretation of
> Andrei's "Prefer scanning local partitions to foreign ones" statement
> is that was what we shouldn't be doing!

Generally foreign scans will have a higher cost, including startup
cost. So subpaths with local scans will be preferred. But in case
there's a foreign subpath with a lower cost than local subpath, I
think foreign scan should be preferred.

--
Best Wishes,
Ashutosh Bapat



Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6

От
David Rowley
Дата:
On Wed, 10 Sept 2025 at 19:41, Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> On 10/9/2025 00:57, David Rowley wrote:
> > ... However, it's not all that clear to me how often someone would
> > have a LIMIT without an ORDER BY, as effectively there's nothing there
> > to determine which rows your query returns, and there's no flexibility
> > to change which subpaths are first in Append/MergeAppend paths created
> > in generate_orderedappend_paths().

> Of course, it should be applied to an Append without pathkeys.
> However, I still recall user cases where the subtree scan is stopped,
> even without any limit, simply because MergeJoin has reached the end of
> the inner/outer subtree or in the case of semi- or anti-joins. I wonder
> if other cases may exist.>

Cursors and nested loop semi joins are the only thing that come to
mind for me.  I don't see how there could be a Merge Join on an
unordered Append path. Merge Join inputs need to be ordered (and have
PathKeys).

David



Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6

От
David Rowley
Дата:
On Wed, 10 Sept 2025 at 21:18, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> > > If there's LIMIT without ORDER BY, we could order the list of subpaths
> > > by the number of rows in descending order or cost per row in ascending
> > > order. That way there are more chances of scanning fewer partitions
> > > quicker.
> >
> > Wouldn't that amount to favouring scanning some large foreign
> > partition over a smaller local partition? My interpretation of
> > Andrei's "Prefer scanning local partitions to foreign ones" statement
> > is that was what we shouldn't be doing!
>
> Generally foreign scans will have a higher cost, including startup
> cost. So subpaths with local scans will be preferred. But in case
> there's a foreign subpath with a lower cost than local subpath, I
> think foreign scan should be preferred.

I'm still stuck on why to you'd want to sort on the number of rows in
descending order. What does that have to do with foreign scans?

Otherwise, if the foreign scan comes out cheaper per row, then doing
those first sounds ok to me. It's all about cost per row in my view.
Foreign or local scan has no relevance to what I was proposing.

David



Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6

От
Ashutosh Bapat
Дата:
On Mon, Sep 15, 2025 at 3:19 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Wed, 10 Sept 2025 at 21:18, Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> > > > If there's LIMIT without ORDER BY, we could order the list of subpaths
> > > > by the number of rows in descending order or cost per row in ascending
> > > > order. That way there are more chances of scanning fewer partitions
> > > > quicker.
> > >
> > > Wouldn't that amount to favouring scanning some large foreign
> > > partition over a smaller local partition? My interpretation of
> > > Andrei's "Prefer scanning local partitions to foreign ones" statement
> > > is that was what we shouldn't be doing!
> >
> > Generally foreign scans will have a higher cost, including startup
> > cost. So subpaths with local scans will be preferred. But in case
> > there's a foreign subpath with a lower cost than local subpath, I
> > think foreign scan should be preferred.
>
> I'm still stuck on why to you'd want to sort on the number of rows in
> descending order. What does that have to do with foreign scans?
>
> Otherwise, if the foreign scan comes out cheaper per row, then doing
> those first sounds ok to me. It's all about cost per row in my view.
> Foreign or local scan has no relevance to what I was proposing.

I agree that cost per row is a better parameter to order the subpaths.
I am fine if we go ahead with that.

Descending ordering by number rows definitely assures there are fewer
subplans to execute, so saving any startup and initial work on rest of
the subpaths, if there are many. If we order by cost per row, we may
end up starting and cleaning many cheaper subpaths which may take
longer in practice compared to starting and cleaning up fewer
subpaths.

--
Best Wishes,
Ashutosh Bapat