Обсуждение: Performance issue after creating partitions

Поиск
Список
Период
Сортировка

Performance issue after creating partitions

От
Teja Jakkidi
Дата:
Hello Admin team,

We have a Postgres 13 server set up on Google Cloud. Recently we created partitions as below for the tables:
Range on ‘order_date’  column
         List on ‘country_code’ column

Quarterly partitions for range.

Order date column is a time stamp column and when we are using this column as a filter in where condition :
trunc(order_date)= to_date(’2022-04-01’,’YYYY-MM-DD’), the query scans all the partitions dated back from the year
2000.So, instead of going to the 2022 Q2 partition directly, the query is scanning all the partitions. 
But when we remove the trunc() function and just specify order_date =‘2022-04-01 07:02:30’, this works as expected. It
goesto the correct partition directly and gets the data.  
Can Anyone help me on what’s happening in the first case and why all the partitions are being scanned?
For the first case, when there is a non partition table, it is behaving way better.

Thanks,
J. Teja.


Re: Performance issue after creating partitions

От
Doug Reynolds
Дата:
You need to change your query to not use the trunc.  The TRUNC statement forces it to trunc every row in the table.

If you do order_date >= to_date(..) and order_date < to_date(..) + interval '1' day, it will use the partition range to
narrowdown to the correct partition first. 

Sent from my iPhone

> On Aug 24, 2022, at 6:41 PM, Teja Jakkidi <teja.jakkidi05@gmail.com> wrote:
>
> Hello Admin team,
>
> We have a Postgres 13 server set up on Google Cloud. Recently we created partitions as below for the tables:
> Range on ‘order_date’  column
>         List on ‘country_code’ column
>
> Quarterly partitions for range.
>
> Order date column is a time stamp column and when we are using this column as a filter in where condition :
trunc(order_date)= to_date(’2022-04-01’,’YYYY-MM-DD’), the query scans all the partitions dated back from the year
2000.So, instead of going to the 2022 Q2 partition directly, the query is scanning all the partitions. 
> But when we remove the trunc() function and just specify order_date =‘2022-04-01 07:02:30’, this works as expected.
Itgoes to the correct partition directly and gets the data.  
> Can Anyone help me on what’s happening in the first case and why all the partitions are being scanned?
> For the first case, when there is a non partition table, it is behaving way better.
>
> Thanks,
> J. Teja.
>




Re: Performance issue after creating partitions

От
Mladen Gogala
Дата:
On 8/24/22 18:41, Teja Jakkidi wrote:
trunc(order_date) = to_date(’2022-04-01’

You cannot have expression on the left side if you want partition pruning. This is elementary mistake.

-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com

Re: Performance issue after creating partitions

От
Teja Jakkidi
Дата:
Thank you, Doug and Malden for your inputs.

As suggested by Doug I tried replacing the trunc() with below command:
“ order_date >= to_date(..) and order_date < to_date(..) + interval '1' day”

What happens here is: My query has 2tables : A and B, both are having an inner join on company code and order date. We
haveA.Company_code, A.order_date filter in the where condition which was having trun() previously but now I replaced it
withthe command suggested by Doug. The company_code filter is working for both tables A and B, the query is searching
onlythe partitions particular to that country code. However for order_date condition, the partitions for A table are
beingutilized correctly I.e the query is searching directly in only one partition based on that date. But for table B,
itis still searching all the partitions, it is not applying the where A.order_date condition for B tables partitions
whilesearching. 
I do not understand why the A.order_date filter is not being applied to table B where as A.company_code is being
appliedto both tables. 

Thanks and Regards,
J. Teja Sri.

> On Aug 24, 2022, at 5:52 PM, Doug Reynolds <mav@wastegate.net> wrote:
>
> You need to change your query to not use the trunc.  The TRUNC statement forces it to trunc every row in the table.

>
> If you do order_date >= to_date(..) and order_date < to_date(..) + interval '1' day, it will use the partition range
tonarrow down to the correct partition first. 
>
> Sent from my iPhone
>
>> On Aug 24, 2022, at 6:41 PM, Teja Jakkidi <teja.jakkidi05@gmail.com> wrote:
>>
>> Hello Admin team,
>>
>> We have a Postgres 13 server set up on Google Cloud. Recently we created partitions as below for the tables:
>> Range on ‘order_date’  column
>>        List on ‘country_code’ column
>>
>> Quarterly partitions for range.
>>
>> Order date column is a time stamp column and when we are using this column as a filter in where condition :
trunc(order_date)= to_date(’2022-04-01’,’YYYY-MM-DD’), the query scans all the partitions dated back from the year
2000.So, instead of going to the 2022 Q2 partition directly, the query is scanning all the partitions. 
>> But when we remove the trunc() function and just specify order_date =‘2022-04-01 07:02:30’, this works as expected.
Itgoes to the correct partition directly and gets the data.  
>> Can Anyone help me on what’s happening in the first case and why all the partitions are being scanned?
>> For the first case, when there is a non partition table, it is behaving way better.
>>
>> Thanks,
>> J. Teja.
>>
>



Re: Performance issue after creating partitions

От
Teja Jakkidi
Дата:
Anyone ever encountered the below situation as what I am noticing with partitions?

Thanks,
Teja. J.

> On Aug 26, 2022, at 11:46 AM, Teja Jakkidi <teja.jakkidi05@gmail.com> wrote:
>
> Thank you, Doug and Malden for your inputs.
>
> As suggested by Doug I tried replacing the trunc() with below command:
> “ order_date >= to_date(..) and order_date < to_date(..) + interval '1' day”
>
> What happens here is: My query has 2tables : A and B, both are having an inner join on company code and order date.
Wehave A.Company_code, A.order_date filter in the where condition which was having trun() previously but now I replaced
itwith the command suggested by Doug. The company_code filter is working for both tables A and B, the query is
searchingonly the partitions particular to that country code. However for order_date condition, the partitions for A
tableare being utilized correctly I.e the query is searching directly in only one partition based on that date. But for
tableB, it is still searching all the partitions, it is not applying the where A.order_date condition for B tables
partitionswhile searching. 
> I do not understand why the A.order_date filter is not being applied to table B where as A.company_code is being
appliedto both tables. 
>
> Thanks and Regards,
> J. Teja Sri.
>
>> On Aug 24, 2022, at 5:52 PM, Doug Reynolds <mav@wastegate.net> wrote:
>>
>> You need to change your query to not use the trunc.  The TRUNC statement forces it to trunc every row in the table.
 
>>
>> If you do order_date >= to_date(..) and order_date < to_date(..) + interval '1' day, it will use the partition range
tonarrow down to the correct partition first. 
>>
>> Sent from my iPhone
>>
>>>> On Aug 24, 2022, at 6:41 PM, Teja Jakkidi <teja.jakkidi05@gmail.com> wrote:
>>>
>>> Hello Admin team,
>>>
>>> We have a Postgres 13 server set up on Google Cloud. Recently we created partitions as below for the tables:
>>> Range on ‘order_date’  column
>>>       List on ‘country_code’ column
>>>
>>> Quarterly partitions for range.
>>>
>>> Order date column is a time stamp column and when we are using this column as a filter in where condition :
trunc(order_date)= to_date(’2022-04-01’,’YYYY-MM-DD’), the query scans all the partitions dated back from the year
2000.So, instead of going to the 2022 Q2 partition directly, the query is scanning all the partitions. 
>>> But when we remove the trunc() function and just specify order_date =‘2022-04-01 07:02:30’, this works as expected.
Itgoes to the correct partition directly and gets the data.  
>>> Can Anyone help me on what’s happening in the first case and why all the partitions are being scanned?
>>> For the first case, when there is a non partition table, it is behaving way better.
>>>
>>> Thanks,
>>> J. Teja.
>>>
>>



Re: Performance issue after creating partitions

От
Tom Lane
Дата:
Teja Jakkidi <teja.jakkidi05@gmail.com> writes:
> Anyone ever encountered the below situation as what I am noticing with partitions?

You haven't shown us your query, so any answer would be blind speculation.

However, in the spirit of blind speculation, I'm wondering if you're
expecting those range constraints to propagate across a join.  They
won't; you'd need to duplicate the conditions for the other table.

That is, if you have WHERE+JOIN/ON conditions amounting to

    WHERE a.a = b.b AND b.b = constant

the planner is able to derive "a.a = constant" on the assumption of
transitivity, and use that to constrain the scan of table a (ie,
use an index, reject partitions at plan time, etc).  But no such
deduction happens for

    WHERE a.a = b.b AND b.b >= constant

If you want a constrained scan of a, you need to write it out:

    WHERE a.a = b.b AND b.b >= constant AND a.a >= constant

            regards, tom lane



Re: Performance issue after creating partitions

От
Olivier Gautherot
Дата:
In this type of situation, I would have a column of type date for the partitioning and another one for the timestamp. The additional space will surely be marginal but the gain in performance really significant.

Another thing you could try is to add a computed index on trunc(order_date): an index scan would move less data than a table scan.

Olivier Gautherot


Le jeu. 25 août 2022, 00:41, Teja Jakkidi <teja.jakkidi05@gmail.com> a écrit :
Hello Admin team,

We have a Postgres 13 server set up on Google Cloud. Recently we created partitions as below for the tables:
Range on ‘order_date’  column
         List on ‘country_code’ column

Quarterly partitions for range.

Order date column is a time stamp column and when we are using this column as a filter in where condition : trunc(order_date) = to_date(’2022-04-01’,’YYYY-MM-DD’), the query scans all the partitions dated back from the year 2000. So, instead of going to the 2022 Q2 partition directly, the query is scanning all the partitions.
But when we remove the trunc() function and just specify order_date =‘2022-04-01 07:02:30’, this works as expected. It goes to the correct partition directly and gets the data.
Can Anyone help me on what’s happening in the first case and why all the partitions are being scanned?
For the first case, when there is a non partition table, it is behaving way better.

Thanks,
J. Teja.

Re: Performance issue after creating partitions

От
Doug Reynolds
Дата:
Tom is spot on with his suggestions.

The only thing that I'll add is that sometimes there is a mismatch with timestamp resolution, which prevents a direct
innerjoin.  In this case, you filter both partitioned tables with the greater than/less than constants, THEN complete
theinner join with truncated dates.  You will still have to do a full scan; however, you'll only be doing a full scan
ofthe selected partitions. 

Other than that, you need to provide the query for additional advice.

Doug

> On Aug 30, 2022, at 4:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Teja Jakkidi <teja.jakkidi05@gmail.com> writes:
>> Anyone ever encountered the below situation as what I am noticing with partitions?
>
> You haven't shown us your query, so any answer would be blind speculation.
>
> However, in the spirit of blind speculation, I'm wondering if you're
> expecting those range constraints to propagate across a join.  They
> won't; you'd need to duplicate the conditions for the other table.
>
> That is, if you have WHERE+JOIN/ON conditions amounting to
>
>    WHERE a.a = b.b AND b.b = constant
>
> the planner is able to derive "a.a = constant" on the assumption of
> transitivity, and use that to constrain the scan of table a (ie,
> use an index, reject partitions at plan time, etc).  But no such
> deduction happens for
>
>    WHERE a.a = b.b AND b.b >= constant
>
> If you want a constrained scan of a, you need to write it out:
>
>    WHERE a.a = b.b AND b.b >= constant AND a.a >= constant
>
>            regards, tom lane
>
>