Обсуждение: Improving the heapgetpage function improves performance in common scenarios

Поиск
Список
Период
Сортировка

Improving the heapgetpage function improves performance in common scenarios

От
Quan Zongliang
Дата:
Hi

In the function heapgetpage. If a table is not updated very frequently. 
Many actions in tuple loops are superfluous. For all_visible pages, 
loctup does not need to be assigned, nor does the "valid" variable. 
CheckForSerializableConflictOutNeeded from 
HeapCheckForSerializableConflictOut function, it only need to inspect at 
the beginning of the cycle only once. Using vtune you can clearly see 
the result (attached heapgetpage.jpg).

So by splitting the loop logic into two parts, the vtune results show 
significant improvement (attached heapgetpage-allvis.jpg).

The test data uses TPC-H's table "orders" with a scale=20, 30 million rows.


Quan Zongliang
Вложения

Re: Improving the heapgetpage function improves performance in common scenarios

От
John Naylor
Дата:

On Thu, Aug 24, 2023 at 5:55 PM Quan Zongliang <quanzongliang@yeah.net> wrote:

> In the function heapgetpage. If a table is not updated very frequently.
> Many actions in tuple loops are superfluous. For all_visible pages,
> loctup does not need to be assigned, nor does the "valid" variable.
> CheckForSerializableConflictOutNeeded from
> HeapCheckForSerializableConflictOut function, it only need to inspect at

Thanks for submitting! A few weeks before this, there was another proposal, which specializes code for all paths, not just one. That patch also does so without duplicating the loop:

https://www.postgresql.org/message-id/20230716015656.xjvemfbp5fysjiea@awork3.anarazel.de

> the beginning of the cycle only once. Using vtune you can clearly see
> the result (attached heapgetpage.jpg).
>
> So by splitting the loop logic into two parts, the vtune results show
> significant improvement (attached heapgetpage-allvis.jpg).

For future reference, it's not clear at all from the screenshots what the improvement will be for the user. In the above thread, the author shares testing methodology as well as timing measurements. This is useful for reproducibilty, as well as convincing others that the change is important.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: Improving the heapgetpage function improves performance in common scenarios

От
Quan Zongliang
Дата:

On 2023/9/5 16:15, John Naylor wrote:
> 
> On Thu, Aug 24, 2023 at 5:55 PM Quan Zongliang <quanzongliang@yeah.net 
> <mailto:quanzongliang@yeah.net>> wrote:
> 
>  > In the function heapgetpage. If a table is not updated very frequently.
>  > Many actions in tuple loops are superfluous. For all_visible pages,
>  > loctup does not need to be assigned, nor does the "valid" variable.
>  > CheckForSerializableConflictOutNeeded from
>  > HeapCheckForSerializableConflictOut function, it only need to inspect at
> 
> Thanks for submitting! A few weeks before this, there was another 
> proposal, which specializes code for all paths, not just one. That patch 
> also does so without duplicating the loop:
> 
> https://www.postgresql.org/message-id/20230716015656.xjvemfbp5fysjiea@awork3.anarazel.de
<https://www.postgresql.org/message-id/20230716015656.xjvemfbp5fysjiea@awork3.anarazel.de>
> 
Nice patch. I'm sorry I didn't notice it before.

>  > the beginning of the cycle only once. Using vtune you can clearly see
>  > the result (attached heapgetpage.jpg).
>  >
>  > So by splitting the loop logic into two parts, the vtune results show
>  > significant improvement (attached heapgetpage-allvis.jpg).
> 
> For future reference, it's not clear at all from the screenshots what 
> the improvement will be for the user. In the above thread, the author 
> shares testing methodology as well as timing measurements. This is 
> useful for reproducibilty, as well as convincing others that the change 
> is important.
> 
Here's how I test it
    EXPLAIN ANALYZE SELECT * FROM orders;
Maybe the test wasn't good enough. Although the modified optimal result 
looks good. Because it fluctuates a lot. It's hard to compare. The 
results of vtune are therefore used.

My patch is mainly to eliminate:
1, Assignment of "loctup" struct variable (in vtune you can see that 
these 4 lines have a significant overhead: 0.4 1.0 0.2 0.4).
2. Assignment of the "valid" variable.(overhead 0.6)
3. HeapCheckForSerializableConflictOut function call.(overhead 0.6)

Although these are not the same overhead from test to test. But all are 
too obvious to ignore. The screenshots are mainly to show the three 
improvements mentioned above.

I'll also try Andres Freund's test method next.

> --
> John Naylor
> EDB: http://www.enterprisedb.com <http://www.enterprisedb.com>




Re: Improving the heapgetpage function improves performance in common scenarios

От
John Naylor
Дата:

On Tue, Sep 5, 2023 at 4:27 PM Quan Zongliang <quanzongliang@yeah.net> wrote:

> Here's how I test it
>     EXPLAIN ANALYZE SELECT * FROM orders;

Note that EXPLAIN ANALYZE has quite a bit of overhead, so it's not good for these kinds of tests.

> I'll also try Andres Freund's test method next.

Commit f691f5b80a85 from today removes another source of overhead in this function, so I suggest testing against that, if you wish to test again.

--
John Naylor
EDB: http://www.enterprisedb.com

Re: Improving the heapgetpage function improves performance in common scenarios

От
Quan Zongliang
Дата:

On 2023/9/5 18:46, John Naylor wrote:
> 
> On Tue, Sep 5, 2023 at 4:27 PM Quan Zongliang <quanzongliang@yeah.net 
> <mailto:quanzongliang@yeah.net>> wrote:
> 
>  > Here's how I test it
>  >     EXPLAIN ANALYZE SELECT * FROM orders;
> 
> Note that EXPLAIN ANALYZE has quite a bit of overhead, so it's not good 
> for these kinds of tests.
> 
>  > I'll also try Andres Freund's test method next.
> 
> Commit f691f5b80a85 from today removes another source of overhead in 
> this function, so I suggest testing against that, if you wish to test again.
> 
Test with the latest code of the master branch, see the attached results.

If not optimized(--enable-debug CFLAGS='-O0'), there is a clear 
difference. When the compiler does the optimization, the performance is 
similar. I think the compiler does a good enough optimization with 
"pg_attribute_always_inline" and the last two constant parameters when 
calling heapgetpage_collect.


> --
> John Naylor
> EDB: http://www.enterprisedb.com <http://www.enterprisedb.com>
Вложения

Re: Improving the heapgetpage function improves performance in common scenarios

От
Quan Zongliang
Дата:

On 2023/9/6 15:50, Quan Zongliang wrote:
> 
> 
> On 2023/9/5 18:46, John Naylor wrote:
>>
>> On Tue, Sep 5, 2023 at 4:27 PM Quan Zongliang <quanzongliang@yeah.net 
>> <mailto:quanzongliang@yeah.net>> wrote:
>>
>>  > Here's how I test it
>>  >     EXPLAIN ANALYZE SELECT * FROM orders;
>>
>> Note that EXPLAIN ANALYZE has quite a bit of overhead, so it's not 
>> good for these kinds of tests.
>>
>>  > I'll also try Andres Freund's test method next.
>>
>> Commit f691f5b80a85 from today removes another source of overhead in 
>> this function, so I suggest testing against that, if you wish to test 
>> again.
>>
> Test with the latest code of the master branch, see the attached results.
> 
> If not optimized(--enable-debug CFLAGS='-O0'), there is a clear 
> difference. When the compiler does the optimization, the performance is 
> similar. I think the compiler does a good enough optimization with 
> "pg_attribute_always_inline" and the last two constant parameters when 
> calling heapgetpage_collect.
> 
Add a note. The first execution time of an attachment is not calculated 
in the average.

> 
>> -- 
>> John Naylor
>> EDB: http://www.enterprisedb.com <http://www.enterprisedb.com>




Re: Improving the heapgetpage function improves performance in common scenarios

От
John Naylor
Дата:

On Wed, Sep 6, 2023 at 2:50 PM Quan Zongliang <quanzongliang@yeah.net> wrote:

> If not optimized(--enable-debug CFLAGS='-O0'), there is a clear
> difference. When the compiler does the optimization, the performance is
> similar. I think the compiler does a good enough optimization with
> "pg_attribute_always_inline" and the last two constant parameters when
> calling heapgetpage_collect.

So as we might expect, more specialization (Andres' patch) has no apparent downsides in this workload. (While I'm not sure of the point of testing at -O0, I think we can conclude that less-bright compilers will show some improvement with either patch.)

If you agree, do you want to withdraw your patch from the commit fest?

--
John Naylor
EDB: http://www.enterprisedb.com

Re: Improving the heapgetpage function improves performance in common scenarios

От
Quan Zongliang
Дата:

On 2023/9/6 17:07, John Naylor wrote:
> 
> On Wed, Sep 6, 2023 at 2:50 PM Quan Zongliang <quanzongliang@yeah.net 
> <mailto:quanzongliang@yeah.net>> wrote:
> 
>  > If not optimized(--enable-debug CFLAGS='-O0'), there is a clear
>  > difference. When the compiler does the optimization, the performance is
>  > similar. I think the compiler does a good enough optimization with
>  > "pg_attribute_always_inline" and the last two constant parameters when
>  > calling heapgetpage_collect.
> 
> So as we might expect, more specialization (Andres' patch) has no 
> apparent downsides in this workload. (While I'm not sure of the point of 
> testing at -O0, I think we can conclude that less-bright compilers will 
> show some improvement with either patch.)
> 
> If you agree, do you want to withdraw your patch from the commit fest?
> 
Ok.
> --
> John Naylor
> EDB: http://www.enterprisedb.com <http://www.enterprisedb.com>