Обсуждение: Tidy fill hstv array (src/backend/access/heap/pruneheap.c)

Поиск
Список
Период
Сортировка

Tidy fill hstv array (src/backend/access/heap/pruneheap.c)

От
Ranier Vilela
Дата:
Hi,

The function heap_page_prune (src/backend/access/heap/pruneheap.c)
Has a comment:

"/*
 * presult->htsv is not initialized here because all ntuple spots in the
 * array will be set either to a valid HTSV_Result value or -1.
 */

IMO, this is a little bogus and does not make sense.

I think it would be more productive to initialize the entire array with -1, and avoid flagging, one by one, the items that should be -1.

Patch attached.

best regards,
Ranier Vilela
Вложения

Re: Tidy fill hstv array (src/backend/access/heap/pruneheap.c)

От
John Naylor
Дата:
On Thu, Dec 28, 2023 at 7:58 PM Ranier Vilela <ranier.vf@gmail.com> wrote:
> I think it would be more productive to initialize the entire array with -1, and avoid flagging, one by one, the items
thatshould be -1. 

This just moves an operation from one place to the other, while
obliterating the explanatory comment, so I don't see an advantage.



Re: Tidy fill hstv array (src/backend/access/heap/pruneheap.c)

От
Ranier Vilela
Дата:
Em ter., 9 de jan. de 2024 às 06:31, John Naylor <johncnaylorls@gmail.com> escreveu:
On Thu, Dec 28, 2023 at 7:58 PM Ranier Vilela <ranier.vf@gmail.com> wrote:
> I think it would be more productive to initialize the entire array with -1, and avoid flagging, one by one, the items that should be -1.

This just moves an operation from one place to the other, while
obliterating the explanatory comment, so I don't see an advantage.
Well, I think that is precisely the case for using memset.
The way initialization is currently done is much slower and harmful to the branch.
Of course, the gain should be small, but it is fully justified for switching to memset.
Regarding the comment, once initialization is done via memset, such as prstate.marked, it becomes irrelevant and unnecessary.

Best regards,
Ranier Vilela

Re: Tidy fill hstv array (src/backend/access/heap/pruneheap.c)

От
John Naylor
Дата:
On Sat, Jan 13, 2024 at 9:36 PM Ranier Vilela <ranier.vf@gmail.com> wrote:
>
> Em ter., 9 de jan. de 2024 às 06:31, John Naylor <johncnaylorls@gmail.com> escreveu:

>> This just moves an operation from one place to the other, while
>> obliterating the explanatory comment, so I don't see an advantage.
>
> Well, I think that is precisely the case for using memset.
> The way initialization is currently done is much slower and harmful to the branch.
> Of course, the gain should be small, but it is fully justified for switching to memset.

We haven't seen any evidence or reasoning for that. Simple
rules-of-thumb are not enough.



Re: Tidy fill hstv array (src/backend/access/heap/pruneheap.c)

От
"Andrey M. Borodin"
Дата:

> On 14 Jan 2024, at 18:55, John Naylor <johncnaylorls@gmail.com> wrote:
>
> On Sat, Jan 13, 2024 at 9:36 PM Ranier Vilela <ranier.vf@gmail.com> wrote:
>>
>> Em ter., 9 de jan. de 2024 às 06:31, John Naylor <johncnaylorls@gmail.com> escreveu:
>
>>> This just moves an operation from one place to the other, while
>>> obliterating the explanatory comment, so I don't see an advantage.
>>
>> Well, I think that is precisely the case for using memset.
>> The way initialization is currently done is much slower and harmful to the branch.
>> Of course, the gain should be small, but it is fully justified for switching to memset.
>
> We haven't seen any evidence or reasoning for that. Simple
> rules-of-thumb are not enough.
>

Hi Ranier,

I’ll mark CF entry [0] as “Returned with Feedback”. Feel free to reopen item in this CF or submit to the next, if you
wantto continue working on this. 

I took a glance into the patch, and I would agree that setting field nonzero values with memset() is somewhat unusual.
Pleaseprovide stronger evidence to do so. 

Thanks!


Best regards, Andrey Borodin.

[0] https://commitfest.postgresql.org/47/4734/


Re: Tidy fill hstv array (src/backend/access/heap/pruneheap.c)

От
Ranier Vilela
Дата:
Em seg., 4 de mar. de 2024 às 03:18, Andrey M. Borodin <x4mmm@yandex-team.ru> escreveu:


> On 14 Jan 2024, at 18:55, John Naylor <johncnaylorls@gmail.com> wrote:
>
> On Sat, Jan 13, 2024 at 9:36 PM Ranier Vilela <ranier.vf@gmail.com> wrote:
>>
>> Em ter., 9 de jan. de 2024 às 06:31, John Naylor <johncnaylorls@gmail.com> escreveu:
>
>>> This just moves an operation from one place to the other, while
>>> obliterating the explanatory comment, so I don't see an advantage.
>>
>> Well, I think that is precisely the case for using memset.
>> The way initialization is currently done is much slower and harmful to the branch.
>> Of course, the gain should be small, but it is fully justified for switching to memset.
>
> We haven't seen any evidence or reasoning for that. Simple
> rules-of-thumb are not enough.
>

Hi Ranier,

I’ll mark CF entry [0] as “Returned with Feedback”. Feel free to reopen item in this CF or submit to the next, if you want to continue working on this.

I took a glance into the patch, and I would agree that setting field nonzero values with memset() is somewhat unusual. Please provide stronger evidence to do so.
 I counted the calls with non-zero memset in the entire postgres code and they are about 183 calls.

I counted the calls with non-zero memset in the entire postgres code and they are about 183 calls.

At least 4 calls with -1

File contrib\pg_trgm\trgm_op.c:
memset(lastpos, -1, sizeof(int) * len);

File src\backend\executor\execPartition.c:
memset(pd->indexes, -1, sizeof(int) * partdesc->nparts);

File src\backend\partitioning\partprune.c:
memset(subplan_map, -1, nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));

Does filling a memory area, one by one, with branches, need strong evidence to prove that it is better than filling a memory area, all at once, without branches?

best regards,
Ranier Vilela

Re: Tidy fill hstv array (src/backend/access/heap/pruneheap.c)

От
"Andrey M. Borodin"
Дата:

> On 4 Mar 2024, at 16:47, Ranier Vilela <ranier.vf@gmail.com> wrote:
>
> Does filling a memory area, one by one, with branches, need strong evidence to prove that it is better than filling a
memoryarea, all at once, without branches?  

I apologise for being too quick to decide to mark the patch RwF. Wold you mind if restore patch as "Needs review" so we
canhave more feedback? 


Best regards, Andrey Borodin.


Re: Tidy fill hstv array (src/backend/access/heap/pruneheap.c)

От
Andres Freund
Дата:
Hi,

On 2024-03-04 08:47:11 -0300, Ranier Vilela wrote:
> Does filling a memory area, one by one, with branches, need strong evidence
> to prove that it is better than filling a memory area, all at once, without
> branches?

That's a bogus comparison:

a) the memset version modifies the whole array, rather than just elements that
   correspond to an item - often there will be far fewer items than the
   maximally possible number

b) the memset version initializes array elements that will be set to an actual
   value

c) switching to memset does not elide any branches, as the branch is still
   needed

And even without those, it'd still not be obviously better to use an
ahead-of-time memset(), as storing lots of values at once is more likely to be
bound by memory bandwidth than interleaving different work.

Andres