Обсуждение: dropping datumSort field

Поиск
Список
Период
Сортировка

dropping datumSort field

От
Zhihong Yu
Дата:
Hi,
I was looking at ExecSort() w.r.t. datum sort.

I wonder if the datumSort field can be dropped.
Here is a patch illustrating the potential simplification.

Please take a look.

Thanks
Вложения

Re: dropping datumSort field

От
Robert Haas
Дата:
On Mon, Aug 8, 2022 at 5:51 PM Zhihong Yu <zyu@yugabyte.com> wrote:
> Hi,
> I was looking at ExecSort() w.r.t. datum sort.
>
> I wonder if the datumSort field can be dropped.
> Here is a patch illustrating the potential simplification.
>
> Please take a look.

One problem with this patch is that, if I apply it, PostgreSQL does not compile:

nodeSort.c:197:6: error: use of undeclared identifier 'tupDesc'
        if (tupDesc->natts == 1)
            ^
1 error generated.

Leaving that aside, I don't really see any advantage in this sort of change.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: dropping datumSort field

От
Zhihong Yu
Дата:


On Tue, Aug 9, 2022 at 8:01 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Aug 8, 2022 at 5:51 PM Zhihong Yu <zyu@yugabyte.com> wrote:
> Hi,
> I was looking at ExecSort() w.r.t. datum sort.
>
> I wonder if the datumSort field can be dropped.
> Here is a patch illustrating the potential simplification.
>
> Please take a look.

One problem with this patch is that, if I apply it, PostgreSQL does not compile:

nodeSort.c:197:6: error: use of undeclared identifier 'tupDesc'
        if (tupDesc->natts == 1)
            ^
1 error generated.

Leaving that aside, I don't really see any advantage in this sort of change.

--
Robert Haas
EDB: http://www.enterprisedb.com

Thanks for trying out the patch.

        TupleDesc   tupDesc;

tupDesc is declared inside `if (!node->sort_Done)` block whereas the last reference to tupDesc is outside the if block.

I take your review comment and will go back to do more homework.

Cheers

Re: dropping datumSort field

От
Robert Haas
Дата:
On Tue, Aug 9, 2022 at 11:16 AM Zhihong Yu <zyu@yugabyte.com> wrote:
> tupDesc is declared inside `if (!node->sort_Done)` block whereas the last reference to tupDesc is outside the if
block.

Yep.

> I take your review comment and will go back to do more homework.

The real point for me here is you haven't offered any reason to make
this change. The structure member in question is basically free.
Because of alignment padding it uses no more memory, and it makes the
intent of the code clearer.

Let's not change things just because we could.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: dropping datumSort field

От
Zhihong Yu
Дата:


On Tue, Aug 9, 2022 at 8:24 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Aug 9, 2022 at 11:16 AM Zhihong Yu <zyu@yugabyte.com> wrote:
> tupDesc is declared inside `if (!node->sort_Done)` block whereas the last reference to tupDesc is outside the if block.

Yep.

> I take your review comment and will go back to do more homework.

The real point for me here is you haven't offered any reason to make
this change. The structure member in question is basically free.
Because of alignment padding it uses no more memory, and it makes the
intent of the code clearer.

Let's not change things just because we could.

--
Robert Haas
EDB: http://www.enterprisedb.com

I should have provided motivation for the patch.

I was aware of recent changes around ExprEvalStep. e.g.

Remove unused fields from ExprEvalStep

Though the datumSort field may be harmless for now, in the future, the (auto) alignment padding may not work if more fields are added to the struct.
I think we should always leave some room in struct's for future expansion.

As for making the intent of the code clearer, the datumSort field is only used in one method.
My previous patch removed some comment which should have been shifted into this method.
In my opinion, localizing the check in single method is easier to understand than resorting to additional struct field.

Cheers

Re: dropping datumSort field

От
Robert Haas
Дата:
On Tue, Aug 9, 2022 at 11:42 AM Zhihong Yu <zyu@yugabyte.com> wrote:
> Though the datumSort field may be harmless for now, in the future, the (auto) alignment padding may not work if more
fieldsare added to the struct.
 
> I think we should always leave some room in struct's for future expansion.

I doubt the size of this struct is particularly important, unlike
ExprEvalStep which needs to be small. But if it turns out in the
future that we need to try to squeeze this struct into fewer bytes, we
can always do something like this then. Right now there's no obvious
point to it.

Sure, it might be valuable *if* we add more fields to the struct and
*if* that means that the byte taken up by this flag actually makes the
struct bigger and *if* the size of the struct is demonstrated to be a
performance problem. But right now none of that has happened, and
maybe none of it will ever happen.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: dropping datumSort field

От
David Rowley
Дата:
On Wed, 10 Aug 2022 at 03:16, Zhihong Yu <zyu@yugabyte.com> wrote:
> On Tue, Aug 9, 2022 at 8:01 AM Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> One problem with this patch is that, if I apply it, PostgreSQL does not compile:
>>
>> nodeSort.c:197:6: error: use of undeclared identifier 'tupDesc'
>>         if (tupDesc->natts == 1)
>>             ^
>> 1 error generated.
>>
>> Leaving that aside, I don't really see any advantage in this sort of change.

I'm with Robert on this one. If you look at the discussion for that
commit, you'll find quite a bit of benchmarking was done around this
[1].  The "if" test here is in quite a hot code path, so we want to
ensure whatever is being referenced here causes the least amount of
CPU cache misses.  Volcano executors which process a single row at a
time are not exactly an ideal workload for a modern processor due to
the bad L1i and L1d hit ratios. This becomes worse with more plan
nodes as these caches are more likely to get flushed of useful cache
lines when mode notes are visited. Various other fields in 'node' have
just been accessed in the code leading up to the "if
(node->datumSort)" check, so we're probably not going to encounter any
CPU pipeline stalls waiting for cache lines to be loaded into L1d.  It
seems you're proposing to change this and have offered no evidence of
no performance regressions from doing so.  Going by the compilation
error above, it seems unlikely that you've given performance any
consideration at all.

I mean this in the best way possible; for the future, I really
recommend arriving with ideas that are well researched and tested.
When you can, arrive with evidence to prove your change is good.  For
this case, evidence would be benchmark results.  The problem is if you
were to arrive with patches such as this too often then you'll start
to struggle to get attention from anyone, let alone a committer. You
don't want to build a reputation for bad quality work as it's likely
most committers will steer clear of your work.  If you want a good
recent example of a good proposal, have a look at Yuya Watari's
write-up at [2] and [3]. There was a clear problem statement there and
a patch that was clearly a proof of concept only, so the author was
under no illusion that what he had was ideal.  Now, some other ideas
were suggested on that thread to hopefully simplify the task and
provide even better performance. Yuya went off and did that and
arrived back armed with further benchmark results. I was quite
impressed with this considering he's not posted to -hackers very
often.  Now, if you were a committer, would you be looking at the
patches from the person who sends in half-thought-through ideas, or
patches from someone that has clearly put a great deal of effort into
first clearly stating the problem and then proving that the problem is
solved by the given patch?

David

[1] https://www.postgresql.org/message-id/CAApHDvrWV%3Dv0qKsC9_BHqhCn9TusrNvCaZDz77StCO--fmgbKA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAJ2pMkZNCgoUKSE+_5LthD+KbXKvq6h2hQN8Esxpxd+cxmgomg@mail.gmail.com
[3] https://www.postgresql.org/message-id/CAJ2pMkZKFVmPHovyyueBpwb_nYYVk2+GaDqgzxZVnjkvxgtXog@mail.gmail.com



Re: dropping datumSort field

От
Zhihong Yu
Дата:


On Tue, Aug 9, 2022 at 9:04 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 10 Aug 2022 at 03:16, Zhihong Yu <zyu@yugabyte.com> wrote:
> On Tue, Aug 9, 2022 at 8:01 AM Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> One problem with this patch is that, if I apply it, PostgreSQL does not compile:
>>
>> nodeSort.c:197:6: error: use of undeclared identifier 'tupDesc'
>>         if (tupDesc->natts == 1)
>>             ^
>> 1 error generated.
>>
>> Leaving that aside, I don't really see any advantage in this sort of change.

I'm with Robert on this one. If you look at the discussion for that
commit, you'll find quite a bit of benchmarking was done around this
[1].  The "if" test here is in quite a hot code path, so we want to
ensure whatever is being referenced here causes the least amount of
CPU cache misses.  Volcano executors which process a single row at a
time are not exactly an ideal workload for a modern processor due to
the bad L1i and L1d hit ratios. This becomes worse with more plan
nodes as these caches are more likely to get flushed of useful cache
lines when mode notes are visited. Various other fields in 'node' have
just been accessed in the code leading up to the "if
(node->datumSort)" check, so we're probably not going to encounter any
CPU pipeline stalls waiting for cache lines to be loaded into L1d.  It
seems you're proposing to change this and have offered no evidence of
no performance regressions from doing so.  Going by the compilation
error above, it seems unlikely that you've given performance any
consideration at all.

I mean this in the best way possible; for the future, I really
recommend arriving with ideas that are well researched and tested.
When you can, arrive with evidence to prove your change is good.  For
this case, evidence would be benchmark results.  The problem is if you
were to arrive with patches such as this too often then you'll start
to struggle to get attention from anyone, let alone a committer. You
don't want to build a reputation for bad quality work as it's likely
most committers will steer clear of your work.  If you want a good
recent example of a good proposal, have a look at Yuya Watari's
write-up at [2] and [3]. There was a clear problem statement there and
a patch that was clearly a proof of concept only, so the author was
under no illusion that what he had was ideal.  Now, some other ideas
were suggested on that thread to hopefully simplify the task and
provide even better performance. Yuya went off and did that and
arrived back armed with further benchmark results. I was quite
impressed with this considering he's not posted to -hackers very
often.  Now, if you were a committer, would you be looking at the
patches from the person who sends in half-thought-through ideas, or
patches from someone that has clearly put a great deal of effort into
first clearly stating the problem and then proving that the problem is
solved by the given patch?

David

[1] https://www.postgresql.org/message-id/CAApHDvrWV%3Dv0qKsC9_BHqhCn9TusrNvCaZDz77StCO--fmgbKA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAJ2pMkZNCgoUKSE+_5LthD+KbXKvq6h2hQN8Esxpxd+cxmgomg@mail.gmail.com
[3] https://www.postgresql.org/message-id/CAJ2pMkZKFVmPHovyyueBpwb_nYYVk2+GaDqgzxZVnjkvxgtXog@mail.gmail.com

Hi, David:
Thanks for your detailed response.