Re: Hybrid Hash/Nested Loop joins and caching results from subplans
От | David Rowley |
---|---|
Тема | Re: Hybrid Hash/Nested Loop joins and caching results from subplans |
Дата | |
Msg-id | CAApHDvo2acQSogMCa3hB7moRntXWHO8G+WSwhyty2+c8vYRq3A@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Hybrid Hash/Nested Loop joins and caching results from subplans (Andres Freund <andres@anarazel.de>) |
Ответы |
Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Re: Hybrid Hash/Nested Loop joins and caching results from subplans |
Список | pgsql-hackers |
On Wed, 26 Aug 2020 at 03:52, Andres Freund <andres@anarazel.de> wrote: > > On 2020-08-25 20:48:37 +1200, David Rowley wrote: > > However, given the correct planner choice, there will never be > > a gross slowdown due to having the extra node. > > There'll be a significant reduction in increase in performance. So I did a very rough-cut change to the patch to have the caching be part of Nested Loop. It can be applied on top of the other 3 v7 patches. For the performance, the test I did results in the performance actually being reduced from having the Result Cache as a separate node. The reason for this is mostly because Nested Loop projects. Each time I fetch a MinimalTuple from the cache, the patch will deform it in order to store it in the virtual inner tuple slot for the nested loop. Having the Result Cache as a separate node can skip this step as it's result tuple slot is a TTSOpsMinimalTuple, so we can just store the cached MinimalTuple right into the slot without any deforming/copying. Here's an example of a query that's now slower: select count(*) from hundredk hk inner join lookup100 l on hk.one = l.a; In this case, the original patch does not have to deform the MinimalTuple from the cache as the count(*) does not require any Vars from it. With the rough patch that's attached, the MinimalTuple is deformed in during the transformation during ExecCopySlot(). The slowdown exists no matter which column of the hundredk table I join to (schema in [1]). Performance comparison is as follows: v7 (Result Cache as a separate node) postgres=# explain (analyze, timing off) select count(*) from hundredk hk inner join lookup l on hk.one = l.a; Execution Time: 652.582 ms v7 + attached rough patch postgres=# explain (analyze, timing off) select count(*) from hundredk hk inner join lookup l on hk.one = l.a; Execution Time: 843.566 ms I've not yet thought of any way to get rid of the needless MinimalTuple deform. I suppose the cache could just have already deformed tuples, but that requires more memory and would result in a worse cache hit ratios for workloads where the cache gets filled. I'm open to ideas to make the comparison fairer. (Renamed the patch file to .txt to stop the CFbot getting upset with me) David [1] https://www.postgresql.org/message-id/CAApHDvrPcQyQdWERGYWx8J+2DLUNgXu+fOSbQ1UscxrunyXyrQ@mail.gmail.com
Вложения
В списке pgsql-hackers по дате отправления: