Re: Postgres 16.1 - Bug: cache entry already complete

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Postgres 16.1 - Bug: cache entry already complete
Дата
Msg-id CAApHDvoBTZzooXsH4hqtQ4cNJxuYZe=hakPcLSPZP9-NkfaSjQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Postgres 16.1 - Bug: cache entry already complete  (David Rowley <dgrowleyml@gmail.com>)
Ответы Re: Postgres 16.1 - Bug: cache entry already complete  (Richard Guo <guofenglinux@gmail.com>)
Список pgsql-bugs
On Wed, 3 Jan 2024 at 13:40, David Rowley <dgrowleyml@gmail.com> wrote:
> The problem is that singlerow is set entirely based on innerunique.
> That's wrong because in this case, the unique properties include the
> Join Filter, not just the parameterized qual.
>
> I think the fix should be in get_memoize_path(). What we pass to
> create_memoize_path() for the "singlerow" parameter needs to be more
> than just extra->inner_unique. The Join Filter must also be empty.

It looks like we only figure out what will become the "Join Filter" in
create_nestloop_path(), which is slightly too late as we create the
MemoizePath before creating the NestLoopPath.

In the attached, I drafted up some code that duplicates the logic in
create_nestloop_path() to determine if there's going to be a
joinrestrictinfo (Join Filter) and to set MemoizePath.singlerow to
false if there is going to be one.

My concern with that is that the Unique Join optimisation will cause
execution to skip to the next outer row and that will leave us no
means of marking the Memoize cache entry as complete.  In
singlerow==false Memoize nodes, we only mark the cache as complete
when we read the inner node to completion.   The unique join
optimisation means we often don't do that due to skipping to the next
outer row on finding the first inner match.

Basically, what I'm saying is that Memoize is going to result in many
more cache misses due to incomplete cache entries.  Maybe we should
have get_memoize_path() return NULL for this case so that we don't
Memoize when there's a Join Filter and extra->inner_unique is set to
true.

With the attached, I see:

                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual rows=315 loops=1)
   Join Filter: (t3.t2_id = t2.id)
   Rows Removed by Join Filter: 1007
   ->  Nested Loop Left Join (actual rows=315 loops=1)
         ->  Nested Loop Left Join (actual rows=315 loops=1)
               ->  Seq Scan on t4 (actual rows=315 loops=1)
               ->  Memoize (actual rows=1 loops=315)
                     Cache Key: t4.t2_id
                     Cache Mode: logical
                     SingleRow: true
                     Hits: 296  Misses: 19  Evictions: 0  Overflows: 0
 Memory Usage: 3kB
                     ->  Index Only Scan using t2_pkey on t2 (actual
rows=1 loops=19)
                           Index Cond: (id = t4.t2_id)
                           Heap Fetches: 18
         ->  Memoize (actual rows=1 loops=315)
               Cache Key: t4.t1_id
               Cache Mode: logical
               SingleRow: true
               Hits: 276  Misses: 39  Evictions: 0  Overflows: 0
Memory Usage: 5kB
               ->  Index Only Scan using t1_pkey on t1 (actual rows=1 loops=39)
                     Index Cond: (id = t4.t1_id)
                     Heap Fetches: 38
   ->  Memoize (actual rows=4 loops=315)
         Cache Key: t1.id, t1.id
         Cache Mode: logical
         SingleRow: false
         Hits: 199  Misses: 116  Evictions: 0  Overflows: 0  Memory Usage: 15kB
         ->  Index Scan using t3_t1_id_index on t3 (actual rows=3 loops=116)
               Index Cond: (t1_id = t1.id)
 Planning Time: 0.322 ms
 Execution Time: 3654.894 ms
(31 rows)

(I just added SingleRow to explain to make it easier to see what's going on)

Notice the Cache Misses is 116 for the problem Memoize node.  There
are only 77 distinct values for t1_id in t3. So we're certainly
classing cache entries as invalid due to them being complete==false.

David

Вложения

В списке pgsql-bugs по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: Postgres 16.1 - Bug: cache entry already complete
Следующее
От: Richard Guo
Дата:
Сообщение: Re: Postgres 16.1 - Bug: cache entry already complete