Re: Wrong results from Parallel Hash Full Join
From | Melanie Plageman |
---|---|
Subject | Re: Wrong results from Parallel Hash Full Join |
Date | |
Msg-id | CAAKRu_Ybw_0MDNTW_jg3gndXs7F6H8MUZkbY2iMtSHeS5L97hw@mail.gmail.com |
In response to | Re: Wrong results from Parallel Hash Full Join (Melanie Plageman <melanieplageman@gmail.com>) |
Responses | Re: Wrong results from Parallel Hash Full Join |
List | pgsql-hackers |
On Wed, Apr 12, 2023 at 2:59 PM Melanie Plageman <melanieplageman@gmail.com> wrote:
>
> On Wed, Apr 12, 2023 at 2:14 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2023-04-12 10:57:17 -0400, Melanie Plageman wrote:
> > > HeapTupleHeaderHasMatch() checks if HEAP_TUPLE_HAS_MATCH is set.
> > >
> > > In htup_details.h, you will see that HEAP_TUPLE_HAS_MATCH is defined as
> > > HEAP_ONLY_TUPLE
> > > /*
> > >  * HEAP_TUPLE_HAS_MATCH is a temporary flag used during hash joins. It is
> > >  * only used in tuples that are in the hash table, and those don't need
> > >  * any visibility information, so we can overlay it on a visibility flag
> > >  * instead of using up a dedicated bit.
> > >  */
> > > #define HEAP_TUPLE_HAS_MATCH HEAP_ONLY_TUPLE /* tuple has a join match */
> > >
> > > If you redefine HEAP_TUPLE_HAS_MATCH as something that isn't already
> > > used, say 0x1800, the query returns correct results.
> > > [...]
> > > The question is, why does this only happen for a parallel full hash join?
> >
> > I'd guess that PHJ code is missing a HeapTupleHeaderClearMatch() somewhere,
> > but the non-parallel case isn't.
>
> Indeed. Thanks! This diff fixes the case Richard provided.
>
> diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
> index a45bd3a315..54c06c5eb3 100644
> --- a/src/backend/executor/nodeHash.c
> +++ b/src/backend/executor/nodeHash.c
> @@ -1724,6 +1724,7 @@ retry:
>          /* Store the hash value in the HashJoinTuple header. */
>          hashTuple->hashvalue = hashvalue;
>          memcpy(HJTUPLE_MINTUPLE(hashTuple), tuple, tuple->t_len);
> +        HeapTupleHeaderClearMatch(HJTUPLE_MINTUPLE(hashTuple));
>
>          /* Push it onto the front of the bucket's list */
>          ExecParallelHashPushTuple(&hashtable->buckets.shared[bucketno],
>
> I will propose a patch that includes this change and a test.
>
> I just want to convince myself that ExecParallelHashTableInsertCurrentBatch()
> covers the non-batch 0 cases and we don't need to add something to
> sts_puttuple().

So, indeed, tuples in batches after batch 0 already had their match bit
cleared by ExecParallelHashTableInsertCurrentBatch().

Attached patch includes the fix for ExecParallelHashTableInsert() as well
as a test. I toyed with adapting one of the existing parallel full hash
join tests to cover this case, however, I think Richard's repro is much
more clear. Maybe it is worth throwing in a few updates to the tables in
the existing queries to provide coverage for the other
HeapTupleHeaderClearMatch() calls in the code, though.

- Melanie
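
For readers following the thread, here is a minimal standalone sketch of why the missing clear matters. It is not PostgreSQL code: ToyTupleHeader and the toy_* functions are hypothetical stand-ins, and the 0x8000 value for HEAP_ONLY_TUPLE in t_infomask2 is what I recall from htup_details.h, so treat it as an assumption. The point is only that a memcpy() carries the overlaid visibility bit into the hash table copy, where it then reads as "tuple has a join match" unless it is cleared.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed to mirror htup_details.h: the match flag reuses a visibility bit. */
#define HEAP_ONLY_TUPLE      0x8000
#define HEAP_TUPLE_HAS_MATCH HEAP_ONLY_TUPLE

/* Hypothetical, heavily simplified stand-in for a tuple header. */
typedef struct ToyTupleHeader
{
    uint16_t t_infomask2;   /* flag bits, including the overlaid one */
    int32_t  payload;       /* stands in for the tuple's data */
} ToyTupleHeader;

/* Analogue of HeapTupleHeaderHasMatch(): test the overlaid bit. */
bool
toy_has_match(const ToyTupleHeader *tup)
{
    return (tup->t_infomask2 & HEAP_TUPLE_HAS_MATCH) != 0;
}

/* Analogue of HeapTupleHeaderClearMatch(): reset the overlaid bit. */
void
toy_clear_match(ToyTupleHeader *tup)
{
    tup->t_infomask2 &= (uint16_t) ~HEAP_TUPLE_HAS_MATCH;
}

/* Copy only: any HEAP_ONLY_TUPLE bit on the source leaks into the copy. */
void
toy_insert_without_clear(ToyTupleHeader *dst, const ToyTupleHeader *src)
{
    memcpy(dst, src, sizeof(*dst));
}

/* Copy, then clear the match bit, as the one-line diff above does. */
void
toy_insert_with_clear(ToyTupleHeader *dst, const ToyTupleHeader *src)
{
    memcpy(dst, src, sizeof(*dst));
    toy_clear_match(dst);
}
```

If the source tuple arrives with HEAP_ONLY_TUPLE set, toy_insert_without_clear() yields a copy that already claims to have a match, so a full join's unmatched-tuple scan would skip it; toy_insert_with_clear() starts every inserted tuple as unmatched while leaving the payload untouched.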