RE: Forget close an open relation in ReorderBufferProcessTXN()

Поиск
Список
Период
Сортировка
От osumi.takamichi@fujitsu.com
Тема RE: Forget close an open relation in ReorderBufferProcessTXN()
Дата
Msg-id OSBPR01MB48887473DB79D2B29A74CA41ED299@OSBPR01MB4888.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: Forget close an open relation in ReorderBufferProcessTXN()  (Amit Langote <amitlangote09@gmail.com>)
Ответы RE: Forget close an open relation in ReorderBufferProcessTXN()  ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
Список pgsql-hackers
On Friday, May 21, 2021 4:43 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Fri, May 21, 2021 at 3:55 PM osumi.takamichi@fujitsu.com
> <osumi.takamichi@fujitsu.com> wrote:
> > But, I've detected segmentation faults caused by the patch, which can
> > happen during 100_bugs.pl in src/test/subscription.
> > This happens more than one in ten times.
> >
> > This problem would be a timing issue and has been introduced by v3
> already.
> > I used v5 for HEAD also and reproduced this failure, while OSS HEAD
> > doesn't reproduce this, even when I executed 100_bugs.pl 200 times in a
> tight loop.
> > I aligned the commit id 4f586fe2 for all check. Below logs are ones I got from
> v3.
> >
> > My first guess of the cause is that between the timing to get an entry
> > from hash_search() in get_rel_sync_entry() and to set the map by
> > convert_tuples_by_name() in maybe_send_schema(), we had invalidation
> message, which tries to free unset descs in the entry ?
> 
> Hmm, maybe get_rel_syn_entry() should explicitly set map to NULL when first
> initializing an entry.  It's possible that without doing so, the map remains set
> to a garbage value, which causes the invalidation callback that runs into such
> partially initialized entry to segfault upon trying to deference that garbage
> pointer.
Just in case, I prepared a new PG and
did a check to make get_rel_sync_entry() print its first pointer value with elog.
Here, when I executed 100_bugs.pl, I got some garbage below.

* The change I did:
@@ -1011,6 +1011,7 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
                entry->pubactions.pubinsert = entry->pubactions.pubupdate =
                        entry->pubactions.pubdelete = entry->pubactions.pubtruncate = false;
                entry->publish_as_relid = InvalidOid;
+               elog(LOG, "**> the pointer's default value : %p", entry->map);
        }

* Grep result of all logs from 100_bugs.pl
2021-05-21 09:05:56.132 UTC [29122] sub1 LOG:  **> the pointer's default value : (nil)
2021-05-21 09:06:11.556 UTC [30198] testsub1 LOG:  **> the pointer's default value : (nil)
2021-05-21 09:06:11.561 UTC [30200] pg_16389_sync_16384_6964667281140237667 LOG:  **> the pointer's default value :
0x7f7f7f7f7f7f7f7f
2021-05-21 09:06:11.567 UTC [30191] testsub2 LOG:  **> the pointer's default value : (nil)
2021-05-21 09:06:11.570 UTC [30194] pg_16387_sync_16384_6964667292923737489 LOG:  **> the pointer's default value :
0x7f7f7f7f7f7f7f7f
2021-05-21 09:06:02.513 UTC [29809] testsub LOG:  **> the pointer's default value : (nil)
2021-05-21 09:06:02.557 UTC [29809] testsub LOG:  **> the pointer's default value : (nil)

So, your solution is right, I think.

> I've tried that in the attached v6 patches.  Please check.
With this fix, I don't get the failure.
I executed 100_bugs.pl 100 times in a loop and didn't face that problem.

Again, I conducted one make check-world for each combination
* use OSS HEAD or PG13
* apply only the first patch or both two patches
Those all passed.


Best Regards,
    Takamichi Osumi


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: Re: Installation of regress.so?
Следующее
От: Guillaume Lelarge
Дата:
Сообщение: Re: "Multiple table synchronizations are processed serially" still happens