Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn

From Amit Kapila
Subject Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn
Date
Msg-id CAA4eK1+6jaBCpybngmgUkCQ=TrgPrMZjRqTjwQqiaqdDGtp7Pg@mail.gmail.com
In response to [BUG] FailedAssertion in SnapBuildPurgeOlderTxn  (Maxim Orlov <orlovmg@gmail.com>)
Responses Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn  (Maxim Orlov <orlovmg@gmail.com>)
Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Mon, Nov 21, 2022 at 6:17 PM Maxim Orlov <orlovmg@gmail.com> wrote:
>
> PROBLEM
>
> After some investigation, I think, the problem is in the snapbuild.c (commit 272248a0c1b1, see [0]). We do allocate
> InitialRunningXacts array in the context of builder->context, but for the time when we call
> SnapBuildPurgeOlderTxn this context may be already free'd.
>

I think you are seeing it freed in SnapBuildPurgeOlderTxn when we
finish and restart decoding in the same session. After finishing the
first decoding, we free the decoding context but forget to reset
NInitialRunningXacts and the InitialRunningXacts array. So, the next
time we start decoding in the same session without restoring a
serialized snapshot, it can lead to the problem you are seeing, because
NInitialRunningXacts (and the InitialRunningXacts array) won't have
sane values.

This can happen in the catalog_change_snapshot test as we have
multiple permutations and those use the same session across a restart
of decoding.

>
> Simple fix like:
> @@ -1377,7 +1379,7 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
>                  * changes. See SnapBuildXidSetCatalogChanges.
>                  */
>                 NInitialRunningXacts = nxacts;
> -               InitialRunningXacts = MemoryContextAlloc(builder->context, sz);
> +               InitialRunningXacts = MemoryContextAlloc(TopMemoryContext, sz);
>                 memcpy(InitialRunningXacts, running->xids, sz);
>                 qsort(InitialRunningXacts, nxacts, sizeof(TransactionId), xidComparator);
>
> seems to solve the described problem, but I'm not in the context of [0] and why array is allocated in
> builder->context.
>

It will leak the memory for InitialRunningXacts. We need to reset
NInitialRunningXacts (and InitialRunningXacts) as mentioned above.
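
To make the reset concrete, here is a minimal, self-contained sketch of
the idea. It is not the actual PostgreSQL change: `FakeBuilder`,
`start_decoding` and `free_builder` are hypothetical names, and plain
malloc/free stand in for allocating in, and then destroying,
builder->context. Only the two static variable names come from the
discussion above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int TransactionId;

/* File-level state that outlives any single decoding run. */
static TransactionId *InitialRunningXacts = NULL;
static int NInitialRunningXacts = 0;

/* Hypothetical stand-in for a decoding run and its memory context. */
typedef struct FakeBuilder
{
    void *context_chunk;    /* the allocation owned by the "context" */
} FakeBuilder;

/* Start a decoding run: copy the running xids into run-local memory. */
void
start_decoding(FakeBuilder *builder, const TransactionId *xids, int nxacts)
{
    size_t sz = sizeof(TransactionId) * (size_t) nxacts;

    assert(nxacts > 0);
    /* Fails if a previous run left the statics pointing at freed memory. */
    assert(InitialRunningXacts == NULL && NInitialRunningXacts == 0);

    builder->context_chunk = malloc(sz);
    if (builder->context_chunk == NULL)
        abort();
    memcpy(builder->context_chunk, xids, sz);

    InitialRunningXacts = builder->context_chunk;
    NInitialRunningXacts = nxacts;
}

/* Finish a decoding run: release the memory AND reset the statics. */
void
free_builder(FakeBuilder *builder)
{
    free(builder->context_chunk);
    builder->context_chunk = NULL;

    /* Without these two lines the statics would describe freed memory. */
    InitialRunningXacts = NULL;
    NInitialRunningXacts = 0;
}
```

Calling start_decoding, free_builder and start_decoding again in one
process mirrors a restart of decoding in the same session. With the
reset in place the second start sees clean state, and nothing is left
allocated in a long-lived context.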

Thank you for the report and initial analysis. I have added Sawada-San
to this thread to get his views, as he was the primary author of this
work.

-- 
With Regards,
Amit Kapila.


