Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
От | Masahiko Sawada |
---|---|
Тема | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Дата | |
Msg-id | CAD21AoAe08b6smo7aq6zUUSANXpcfH2JVhU54LDyaBo59Y0DHg@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 (vignesh C <vignesh21@gmail.com>) |
Ответы |
Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
|
Список | pgsql-bugs |
On Sun, Jun 1, 2025 at 8:13 PM vignesh C <vignesh21@gmail.com> wrote: > > On Sat, 31 May 2025 at 10:20, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, May 30, 2025 at 11:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > Thank you for updating the patch. Here are some review comments: > > > > > > --- > > > + /* > > > + * Make sure there's no cache pollution. Unlike the PG_TRY part, > > > + * this must be done unconditionally because the processing might > > > + * fail before we reach invalidation messages. > > > + */ > > > + if (rbtxn_inval_all_cache(txn)) > > > + InvalidateSystemCaches(); > > > + else > > > + ReorderBufferExecuteInvalidations(txn->ninvalidations_distr, > > > + > > > txn->distributed_invalidations); > > > + > > > > > > If we don't need to execute the distributed inval message in an error > > > path other than detecting concurrent abort, we should describe the > > > reason. > > > > > > > The concurrent abort handling is done for streaming and prepared > > transactions, where we send the transaction changes to the client > > before we read COMMIT/COMMIT PREPARED/ROLLBACK/ROLLBACK PREPARED. Now, > > among these COMMIT/ROLLBACK PREPARED cases are handled as part of a > > prepared transaction case. For ROLLBACK, we will never perform any > > changes from the current transaction, so we don't need distributed > > invalidations to be executed. For COMMIT, if we encounter any errors > > while processing changes (this is when we reach the ERROR path, which > > is not a concurrent abort), then we will reprocess all changes and, at > > the end, execute both the current transaction and distributed > > invalidations. Now, one possibility is that if, after ERROR, the > > caller does slot_advance to skip the ERROR, then we will probably miss > > executing the distributed invalidations, leading to data loss > > afterwards. If the above theory is correct, then it is better to > > execute distributed invalidation even in non-concurrent-abort cases in > > the ERROR path. This theory seems to be correct to me. If users re-try the logical decoding with smaller logical_decoding_work_mem the process could end up streaming the transaction with the wrong relcache before sending the committed transaction? > One possible reason this scenario may not occur is that > pg_logical_slot_get_changes_guts uses a PG_CATCH block to handle > exceptions, during which it calls InvalidateSystemCaches to clear the > system cache. Because of this, I believe the scenario might not > actually happen. > @Sawada-san / others — Are there any other cases where this could still occur? I think that we cannot confine use cases to walsender and pg_logical_slot_get_changes_guts() given that we expose logical decoding API at C level. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
В списке pgsql-bugs по дате отправления: