Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
From: Amit Kapila
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date:
Msg-id: CAA4eK1LVcHD6VpH7m=8F9ru7Qdj9hsXE0gXtS93RmtTxf6L1NA@mail.gmail.com
In reply to: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
List: pgsql-bugs
On Wed, May 21, 2025 at 11:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, May 21, 2025 at 4:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > However, after queuing the
> > change, we don't need to copy it along with the original transaction's
> > invalidations. This is because the copy is only required when we don't
> > process any changes in cases like ReorderBufferForget().
>
> It seems that we use the accumulated invalidation message also after
> replaying or concurrently aborting a transaction via
> ReorderBufferExecuteInvalidations(). Do we need to consider such cases
> too?
>

Good point. After replaying the transaction, it doesn't matter, because
we would have already executed the required invalidations while
processing the REORDER_BUFFER_CHANGE_INVALIDATION messages. For the
concurrent abort case, however, it could matter. See my analysis below.

Simulation of concurrent abort
------------------------------------------
1) S1: CREATE TABLE d(data text not null);
2) S1: INSERT INTO d VALUES('d1');
3) S2: BEGIN; INSERT INTO d VALUES('d2');
4) S2: INSERT INTO unrelated_tab VALUES(1);
5) S1: ALTER PUBLICATION pb ADD TABLE d;
6) S2: INSERT INTO unrelated_tab VALUES(2);
7) S2: ROLLBACK;
8) S2: INSERT INTO d VALUES('d3');
9) S1: INSERT INTO d VALUES('d4');

The problem with this sequence is that, due to streaming, the insert
from 3) could be decoded *after* 5), namely at step 6), and that to
decode that insert (which happened before the ALTER) the catalog
snapshot and cache state must be from *before* the ALTER PUBLICATION.
Because the transaction started in 3) doesn't actually modify any
catalogs, no invalidations are executed after decoding it.

Now, assume that while decoding the insert from 4) we detect a
concurrent abort. Then the distributed invalidations won't be executed,
and if we don't have the accumulated messages in txn->invalidations,
the invalidation from step 5) won't be performed either. The resulting
data loss can occur at steps 8) and 9). This is just a theory, so I
could be missing something.

If the above turns out to be a problem, one idea for fixing it is, for
the concurrent abort case (both during streaming and while processing a
prepared transaction), to still scan all the remaining changes but
process only the ones related to invalidations. This has to be done
before the current txn's changes are freed via
ReorderBufferResetTXN->ReorderBufferTruncateTXN. A rough sketch of this
idea follows below my signature.

Thoughts?

--
With Regards,
Amit Kapila.
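PS: To make the idea above concrete, here is a rough, untested sketch of
such a helper. The function name is hypothetical; it assumes it lives in
reorderbuffer.c (where the static ReorderBufferExecuteInvalidations() is
reachable and the required headers are already included) and relies on
the PG 17 layout of ReorderBufferChange:

/*
 * Replay only the invalidation messages that are still queued as changes
 * of this transaction.  Intended to be called when a concurrent abort is
 * detected, before ReorderBufferResetTXN() reaches
 * ReorderBufferTruncateTXN() and frees the queued changes.
 */
static void
ReorderBufferReplayRemainingInvalidations(ReorderBuffer *rb,
                                          ReorderBufferTXN *txn)
{
    dlist_iter  iter;

    /* Walk whatever changes are still queued in memory for this txn. */
    dlist_foreach(iter, &txn->changes)
    {
        ReorderBufferChange *change =
            dlist_container(ReorderBufferChange, node, iter.cur);

        /* Skip everything except queued invalidation messages. */
        if (change->action != REORDER_BUFFER_CHANGE_INVALIDATION)
            continue;

        ReorderBufferExecuteInvalidations(change->data.inval.ninvalidations,
                                          change->data.inval.invalidations);
    }
}

Note that this only walks the in-memory changes of the top-level
transaction; a real fix would also need to cover changes already
spilled to disk and changes queued under subtransactions, which this
sketch ignores.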