Обсуждение: Can postgres replication slot using pgoutput release multiple CDC records for a single update to a particular row
Can postgres replication slot using pgoutput release multiple CDC records for a single update to a particular row
От
Akashkiran Shivakumar
Дата:
Hello,
I have a postgres database (major version 13) and am doing CDC by using a replication slot with pgoutput. In our data lake, we see that there are multiple updates (3 in this case) happening to the same row as part of the same transaction. This doesn't make sense if we look at them as separate updates. The expectation was that the row was updated once and a single CDC record was pushed out. I haven't completely ruled out whether multiple updates happened in that transaction but I wanted to ask the community if it is possible that a single update statement could spill over as multiple CDC update records by pgoutput / postgres ?
If yes, could you possibly point to the testcases or code where this might happen? Any blogs or suggestions are welcome. LMK if you need further information
I have a postgres database (major version 13) and am doing CDC by using a replication slot with pgoutput. In our data lake, we see that there are multiple updates (3 in this case) happening to the same row as part of the same transaction. This doesn't make sense if we look at them as separate updates. The expectation was that the row was updated once and a single CDC record was pushed out. I haven't completely ruled out whether multiple updates happened in that transaction but I wanted to ask the community if it is possible that a single update statement could spill over as multiple CDC update records by pgoutput / postgres ?
If yes, could you possibly point to the testcases or code where this might happen? Any blogs or suggestions are welcome. LMK if you need further information
P.S: Each update seen in the data lake changes at least one field in the row. The entire row with all the columns are pushed out for the 3 updates seen in the data lake. From the application perspective, it makes sense if the 3 updates were merged into a single update record
Regards,
Akashkiran
Re: Can postgres replication slot using pgoutput release multiple CDC records for a single update to a particular row
От
Akashkiran Shivakumar
Дата:
Can someone take a look at this ? Any help would be appreciated
Regards,
Akashkiran
On Tue, Jul 29, 2025 at 3:37 PM Akashkiran Shivakumar <akashkiran.foss@gmail.com> wrote:
Hello,
I have a postgres database (major version 13) and am doing CDC by using a replication slot with pgoutput. In our data lake, we see that there are multiple updates (3 in this case) happening to the same row as part of the same transaction. This doesn't make sense if we look at them as separate updates. The expectation was that the row was updated once and a single CDC record was pushed out. I haven't completely ruled out whether multiple updates happened in that transaction but I wanted to ask the community if it is possible that a single update statement could spill over as multiple CDC update records by pgoutput / postgres ?
If yes, could you possibly point to the testcases or code where this might happen? Any blogs or suggestions are welcome. LMK if you need further informationP.S: Each update seen in the data lake changes at least one field in the row. The entire row with all the columns are pushed out for the 3 updates seen in the data lake. From the application perspective, it makes sense if the 3 updates were merged into a single update recordRegards,Akashkiran
possible that a single update statement could spill over as multiple CDC update records by pgoutput / postgres ?
Basically, no. If I understand your questions correctly, the answer seems to point to your application making multiple updates, which will lead to multiple UPDATE records in the WAL, and thus multiple UPDATEs via pgoutput. But the basic mapping will still be one heap update = one WAL entry = one pgoutput update. (It would be good to get familiar with the pg_waldump program and examine the WAL yourself to get an idea of the exact changes that are generated for various situations.) While there are cases in which a single SQL update can generate multiple WAL entries (e.g. indexes, TOAST), each individual update is always going to generate at least one WAL record. Due to the way that Postgres works, each update gets put into WAL immediately - Postgres cannot merge updates that happen to occur in the same transaction. Nor does pgoutput combine those. So even though at the end of the transaction a single row gets changed at the logical level, there can be quite a number of changes at the WAL level. If you want to smush together pgoutput's updates into a single call, you are welcome to do so, of course - as long as it's in the same transaction. But I'd not do that, as you are now messing with the order of events as they happened on your primary. In other words, there may have been a reason it was updated col by col on the primary, and if that bothers you, the primary is the place to fix it, not stitching it together after the fact.
Cheers,
Greg