Re: WAL segments removed from primary despite the fact that logical replication slot needs it.
От | Amit Kapila |
---|---|
Тема | Re: WAL segments removed from primary despite the fact that logical replication slot needs it. |
Дата | |
Msg-id | CAA4eK1+h8KAGW1bvo3vECWR9-ZExp1EptOBvqWP8V-aDyvaORg@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: WAL segments removed from primary despite the fact that logical replication slot needs it. (hubert depesz lubaczewski <depesz@depesz.com>) |
Ответы |
Re: WAL segments removed from primary despite the fact that logical replication slot needs it.
|
Список | pgsql-bugs |
On Mon, Nov 14, 2022 at 5:47 PM hubert depesz lubaczewski <depesz@depesz.com> wrote: > > On Mon, Nov 14, 2022 at 05:08:15PM +0530, Amit Kapila wrote: > > > #v+ > > > 2022-11-10 21:01:07.531 UTC,,,28925,,636378d6.70fd,4586739,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-aheadlog file ""000000010001039D00000082""",,,,,,,,,"" > ... > > > #v- > > > > > > So, the whole set of missing files was recycled at the same time. > > > > > > One more thing that I have is copy of pg_replication_slots info. This data is taken periodically, around every 1 minute.Each line, aside from pg_replication_slots slots data has also output of pg_current_wal_lsn() and timestamp. Lookslike this: > > > > > > #v+ > > > timestamp pg_current_wal_lsn slot_name plugin slot_type datoid database temporary active active_pid xmin catalog_xmin restart_lsn confirmed_flush_lsn > > ... > > ... > > > 2022-11-10 21:02:52 UTC 103A7/B45ECEA8 focal14 pgoutput logical 16607 canvas f f \N \N 3241434855 1039D/83825958 1039D/911A8DB0 > > > 2022-11-10 21:03:52 UTC 103A7/C4355F60 focal14 pgoutput logical 16607 canvas f t 21748 \N 3241443528 1039D/83825958 1039D/955633D0 > > ... > > > > I think from the above two it is clear that the slot with restart_lsn > > 1039D/83825958 is active and we seem to have recycled corresponding > > segment. > > > > You may have the LOG for "attempting to remove WAL segments older than > > log file %s", if so, please share. Also on similar lines, I think we > > Absolutely. > > There is something weird happening: > What exactly weird you are seeing in this? To me, it appears as if the system due to some reason ignores an existing slot that has restart_lsn as 1039D/83825958. > > If I'll add "recycled" lines we will get: > > #v+ > ... > 2022-11-10 21:01:07.531 UTC,,,28925,,636378d6.70fd,4586739,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"recycled write-aheadlog file ""000000010001039D00000082""",,,,,,,,,"" > 2022-11-10 21:06:07.521 UTC,,,28925,,636378d6.70fd,4594902,,2022-11-03 08:16:22 UTC,,0,DEBUG,00000,"attempting to removeWAL segments older than log file 000000000001039D0000008A",,,,,,,,,"" > Actually, the last LOG was what I was not expecting because of the existence of a corresponding slot. This is what we have to find out how it has computed the value "000000000001039D0000008A" for the file to be removed. > > need to add some LOGs before calling RemoveOldXlogFiles() to know > > where our computation to remove files goes off. Something, like the > > attached, would be helpful but note I have added this quickly on my > > working branch to show you what I have in mind. You may need small > > changes based on the branch you are working on. > > That will be complicated. > > Changing pg (recompile, and rerun) requires careful preparation with our > customers. Which takes time. > > Additionally we are using precompiled binaries from pgdg, that makes the > process slightly more involved. > I can understand but OTOH it is not easy to proceed without more information. -- With Regards, Amit Kapila.
В списке pgsql-bugs по дате отправления: