Re: Fix GetWALAvailability function code comments for WALAVAIL_REMOVED return value
From | Bharath Rupireddy |
---|---|
Subject | Re: Fix GetWALAvailability function code comments for WALAVAIL_REMOVED return value |
Date | |
Msg-id | CALj2ACUtyW94TF76WEM-2JvMMD1a1PzLuaW5Qd9rrKRgnMAZnw@mail.gmail.com |
In response to | Fix GetWALAvailability function code comments for WALAVAIL_REMOVED return value (sirisha chamarthi <sirichamarthi22@gmail.com>) |
Responses | Re: Fix GetWALAvailability function code comments for WALAVAIL_REMOVED return value |
List | pgsql-hackers |
On Wed, Oct 19, 2022 at 12:39 PM sirisha chamarthi <sirichamarthi22@gmail.com> wrote:
>
> Hi Hackers,
>
> The current code comment says that the replication stream on a slot with the given targetLSN can't continue after a restart but even without a restart the stream cannot continue. The slot is invalidated and the walsender process is terminated by the checkpoint process. Attaching a small patch to fix the comment.
>
> 2022-10-19 06:26:22.387 UTC [144482] STATEMENT: START_REPLICATION SLOT "s2" LOGICAL 0/0
> 2022-10-19 06:27:41.998 UTC [2553755] LOG: checkpoint starting: time
> 2022-10-19 06:28:04.974 UTC [2553755] LOG: terminating process 144482 to release replication slot "s2"
> 2022-10-19 06:28:04.974 UTC [144482] FATAL: terminating connection due to administrator command
> 2022-10-19 06:28:04.974 UTC [144482] CONTEXT: slot "s2", output plugin "test_decoding", in the change callback, associated LSN 0/1E23AB68
> 2022-10-19 06:28:04.974 UTC [144482] STATEMENT: START_REPLICATION SLOT "s2" LOGICAL 0/0

I think the walsender/replication stream can still continue until the checkpointer signals it to terminate. There's an illuminating comment (see [1]) specifying when that can happen. It means that GetWALAvailability() can return WALAVAIL_REMOVED while the checkpointer hasn't yet signalled, or is still in the process of signalling, the walsender to terminate.

 *
 * WALAVAIL_REMOVED means it has been removed. A replication stream on
 * a slot with this LSN cannot continue after a restart.

The existing comment above says that the slot isn't usable if "someone" (either the checkpointer, the walsender, or the entire server itself) got restarted. It looks fine, no?

[1]
            case WALAVAIL_REMOVED:

                /*
                 * If we read the restart_lsn long enough ago, maybe that file
                 * has been removed by now. However, the walsender could have
                 * moved forward enough that it jumped to another file after
                 * we looked. If checkpointer signalled the process to
                 * termination, then it's definitely lost; but if a process is
                 * still alive, then "unreserved" seems more appropriate.
                 *
                 * If we do change it, save the state for safe_wal_size below.
                 */
                if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn))
                {

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
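
To make the case discussed in [1] concrete, here is a minimal standalone sketch of the decision that comment describes: a computed WALAVAIL_REMOVED is reported as "unreserved" while a process still holds the slot, and as lost otherwise. This is a simplified model, not the PostgreSQL source. The SlotSnapshot struct and the reported_wal_state() function are hypothetical names made up for illustration, the types are stand-ins, and the real code also re-reads the slot under its spinlock, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL types; not the real definitions. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

typedef enum
{
    WALAVAIL_INVALID_LSN,
    WALAVAIL_RESERVED,
    WALAVAIL_EXTENDED,
    WALAVAIL_UNRESERVED,
    WALAVAIL_REMOVED
} WALAvailability;

/* Hypothetical snapshot of the two slot fields the comment in [1] relies on. */
typedef struct
{
    XLogRecPtr restart_lsn;   /* InvalidXLogRecPtr once the slot is invalidated */
    int        active_pid;    /* 0 when no process is attached to the slot */
} SlotSnapshot;

/*
 * Decide which state to report for a slot, given the state computed from
 * its restart_lsn. Only WALAVAIL_REMOVED is second-guessed: if the slot
 * still has a valid restart_lsn and a live process attached, the process
 * may have moved on to a newer WAL file after we looked, so "unreserved"
 * is reported instead of "lost".
 */
WALAvailability
reported_wal_state(WALAvailability computed, const SlotSnapshot *slot)
{
    assert(slot != NULL);

    if (computed != WALAVAIL_REMOVED)
        return computed;

    if (slot->restart_lsn != InvalidXLogRecPtr && slot->active_pid != 0)
        return WALAVAIL_UNRESERVED;   /* process still alive */

    return WALAVAIL_REMOVED;          /* nobody attached: definitely lost */
}
```

In this model, a slot whose walsender has not yet been signalled (active_pid still nonzero) is reported as unreserved even though the computed state is WALAVAIL_REMOVED, which is the window described above.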