Re: Introduce XID age and inactive timeout based replication slot invalidation
From | Amit Kapila |
---|---|
Subject | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date | |
Msg-id | CAA4eK1LnVV2FzB4+kSY5m2yyG4sr94E19Ng6OCTFzMJQr57X0g@mail.gmail.com |
In response to | Re: Introduce XID age and inactive timeout based replication slot invalidation (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
List | pgsql-hackers |
On Mon, Sep 16, 2024 at 10:41 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Thanks for looking into this.
>
> On Mon, Sep 16, 2024 at 4:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Why raise the ERROR just for timeout invalidation here and why not if
> > the slot is invalidated for other reasons? This raises the question of
> > what happens before this patch if the invalid slot is used from places
> > where we call ReplicationSlotAcquire(). I did a brief code analysis
> > and found that for StartLogicalReplication(), even if the error won't
> > occur in ReplicationSlotAcquire(), it would have been caught in
> > CreateDecodingContext(). I think that is where we should also add this
> > new error. Similarly, pg_logical_slot_get_changes_guts() and other
> > logical replication functions should be calling
> > CreateDecodingContext() which can raise the new ERROR. I am not sure
> > about how the invalid slots are handled during physical replication,
> > please check the behavior of that before this patch.
>
> When physical slots are invalidated due to wal_removed reason, the failure happens at a much later point for the streaming standbys while reading the requested WAL files like the following:
>
> 2024-09-16 16:29:52.416 UTC [876059] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000000000005 has already been removed
> 2024-09-16 16:29:52.416 UTC [872418] LOG: waiting for WAL to become available at 0/5002000
>
> At this point, despite the slot being invalidated, its wal_status can still come back to 'unreserved' even from 'lost', and the standby can catch up if removed WAL files are copied either by manually or by a tool/script to the primary's pg_wal directory. IOW, the physical slots invalidated due to wal_removed are *somehow* recoverable unlike the logical slots.
>
> IIUC, the invalidation of a slot implies that it is not guaranteed to hold any resources like WAL and XMINs. Does it also imply that the slot must be unusable?
>

If we can't hold the dead rows against xmin of the invalid slot, then
how can we make it usable even after copying the required WAL?

--
With Regards,
Amit Kapila.
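
The question quoted above, why the ERROR should be raised only for timeout invalidation and not for other reasons, can be illustrated with a minimal Python sketch. This is a toy model, not PostgreSQL code: the enum, the exception, the function name `check_slot_usable`, and the message text are all hypothetical and chosen only for illustration. It shows a single check that rejects a slot for any invalidation cause, so that callers do not need a special case for one reason.

```python
from enum import Enum


class InvalidationCause(Enum):
    # Hypothetical stand-ins for the invalidation reasons named in the thread.
    NONE = "none"
    WAL_REMOVED = "wal_removed"
    INACTIVE_TIMEOUT = "inactive_timeout"


class SlotInvalidatedError(Exception):
    """Raised when a caller tries to use an invalidated slot (illustrative only)."""


def check_slot_usable(slot_name: str, cause: InvalidationCause) -> None:
    # Reject the slot for every invalidation cause, not only the timeout case.
    # The message wording is an assumption, not the patch's actual error text.
    if cause is not InvalidationCause.NONE:
        raise SlotInvalidatedError(
            f'cannot use replication slot "{slot_name}": '
            f"invalidated due to {cause.value}"
        )
```

In this model a valid slot passes silently, while a slot invalidated for either reason fails in the same place with the reason included in the message.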