Re: Introduce XID age and inactive timeout based replication slot invalidation

Поиск
Список
Период
Сортировка
От Bertrand Drouvot
Тема Re: Introduce XID age and inactive timeout based replication slot invalidation
Дата
Msg-id ZfldthofFLdPibEM@ip-10-97-1-34.eu-west-3.compute.internal
обсуждение исходный текст
Ответ на Re: Introduce XID age and inactive timeout based replication slot invalidation  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: Introduce XID age and inactive timeout based replication slot invalidation  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
Hi,

On Tue, Mar 19, 2024 at 10:56:25AM +0530, Amit Kapila wrote:
> On Mon, Mar 18, 2024 at 8:19 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > Agree. While it makes sense to invalidate slots for wal removal in
> > CreateCheckPoint() (because this is the place where wal is removed), I 'm not
> > sure this is the right place for the 2 new cases.
> >
> > Let's focus on the timeout one as proposed above (as probably the simplest one):
> > as this one is purely related to time and activity what about to invalidate them
> > when?:
> >
> > - their usage resume
> > - in pg_get_replication_slots()
> >
> > The idea is to invalidate the slot when one resumes activity on it or wants to
> > get information about it (and among other things wants to know if the slot is
> > valid or not).
> >
> 
> Trying to invalidate at those two places makes sense to me but we
> still need to cover the cases where it takes very long to resume the
> slot activity and the dangling slot cases where the activity is never
> resumed.

I understand it's better to have the slot reflecting its real status internally
but it is a real issue if that's not the case until the activity on it is resumed?
(just asking, not saying we should not)

> How about apart from the above two places, trying to
> invalidate in CheckPointReplicationSlots() where we are traversing all
> the slots?

I think that's a good place but there is still a window of time (that could also
be "large" depending of the activity and the checkpoint frequency) during which
the slot is not known as invalid internally. But yeah, at leat we know that we'll
mark it as invalid at some point...

BTW:

        if (am_walsender)
        {
+               if (slot->data.persistency == RS_PERSISTENT)
+               {
+                       SpinLockAcquire(&slot->mutex);
+                       slot->data.last_inactive_at = GetCurrentTimestamp();
+                       slot->data.inactive_count++;
+                       SpinLockRelease(&slot->mutex);

I'm also feeling the same concern as Shveta mentioned in [1]: that a "normal"
backend using pg_logical_slot_get_changes() or friends would not set the
last_inactive_at.

[1]: https://www.postgresql.org/message-id/CAJpy0uD64X%3D2ENmbHaRiWTKeQawr-rbGoy_GdhQQLVXzUSKTMg%40mail.gmail.com

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: John Naylor
Дата:
Сообщение: Re: [PoC] Improve dead tuple storage for lazy vacuum
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: Inconsistent printf placeholders