Re: pgsql: Track last_inactive_time in pg_replication_slots.

Поиск
Список
Период
Сортировка
От Bharath Rupireddy
Тема Re: pgsql: Track last_inactive_time in pg_replication_slots.
Дата
Msg-id CALj2ACVFV=yUa3DXXfJLOtJxUM8qzC_mEECMJ2iekDGPeQLkTw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: pgsql: Track last_inactive_time in pg_replication_slots.  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: pgsql: Track last_inactive_time in pg_replication_slots.  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Список pgsql-hackers
On Wed, Mar 27, 2024 at 10:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Mar 26, 2024 at 9:10 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> > On 2024-Mar-26, Nathan Bossart wrote:
> >
> > > FWIW I'd really prefer to have something like max_slot_xid_age for this.  A
> > > time-based parameter would likely help with most cases, but transaction ID
> > > usage will vary widely from server to server, so it'd be nice to have
> > > something to protect against wraparound more directly.
> >
> > Yeah, I tend to agree that an XID-based limit makes more sense than a
> > time-based one.
> >
> So, do we want the time-based parameter or just max_slot_xid_age
> considering both will be GUC's? Yes, the xid_age based parameter
> sounds to be directly helpful for transaction ID wraparound or dead
> row cleanup, OTOH having a lot of inactive slots (which is possible in
> use cases where a tool creates a lot of slots and forgets to remove
> them, or the tool exits without cleaning up slots (say due to server
> shutdown)) also prohibit removing dead space which is not nice either?

I've personally seen the leftover slots problem on production systems
where a timeout based invalidation mechanism could have been of more
help to save time and reduce manual intervention. Usually, most if not
all, migration/upgrade/other tools create slots, and if at all any
errors occur or the operation gets cancelled midway, there's a chance
that the slots can be leftover if such tools forgets to clean them up
either because there was a bug or for whatever reasons. These are
unintended/ghost slots for the database user unnecessarily holding up
resources such as XIDs, dead rows and WAL; which might lead to XID
wraparound or server crash if unnoticed. Although XID based GUC helps
a lot, why do these unintended and unnoticed slots need to hold up the
resources even before the XID age of say 1.5 or 2 billion is reached.

With both GUCs max_slot_xid_age and replication_slot_inactive_timeout
in place, I can set max_slot_xid_age = 1.5 billion or so and also set
replication_slot_inactive_timeout =  2 days or so to make the database
foolproof.

> The one example that comes to mind is the pg_createsubscriber
> (committed for PG17) which creates one slot per database to convert
> standby to subscriber, now say it exits due to power shutdown then
> there could be a lot of dangling slots on the primary server. Also,
> say there is some bug in such a tool that doesn't allow proper cleanup
> of slots, the same problem can happen; yeah, this would be a problem
> of the tool but I think there is no harm in giving a way to avoid
> problems at the server end due to such slots.

Right. I can personally connect to this problem of leftover slots
where manual intervention was needed to drop all such slots which is
time-consuming and painful sometimes.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Richard Guo
Дата:
Сообщение: Re: Propagate pathkeys from CTEs up to the outer query
Следующее
От: "Hayato Kuroda (Fujitsu)"
Дата:
Сообщение: RE: pg_upgrade and logical replication