Re: pgsql: Track last_inactive_time in pg_replication_slots.
От | Bharath Rupireddy |
---|---|
Тема | Re: pgsql: Track last_inactive_time in pg_replication_slots. |
Дата | |
Msg-id | CALj2ACVFV=yUa3DXXfJLOtJxUM8qzC_mEECMJ2iekDGPeQLkTw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: pgsql: Track last_inactive_time in pg_replication_slots. (Amit Kapila <amit.kapila16@gmail.com>) |
Ответы |
Re: pgsql: Track last_inactive_time in pg_replication_slots.
(Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
|
Список | pgsql-hackers |
On Wed, Mar 27, 2024 at 10:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Mar 26, 2024 at 9:10 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > On 2024-Mar-26, Nathan Bossart wrote: > > > > > FWIW I'd really prefer to have something like max_slot_xid_age for this. A > > > time-based parameter would likely help with most cases, but transaction ID > > > usage will vary widely from server to server, so it'd be nice to have > > > something to protect against wraparound more directly. > > > > Yeah, I tend to agree that an XID-based limit makes more sense than a > > time-based one. > > > So, do we want the time-based parameter or just max_slot_xid_age > considering both will be GUC's? Yes, the xid_age based parameter > sounds to be directly helpful for transaction ID wraparound or dead > row cleanup, OTOH having a lot of inactive slots (which is possible in > use cases where a tool creates a lot of slots and forgets to remove > them, or the tool exits without cleaning up slots (say due to server > shutdown)) also prohibit removing dead space which is not nice either? I've personally seen the leftover slots problem on production systems where a timeout based invalidation mechanism could have been of more help to save time and reduce manual intervention. Usually, most if not all, migration/upgrade/other tools create slots, and if at all any errors occur or the operation gets cancelled midway, there's a chance that the slots can be leftover if such tools forgets to clean them up either because there was a bug or for whatever reasons. These are unintended/ghost slots for the database user unnecessarily holding up resources such as XIDs, dead rows and WAL; which might lead to XID wraparound or server crash if unnoticed. Although XID based GUC helps a lot, why do these unintended and unnoticed slots need to hold up the resources even before the XID age of say 1.5 or 2 billion is reached. With both GUCs max_slot_xid_age and replication_slot_inactive_timeout in place, I can set max_slot_xid_age = 1.5 billion or so and also set replication_slot_inactive_timeout = 2 days or so to make the database foolproof. > The one example that comes to mind is the pg_createsubscriber > (committed for PG17) which creates one slot per database to convert > standby to subscriber, now say it exits due to power shutdown then > there could be a lot of dangling slots on the primary server. Also, > say there is some bug in such a tool that doesn't allow proper cleanup > of slots, the same problem can happen; yeah, this would be a problem > of the tool but I think there is no harm in giving a way to avoid > problems at the server end due to such slots. Right. I can personally connect to this problem of leftover slots where manual intervention was needed to drop all such slots which is time-consuming and painful sometimes. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
В списке pgsql-hackers по дате отправления: