Introduce XID age based replication slot invalidation
From | John H |
---|---|
Subject | Introduce XID age based replication slot invalidation |
Date | |
Msg-id | CA+-JvFsMHckBMzsu5Ov9HCG3AFbMh056hHy1FiXazBRtZ9pFBg@mail.gmail.com |
Responses | RE: Introduce XID age based replication slot invalidation; Re: Introduce XID age based replication slot invalidation |
List | pgsql-hackers |
Hi folks,

I'd like to restart the discussion about providing an XID-based slot invalidation mechanism. The previous effort [1] proposed both XID-based and time-based invalidation, and the inactive-time-based approach was implemented first. The latest XID-based patch from Bharath Rupireddy can be found here [2].

When thinking about availability of the database, inactive replication slots cause two main pain points:

1) WAL accumulation
2) Replication slots with xmin/catalog_xmin can hold back vacuuming, leading to wraparound

The first issue can be mitigated by 'max_slot_wal_keep_size'. In the second case, however, there are no good mechanisms to prioritize write availability of the database and avoid wraparound.

The new GUC 'idle_replication_slot_timeout' partially addresses the concern if your workloads are similar. However, it's hard to apply the same setting across a fleet of different applications. It's easy to imagine a workload that burns through XIDs quickly in one cluster while another runs large batch jobs whose changes get synced out periodically. There is no one-size-fits-all setting for 'idle_replication_slot_timeout' in these two cases.

The attached patch addresses this by introducing 'max_slot_xid_age' in a similar fashion. Replication slots whose xmin or catalog_xmin is older than the configured age get invalidated, allowing vacuum to proceed and biasing towards database availability.

Invalidation happens during CHECKPOINT, as with 'idle_replication_slot_timeout', and when VACUUM runs.

The patch currently attempts invalidation once per autovacuum worker. We're wondering if it should instead attempt invalidation on a per-relation basis within the vacuum call itself. That would account for scenarios where the cost_delay or naptime is high between autovacuum executions.
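To make the age check described above concrete, here is a minimal sketch in Python. It is not taken from the patch. The function names are hypothetical, the treatment of 0 as "disabled" is an assumption borrowed from how timeout-style GUCs usually behave, and the modular arithmetic is a simplification of how XID age is computed on a 32-bit wrapping counter.

```python
from typing import Optional

XID_MODULUS = 2**32   # transaction IDs are 32-bit counters that wrap around
INVALID_XID = 0       # 0 is the invalid transaction ID in PostgreSQL


def xid_age(next_xid: int, xid: int) -> int:
    """Distance from xid forward to next_xid, tolerant of counter wraparound.

    Assumes xid is in the past relative to next_xid, which holds for the
    xmin/catalog_xmin of a valid slot.
    """
    return (next_xid - xid) % XID_MODULUS


def slot_exceeds_xid_age(
    next_xid: int,
    xmin: Optional[int],
    catalog_xmin: Optional[int],
    max_slot_xid_age: int,
) -> bool:
    """Hypothetical predicate: should this slot be invalidated for XID age?

    Assumption: max_slot_xid_age <= 0 disables the check.
    """
    if max_slot_xid_age <= 0:
        return False
    for xid in (xmin, catalog_xmin):
        if xid is None or xid == INVALID_XID:
            continue  # the slot does not hold back this horizon
        if xid_age(next_xid, xid) > max_slot_xid_age:
            return True
    return False
```

Under these assumptions, a slot with xmin 500000 on a cluster whose next XID is 2000000 has an age of 1500000, so it would be a candidate for invalidation with a limit of 1000000, while a slot with xmin 1500000 would be kept.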
Thanks,
John H

[1] https://www.postgresql.org/message-id/flat/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe%2Baw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/CALj2ACXe8%2BxSNdMXTMaSRWUwX7v61Ad4iddUwnn%3DdjSwx3GLLg%40mail.gmail.com

--
John Hsu - Amazon Web Services