Re: Introduce XID age based replication slot invalidation
From | Bharath Rupireddy |
---|---|
Subject | Re: Introduce XID age based replication slot invalidation |
Date | |
Msg-id | CALj2ACVeZb7AhzjTf+Mzu3OyA5hVyNbHzGUPTvFukMh8-Zmi5Q@mail.gmail.com |
In response to | Introduce XID age based replication slot invalidation (John H <johnhyvr@gmail.com>) |
List | pgsql-hackers |
Hi,

On Thu, Sep 18, 2025 at 10:20 AM John H <johnhyvr@gmail.com> wrote:
>
> I'd like to restart the discussion about providing an xid-based slot
> invalidation mechanism. The previous effort [1] presented an XID and
> time-based invalidation and the inactive time-based approach was
> implemented first. The latest XID based patch from Bharath Rupireddy
> can be found here [2].
>
> When thinking about availability of the database, inactive replication
> slots cause two main pain points:
> 1) WAL accumulation
> 2) Replication slots with xmin/catalog_xmin can hold back vacuuming
> leading to wrap-around
>
> It's easy to imagine a high-XID churning workload in one cluster while
> another has large batch jobs where changes get synced out
> periodically. There isn't a "one-size" fits all setting for
> 'idle_replication_slot_timeout' in these two cases.

+1.

> The attached patch addresses this by introducing 'max_slot_xid_age' in
> a similar fashion. Replication slots with transaction ID greater than
> the set age will get invalidated allowing vacuum to proceed, biasing
> towards database availability.
>
> Invalidation happens in CHECKPOINT, similar to
> 'idle_replication_slot_timeout', and when VACUUM occurs.
>
> The patch currently attempts to invalidate once-per-autovacuum worker.
> We're wondering if it should attempt invalidation on a per-relation
> basis within the vacuum call itself. That would account for scenarios
> where the cost_delay or naptime is high between autovac executions.

IMO, computing XID horizons per-relation during vacuum is good. The main reason we try to invalidate replication slots based on the XID age in the vacuum path is to help the database when it needs it most - when vacuum is computing the XID horizons. That said, it would be good to have performance analysis with a large number of replication slots, comparing once-per-relation vs. once-per-autovacuum worker vs. once-per-autovacuum launcher wake-up cycle.
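To make the age check in the quoted proposal concrete, here is a rough sketch in C. It is not taken from the patch: the helper names xid_age and slot_xid_too_old are hypothetical, treating 0 as "disabled" is an assumption, and the plain modulo-2^32 subtraction ignores the reserved special XIDs that PostgreSQL skips on wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors PostgreSQL's 32-bit transaction ID type. */
typedef uint32_t TransactionId;

#define FirstNormalTransactionId ((TransactionId) 3)

/*
 * Hypothetical helper: how many XIDs next_xid is ahead of xid.
 * Unsigned subtraction wraps modulo 2^32, so the result stays correct
 * after the counter wraps, as long as xid logically precedes next_xid.
 */
static uint32_t
xid_age(TransactionId next_xid, TransactionId xid)
{
    return (uint32_t) (next_xid - xid);
}

/*
 * Hypothetical helper: should a slot holding slot_xid (its xmin or
 * catalog_xmin) be invalidated under the given age limit?
 */
static bool
slot_xid_too_old(TransactionId slot_xid, TransactionId next_xid,
                 uint32_t max_slot_xid_age)
{
    /* Assumption: a limit of 0 disables the check. */
    if (max_slot_xid_age == 0)
        return false;

    /* Invalid or special XIDs hold nothing back, so skip them. */
    if (slot_xid < FirstNormalTransactionId)
        return false;

    return xid_age(next_xid, slot_xid) > max_slot_xid_age;
}
```

With a limit below 1.5 billion, a slot whose xmin trails the next XID by more than that limit would be flagged here, while slots without an xmin are left alone.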
I haven't looked at the patch in depth, but it would be good to have a TAP test with more realistic production workloads. We could set this value to less than 1.5 billion and use the xid_wraparound test module to quickly reach the wraparound limits, then verify whether this setting helps prevent the database from hitting wraparound errors. This approach would also validate the age calculations in try_replication_slot_invalidation with higher limits.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com