Re: Introduce XID age and inactive timeout based replication slot invalidation
От | Bharath Rupireddy |
---|---|
Тема | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Дата | |
Msg-id | CALj2ACWw9ohERck7Vm0oRvbGngPyp4ux2TKDwj4H-X7jN840JA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Introduce XID age and inactive timeout based replication slot invalidation (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Список | pgsql-hackers |
On Sat, Jan 27, 2024 at 1:18 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Thu, Jan 11, 2024 at 10:48 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > Hi, > > > > Replication slots in postgres will prevent removal of required > > resources when there is no connection using them (inactive). This > > consumes storage because neither required WAL nor required rows from > > the user tables/system catalogs can be removed by VACUUM as long as > > they are required by a replication slot. In extreme cases this could > > cause the transaction ID wraparound. > > > > Currently postgres has the ability to invalidate inactive replication > > slots based on the amount of WAL (set via max_slot_wal_keep_size GUC) > > that will be needed for the slots in case they become active. However, > > the wraparound issue isn't effectively covered by > > max_slot_wal_keep_size - one can't tell postgres to invalidate a > > replication slot if it is blocking VACUUM. Also, it is often tricky to > > choose a default value for max_slot_wal_keep_size, because the amount > > of WAL that gets generated and allocated storage for the database can > > vary. > > > > Therefore, it is often easy for developers to do the following: > > a) set an XID age (age of slot's xmin or catalog_xmin) of say 1 or 1.5 > > billion, after which the slots get invalidated. > > b) set a timeout of say 1 or 2 or 3 days, after which the inactive > > slots get invalidated. > > > > To implement (a), postgres needs a new GUC called max_slot_xid_age. > > The checkpointer then invalidates all the slots whose xmin (the oldest > > transaction that this slot needs the database to retain) or > > catalog_xmin (the oldest transaction affecting the system catalogs > > that this slot needs the database to retain) has reached the age > > specified by this setting. > > > > To implement (b), first postgres needs to track the replication slot > > metrics like the time at which the slot became inactive (inactive_at > > timestamptz) and the total number of times the slot became inactive in > > its lifetime (inactive_count numeric) in ReplicationSlotPersistentData > > structure. And, then it needs a new timeout GUC called > > inactive_replication_slot_timeout. Whenever a slot becomes inactive, > > the current timestamp and inactive count are stored in > > ReplicationSlotPersistentData structure and persisted to disk. The > > checkpointer then invalidates all the slots that are lying inactive > > for about inactive_replication_slot_timeout duration starting from > > inactive_at. > > > > In addition to implementing (b), these two new metrics enable > > developers to improve their monitoring tools as the metrics are > > exposed via pg_replication_slots system view. For instance, one can > > build a monitoring tool that signals when replication slots are lying > > inactive for a day or so using inactive_at metric, and/or when a > > replication slot is becoming inactive too frequently using inactive_at > > metric. > > > > I’m attaching the v1 patch set as described below: > > 0001 - Tracks invalidation_reason in pg_replication_slots. This is > > needed because slots now have multiple reasons for slot invalidation. > > 0002 - Tracks inactive replication slot information inactive_at and > > inactive_timeout. > > 0003 - Adds inactive_timeout based replication slot invalidation. > > 0004 - Adds XID based replication slot invalidation. > > > > Thoughts? > > Needed a rebase due to c393308b. Please find the attached v2 patch set. Needed a rebase due to commit 776621a (conflict in src/test/recovery/meson.build for new TAP test file added). Please find the attached v3 patch set. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Вложения
В списке pgsql-hackers по дате отправления: