Re: speed up a logical replica setup
From | Amit Kapila |
---|---|
Subject | Re: speed up a logical replica setup |
Date | |
Msg-id | CAA4eK1+Qmc34cooSNm2=U6YsySSjZTn2_eD_deDFEAZv+aj-AA@mail.gmail.com |
In reply to | RE: speed up a logical replica setup ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
Responses | RE: speed up a logical replica setup |
List | pgsql-hackers |
On Wed, Jul 3, 2024 at 10:42 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Based on that, I considered a scenario why the slot could not be synchronized.
> I felt this was not caused by the pg_createsubscriber.
>
> 1. At initial stage, the xmin of the physical slot is 743, and nextXid of the
>    primary is also 743.
> 2. Autovacuum worker starts a new transaction. nextXid is incremented to 744.
> 3. Tries to create a logical replication slot with failover=true *before the
>    transaction at step2 is replicated to the standby*.
> 4. While creating the slot, the catalog_xmin must be determined.
>    The initial candidate is nextXid (= 744), but the oldest xmin of replication
>    slots (=743) is used if it is older than nextXid. So 743 is chosen in this case.
>    This operation is done in CreateInitDecodingContext()->GetOldestSafeDecodingContext().
> 5. After that, the transaction at step2 is reached to the standby node and it
>    updates the nextXid.
> 6. Finally runs pg_sync_replication_slots() on the standby. It finds a failover
>    slot on the primary and tries to create on the standby. However, the
>    catalog_xmin on the primary (743) is older than the nextXid of the standby (744)
>    so that it skips to create a slot.
>
> To avoid the issue, we can disable the autovacuuming while testing.
>

Your analysis looks correct to me. The test could fail due to autovacuum.
See the following comment in 040_standby_failover_slots_sync.

# Disable autovacuum to avoid generating xid during stats update as otherwise
# the new XID could then be replicated to standby at some random point making
# slots at primary lag behind standby during slot sync.
$publisher->append_conf('postgresql.conf', 'autovacuum = off');

> # Descriptions for attached files
>
> An attached script can be used to reproduce the first failure without pg_createsubscriber.
> It requires to modify the code like [1].
> 0003 patch disables autovacuum for node_p and node_s. I think node_p is enough, but did
> like that just in case. This fixes a second failure.
>

Disabling on the primary node should be sufficient. Let's do the minimum
required to stabilize this test.

--
With Regards,
Amit Kapila.
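
For readers following the numbered scenario quoted above, the race can be traced with a small toy model. This is only a sketch of the arithmetic as described in the thread, not PostgreSQL code: the function names choose_catalog_xmin, standby_can_sync, and run_scenario are hypothetical, XIDs are treated as plain integers (wraparound is ignored), and the skip condition is modelled exactly as the quoted step 6 states it rather than as the server implements it.

```python
"""Toy model of the slot-sync race described in the thread (not PostgreSQL code)."""
from typing import Optional


def choose_catalog_xmin(next_xid: int, oldest_slot_xmin: Optional[int]) -> int:
    """Step 4: start from nextXid, but use the oldest slot xmin if it is older."""
    if oldest_slot_xmin is not None and oldest_slot_xmin < next_xid:
        return oldest_slot_xmin
    return next_xid


def standby_can_sync(primary_catalog_xmin: int, standby_next_xid: int) -> bool:
    """Step 6 as described in the thread: skip when the primary's slot is older."""
    return primary_catalog_xmin >= standby_next_xid


def run_scenario(autovacuum_on: bool) -> bool:
    """Return True if the failover slot would be synchronized to the standby."""
    primary_next_xid = 743      # step 1: nextXid of the primary
    physical_slot_xmin = 743    # step 1: xmin of the physical slot

    if autovacuum_on:
        primary_next_xid += 1   # step 2: autovacuum worker consumes an XID

    # Steps 3-4: the logical slot is created before the new XID is replicated.
    catalog_xmin = choose_catalog_xmin(primary_next_xid, physical_slot_xmin)

    # Step 5: the standby replays the transaction and catches up on nextXid.
    standby_next_xid = primary_next_xid

    # Step 6: the standby decides whether to create the synced slot.
    return standby_can_sync(catalog_xmin, standby_next_xid)


if __name__ == "__main__":
    print("autovacuum on :", run_scenario(True))
    print("autovacuum off:", run_scenario(False))
```

In this model the slot is skipped only when an extra XID is consumed on the primary between steps 1 and 3, which is why turning autovacuum off on the primary alone removes the race in the sketch.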