Re: BUG #18433: Logical replication timeout
From | Shlok Kyal |
---|---|
Subject | Re: BUG #18433: Logical replication timeout |
Date | |
Msg-id | CANhcyEWtED9_UiTsaM_PYmBikpOh1BYxQFvdoWPEJe064vjLeQ@mail.gmail.com |
In response to | Re: BUG #18433: Logical replication timeout (Kostiantyn Tomakh <tomahkvt@gmail.com>) |
Responses |
Re: BUG #18433: Logical replication timeout
(Kostiantyn Tomakh <tomahkvt@gmail.com>)
|
List | pgsql-bugs |
Hi,

> I was able to reproduce the problem.
> I did it on docker based platform I hope you will be able to reproduce this problem too.

Thanks for providing the detailed steps to reproduce the issue. I was able to reproduce the issue with the steps you provided.

I noticed that the issue regarding the increased table size on the subscriber can happen in all versions up to and including Postgres 13, and I was able to reproduce that. This is a timing issue, and hence you may not be seeing it in Postgres 10.

This issue occurs because the tablesync worker exits (due to the UPDATE command) and restarts again, as seen in the logs:

2024-05-01 16:26:15.384 GMT [40] LOG: logical replication table synchronization worker for subscription "db_name_public_subscription", table "table" has started
2024-05-01 16:26:16.994 GMT [40] ERROR: logical replication target relation "public.table" has neither REPLICA IDENTITY index nor PRIMARY KEY and published relation does not have REPLICA IDENTITY FULL
2024-05-01 16:26:20.393 GMT [41] LOG: logical replication table synchronization worker for subscription "db_name_public_subscription", table "table" has started

The tablesync worker syncs the initial data from the publisher to the subscriber using the COPY command. But in this case it exits (after the copy phase is completed) and restarts, so it performs the entire copy operation again. Hence we see the increased table size on the subscriber.

This issue is not reproducible in Postgres 14 and above. It was mitigated by the commit [1], which introduces a new state 'FINISHEDCOPY'. So if the tablesync worker exits (after the copy phase is completed) and restarts, it does not perform the COPY command again and proceeds directly to synchronizing the WAL position between the tablesync worker and the apply worker.

code:
+    else if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+    {
+        /*
+         * The COPY phase was previously done, but tablesync then crashed
+         * before it was able to finish normally.
+         */
+        StartTransactionCommand();
+
+        /*
+         * The origin tracking name must already exist. It was created first
+         * time this tablesync was launched.
+         */
+        originid = replorigin_by_name(originname, false);
+        replorigin_session_setup(originid);
+        replorigin_session_origin = originid;
+        *origin_startpos = replorigin_session_get_progress(false);
+
+        CommitTransactionCommand();
+
+        goto copy_table_done;
+    }

Backpatching commit [1] to Postgres 13 and Postgres 12 will mitigate this issue.

Thoughts?

[1] https://github.com/postgres/postgres/commit/ce0fdbfe9722867b7fad4d3ede9b6a6bfc51fb4e

Thanks and Regards,
Shlok Kyal
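
To illustrate the restart behaviour described in the message, here is a minimal toy model in C. It is only a sketch and is not PostgreSQL source: the names ToySyncState, ToyTableSync, toy_tablesync_run and toy_copies_after_failures are hypothetical, and the model assumes that every failed attempt errors out after the copy phase has completed. It only counts how many times the copy phase runs, with and without a persisted "finished copy" state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy states; they only loosely mirror the idea in the message. */
typedef enum
{
    TOY_STATE_INIT,
    TOY_STATE_FINISHEDCOPY,
    TOY_STATE_SYNCDONE
} ToySyncState;

typedef struct
{
    ToySyncState state;   /* assumed to survive a worker restart */
    int          copies_run; /* how many times the copy phase ran */
} ToyTableSync;

/*
 * One worker attempt. Returns false if the worker exits with an error
 * after the copy phase (assumption of this sketch), true if it finishes.
 */
static bool
toy_tablesync_run(ToyTableSync *ts, bool has_finishedcopy, bool fail_after_copy)
{
    bool skip_copy = has_finishedcopy && ts->state == TOY_STATE_FINISHEDCOPY;

    if (!skip_copy)
    {
        ts->copies_run++;                      /* full copy phase runs */
        if (has_finishedcopy)
            ts->state = TOY_STATE_FINISHEDCOPY; /* remember copy is done */
    }

    if (fail_after_copy)
        return false;                          /* worker exits, will restart */

    ts->state = TOY_STATE_SYNCDONE;
    return true;
}

/* Count copy phases for `failures` failed attempts plus one success. */
static int
toy_copies_after_failures(bool has_finishedcopy, int failures)
{
    ToyTableSync ts = { TOY_STATE_INIT, 0 };

    assert(failures >= 0);
    for (int i = 0; i < failures; i++)
        toy_tablesync_run(&ts, has_finishedcopy, true);
    toy_tablesync_run(&ts, has_finishedcopy, false);
    return ts.copies_run;
}
```

In this model, without the extra state every restart repeats the copy phase, while with it the copy phase runs once no matter how many restarts follow. That is the difference the message attributes to commit [1].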