Обсуждение: BUG #18415: Logical replication errors.
The following bug has been logged on the website: Bug reference: 18415 Logged by: Andy Kosheen Email address: koderan@yandex.ru PostgreSQL version: 15.4 Operating system: Oracle Linux Server 8.4 Description: In the process of logical replication from version 13.3 (on Ubuntu 16.04.6 LTS) to version 15.4 (on Oracle Linux Server 8.4), when most of the data has already been transferred, the following errors begin to occur, appearing endlessly until you stop the replication process (this situation occurs on databases larger than at least 50GB, even if there is a small number of transactions on the source database): 2024-03-20 03:19:41.410 +07 [3433858] LOG: logical replication apply worker for subscription "log_rep_sub" has started 2024-03-20 03:19:41.425 +07 [3433858] ERROR: could not start WAL streaming: ERROR: replication slot "log_rep_sub" does not exist 2024-03-20 03:19:41.426 +07 [1597281] LOG: background worker "logical replication worker" (PID 3433858) exited with exit code 1 2024-03-20 03:19:46.434 +07 [3433865] LOG: logical replication apply worker for subscription "log_rep_sub" has started 2024-03-20 03:19:46.449 +07 [3433865] ERROR: could not start WAL streaming: ERROR: replication slot "log_rep_sub" does not exist 2024-03-20 03:19:46.450 +07 [1597281] LOG: background worker "logical replication worker" (PID 3433865) exited with exit code 1 2024-03-20 03:19:51.459 +07 [3433914] LOG: logical replication apply worker for subscription "log_rep_sub" has started 2024-03-20 03:19:51.481 +07 [3433914] ERROR: could not start WAL streaming: ERROR: replication slot "log_rep_sub" does not exist 2024-03-20 03:19:51.484 +07 [1597281] LOG: background worker "logical replication worker" (PID 3433914) exited with exit code 1 2024-03-20 03:19:56.493 +07 [3433921] LOG: logical replication apply worker for subscription "log_rep_sub" has started 2024-03-20 03:19:56.514 +07 [3433921] ERROR: could not start WAL streaming: ERROR: replication slot "log_rep_sub" does not exist 2024-03-20 03:19:56.517 +07 [1597281] LOG: background worker "logical replication worker" (PID 3433921) exited with exit code 1 2024-03-20 03:20:01.525 +07 [3433976] LOG: logical replication apply worker for subscription "log_rep_sub" has started 2024-03-20 03:20:01.548 +07 [3433976] ERROR: could not start WAL streaming: ERROR: replication slot "log_rep_sub" does not exist 2024-03-20 03:20:01.551 +07 [1597281] LOG: background worker "logical replication worker" (PID 3433976) exited with exit code 1 2024-03-20 03:20:06.559 +07 [3434036] LOG: logical replication apply worker for subscription "log_rep_sub" has started 2024-03-20 03:20:06.580 +07 [3434036] ERROR: could not start WAL streaming: ERROR: replication slot "log_rep_sub" does not exist 2024-03-20 03:20:06.583 +07 [1597281] LOG: background worker "logical replication worker" (PID 3434036) exited with exit code 1 2024-03-20 03:20:11.591 +07 [3434087] LOG: logical replication apply worker for subscription "log_rep_sub" has started 2024-03-20 03:20:11.613 +07 [3434087] ERROR: could not start WAL streaming: ERROR: replication slot "log_rep_sub" does not exist 2024-03-20 03:20:11.616 +07 [1597281] LOG: background worker "logical replication worker" (PID 3434087) exited with exit code 1 2024-03-20 03:20:16.624 +07 [3434103] LOG: logical replication apply worker for subscription "log_rep_sub" has started 2024-03-20 03:20:16.645 +07 [3434103] ERROR: could not start WAL streaming: ERROR: replication slot "log_rep_sub" does not exist 2024-03-20 03:20:16.648 +07 [1597281] LOG: background worker "logical replication worker" (PID 3434103) exited with exit code 1 2024-03-20 03:20:21.656 +07 [3434157] LOG: logical replication apply worker for subscription "log_rep_sub" has started 2024-03-20 03:20:21.671 +07 [3434157] ERROR: could not start WAL streaming: ERROR: replication slot "log_rep_sub" does not exist
On Thu, Mar 28, 2024 at 9:59 AM PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:
Bug reference: 18415
Logged by: Andy Kosheen
Email address: koderan@yandex.ru
PostgreSQL version: 15.4
Operating system: Oracle Linux Server 8.4
Description:
In the process of logical replication from version 13.3 (on Ubuntu 16.04.6
LTS) to version 15.4 (on Oracle Linux Server 8.4), when most of the data has
already been transferred, the following errors begin to occur, appearing
endlessly until you stop the replication process (this situation occurs on
databases larger than at least 50GB, even if there is a small number of
transactions on the source database):
2024-03-20 03:19:41.410 +07 [3433858] LOG: logical replication apply worker
for subscription "log_rep_sub" has started
2024-03-20 03:19:41.425 +07 [3433858] ERROR: could not start WAL streaming:
ERROR: replication slot "log_rep_sub" does not exist
Suggest you check the publisher logs and see why the replication slot went away.
David J.
On Fri, 29 Mar 2024 at 05:50, koderan <koderan@yandex.ru> wrote: > > There is publisher and subscriber logs, but the logs on both nodes did not give me clarity. > Master (time when errors start to appear): > > 2024-03-28 22:26:08.459 +07 [1004642] ERROR: could not start WAL streaming: ERROR: replication slot "pg_975006_sync_972467_7293549148661379912"does not exist > 2024-03-28 22:26:08.462 +07 [1597281] LOG: background worker "logical replication worker" (PID 1004642) exited with exitcode 1 > 2024-03-28 22:26:08.466 +07 [1048261] LOG: logical replication table synchronization worker for subscription "log_rep_sub",table "scan_type_i18n" has started > 2024-03-28 22:26:08.542 +07 [1048261] LOG: logical replication table synchronization worker for subscription "log_rep_sub",table "scan_type_i18n" has finished > 2024-03-28 22:26:08.548 +07 [1048263] LOG: logical replication table synchronization worker for subscription "log_rep_sub",table "phone_type_i18n" has started > 2024-03-28 22:26:08.641 +07 [1048263] LOG: logical replication table synchronization worker for subscription "log_rep_sub",table "phone_type_i18n" has finished > 2024-03-28 22:26:08.644 +07 [1048264] LOG: logical replication table synchronization worker for subscription "log_rep_sub",table "reply_status_type_i18n" has started > 2024-03-28 22:26:08.690 +07 [1048264] LOG: logical replication table synchronization worker for subscription "log_rep_sub",table "reply_status_type_i18n" has finished > 2024-03-28 22:26:08.693 +07 [1048265] LOG: logical replication table synchronization worker for subscription "log_rep_sub",table "user" has started > 2024-03-28 22:26:09.054 +07 [1048265] LOG: logical replication table synchronization worker for subscription "log_rep_sub",table "user" has finished > 2024-03-28 22:26:09.059 +07 [1048267] LOG: logical replication table synchronization worker for subscription "log_rep_sub",table "queue_event" has started > 2024-03-28 22:26:09.141 +07 [1048267] LOG: logical replication table synchronization worker for subscription "log_rep_sub",table "queue_event" has finished > 2024-03-28 22:26:09.146 +07 [1048274] LOG: logical replication table synchronization worker for subscription "log_rep_sub",table "credit" has started > 2024-03-28 22:26:09.163 +07 [1048274] ERROR: could not start WAL streaming: ERROR: replication slot "pg_975006_sync_972243_7293549148661379912"does not exist > 2024-03-28 22:26:09.165 +07 [1597281] LOG: background worker "logical replication worker" (PID 1048274) exited with exitcode 1 > 2024-03-28 22:26:09.167 +07 [1048281] LOG: logical replication table synchronization worker for subscription "log_rep_sub",table "requisite_value_contragent" has started > 2024-03-28 22:26:09.184 +07 [1048281] ERROR: could not start WAL streaming: ERROR: replication slot "pg_975006_sync_972563_7293549148661379912"does not exist > 2024-03-28 22:26:09.186 +07 [1597281] LOG: background worker "logical replication worker" (PID 1048281) exited with exitcode 1 > 2024-03-28 22:26:09.189 +07 [1048282] LOG: logical replication table synchronization worker for subscription "log_rep_sub",table "order_registry_np" has started > > The same time on replica: > > 2024-03-28 22:26:07.967 +07 [25424] postgres@claims LOG: duration: 2406278.201 ms statement: COPY public.orders (id,order_number, order_date, order_amount, payment_info_code, payer_code, payer_name, receiver_code, receiver_name, sender_code,sender_name, cargo_description, cargo_volume_weight, cargo_weight, declared_price, currency_code, internet_store_number,subdivision_code, cfo_code, active, date_created, date_updated, delay) TO STDOUT > 2024-03-28 22:26:08.426 +07 [25424] postgres@claims ERROR: replication slot "pg_975006_sync_972467_7293549148661379912"does not exist > 2024-03-28 22:26:08.466 +07 [1170] postgres@claims LOG: logical decoding found initial starting point at B1C5/BFBA3410 > 2024-03-28 22:26:08.466 +07 [1170] postgres@claims DETAIL: Waiting for transactions (approximately 1) older than 2998665818to end. > 2024-03-28 22:26:08.469 +07 [1165] [unknown]@[unknown] LOG: could not accept SSL connection: EOF detected > 2024-03-28 22:26:08.496 +07 [1170] postgres@claims LOG: logical decoding found consistent point at B1C5/BFBA4580 > 2024-03-28 22:26:08.496 +07 [1170] postgres@claims DETAIL: There are no running transactions. > 2024-03-28 22:26:08.507 +07 [1170] postgres@claims LOG: starting logical decoding for slot "pg_975006_sync_971138_7293549148661379912" > 2024-03-28 22:26:08.507 +07 [1170] postgres@claims DETAIL: Streaming transactions committing after B1C5/BFBA45B8, readingWAL from B1C5/BFBA3290. > 2024-03-28 22:26:08.507 +07 [1170] postgres@claims LOG: logical decoding found consistent point at B1C5/BFBA3410 > 2024-03-28 22:26:08.507 +07 [1170] postgres@claims DETAIL: Logical decoding will begin using saved snapshot. > 2024-03-28 22:26:08.546 +07 [1172] postgres@claims LOG: logical decoding found initial starting point at B1C5/BFBAA200 > 2024-03-28 22:26:08.546 +07 [1172] postgres@claims DETAIL: Waiting for transactions (approximately 2) older than 2998665822to end. > 2024-03-28 22:26:08.594 +07 [1172] postgres@claims LOG: logical decoding found consistent point at B1C5/BFBB5E80 > 2024-03-28 22:26:08.594 +07 [1172] postgres@claims DETAIL: There are no running transactions. > 2024-03-28 22:26:08.602 +07 [1172] postgres@claims LOG: starting logical decoding for slot "pg_975006_sync_970776_7293549148661379912" > 2024-03-28 22:26:08.602 +07 [1172] postgres@claims DETAIL: Streaming transactions committing after B1C5/BFBB5EB8, readingWAL from B1C5/BFBA5290. > 2024-03-28 22:26:08.602 +07 [1172] postgres@claims LOG: logical decoding found consistent point at B1C5/BFBAA200 > 2024-03-28 22:26:08.602 +07 [1172] postgres@claims DETAIL: Logical decoding will begin using saved snapshot. > 2024-03-28 22:26:08.645 +07 [1173] postgres@claims LOG: logical decoding found consistent point at B1C5/BFBB8E28 > 2024-03-28 22:26:08.645 +07 [1173] postgres@claims DETAIL: There are no running transactions. > 2024-03-28 22:26:08.652 +07 [1173] postgres@claims LOG: starting logical decoding for slot "pg_975006_sync_971808_7293549148661379912" > 2024-03-28 22:26:08.652 +07 [1173] postgres@claims DETAIL: Streaming transactions committing after B1C5/BFBB8E60, readingWAL from B1C5/BFBB8E28. > 2024-03-28 22:26:08.652 +07 [1173] postgres@claims LOG: logical decoding found consistent point at B1C5/BFBB8E28 > 2024-03-28 22:26:08.652 +07 [1173] postgres@claims DETAIL: There are no running transactions. > 2024-03-28 22:26:08.697 +07 [1174] postgres@claims LOG: logical decoding found consistent point at B1C5/BFBBE588 > 2024-03-28 22:26:08.697 +07 [1174] postgres@claims DETAIL: There are no running transactions. > 2024-03-28 22:26:09.019 +07 [1174] postgres@claims LOG: starting logical decoding for slot "pg_975006_sync_971995_7293549148661379912" > 2024-03-28 22:26:09.019 +07 [1174] postgres@claims DETAIL: Streaming transactions committing after B1C5/BFBBE5C0, readingWAL from B1C5/BFBBE588. > 2024-03-28 22:26:09.019 +07 [1174] postgres@claims LOG: logical decoding found consistent point at B1C5/BFBBE588 > 2024-03-28 22:26:09.019 +07 [1174] postgres@claims DETAIL: There are no running transactions. > 2024-03-28 22:26:09.056 +07 [1180] postgres@claims LOG: logical decoding found consistent point at B1C5/BFBE0C88 > 2024-03-28 22:26:09.056 +07 [1180] postgres@claims DETAIL: There are no running transactions. > 2024-03-28 22:26:09.105 +07 [1180] postgres@claims LOG: starting logical decoding for slot "pg_975006_sync_972492_7293549148661379912" > 2024-03-28 22:26:09.105 +07 [1180] postgres@claims DETAIL: Streaming transactions committing after B1C5/BFBE0CC0, readingWAL from B1C5/BFBE0C88. > 2024-03-28 22:26:09.105 +07 [1180] postgres@claims LOG: logical decoding found consistent point at B1C5/BFBE0C88 > 2024-03-28 22:26:09.105 +07 [1180] postgres@claims DETAIL: There are no running transactions. > 2024-03-28 22:26:09.129 +07 [1181] postgres@claims ERROR: replication slot "pg_975006_sync_972243_7293549148661379912"does not exist > 2024-03-28 22:26:09.151 +07 [1182] postgres@claims ERROR: replication slot "pg_975006_sync_972563_7293549148661379912"does not exist > 2024-03-28 22:26:09.172 +07 [1184] postgres@claims ERROR: replication slot "pg_975006_sync_972454_7293549148661379912"does not exist > 2024-03-28 22:26:09.196 +07 [1185] postgres@claims ERROR: replication slot "pg_975006_sync_972467_7293549148661379912"does not exist > > > 28.03.2024, 20:03, "David G. Johnston" <david.g.johnston@gmail.com>: > > On Thu, Mar 28, 2024 at 9:59 AM PG Bug reporting form <noreply@postgresql.org> wrote: > > The following bug has been logged on the website: > > Bug reference: 18415 > Logged by: Andy Kosheen > Email address: koderan@yandex.ru > PostgreSQL version: 15.4 > Operating system: Oracle Linux Server 8.4 > Description: > > In the process of logical replication from version 13.3 (on Ubuntu 16.04.6 > LTS) to version 15.4 (on Oracle Linux Server 8.4), when most of the data has > already been transferred, the following errors begin to occur, appearing > endlessly until you stop the replication process (this situation occurs on > databases larger than at least 50GB, even if there is a small number of > transactions on the source database): > 2024-03-20 03:19:41.410 +07 [3433858] LOG: logical replication apply worker > for subscription "log_rep_sub" has started > 2024-03-20 03:19:41.425 +07 [3433858] ERROR: could not start WAL streaming: > ERROR: replication slot "log_rep_sub" does not exist By any chance do you have a reproducible test case for this. Also are there any other abnormalities in the log. What was the max_replication_slots configuration set in the publisher and what are the replication slots that were being used in the publisher at that point(select * from pg_replication_slots). Regards, Vignesh