Обсуждение: BUG #18415: Logical replication errors.

Поиск
Список
Период
Сортировка

BUG #18415: Logical replication errors.

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      18415
Logged by:          Andy Kosheen
Email address:      koderan@yandex.ru
PostgreSQL version: 15.4
Operating system:   Oracle Linux Server 8.4
Description:

In the process of logical replication from version 13.3 (on Ubuntu 16.04.6
LTS) to version 15.4 (on Oracle Linux Server 8.4), when most of the data has
already been transferred, the following errors begin to occur, appearing
endlessly until you stop the replication process (this situation occurs on
databases larger than at least 50GB, even if there is a small number of
transactions on the source database):
2024-03-20 03:19:41.410 +07 [3433858] LOG:  logical replication apply worker
for subscription "log_rep_sub" has started
2024-03-20 03:19:41.425 +07 [3433858] ERROR:  could not start WAL streaming:
ERROR:  replication slot "log_rep_sub" does not exist
2024-03-20 03:19:41.426 +07 [1597281] LOG:  background worker "logical
replication worker" (PID 3433858) exited with exit code 1
2024-03-20 03:19:46.434 +07 [3433865] LOG:  logical replication apply worker
for subscription "log_rep_sub" has started
2024-03-20 03:19:46.449 +07 [3433865] ERROR:  could not start WAL streaming:
ERROR:  replication slot "log_rep_sub" does not exist
2024-03-20 03:19:46.450 +07 [1597281] LOG:  background worker "logical
replication worker" (PID 3433865) exited with exit code 1
2024-03-20 03:19:51.459 +07 [3433914] LOG:  logical replication apply worker
for subscription "log_rep_sub" has started
2024-03-20 03:19:51.481 +07 [3433914] ERROR:  could not start WAL streaming:
ERROR:  replication slot "log_rep_sub" does not exist
2024-03-20 03:19:51.484 +07 [1597281] LOG:  background worker "logical
replication worker" (PID 3433914) exited with exit code 1
2024-03-20 03:19:56.493 +07 [3433921] LOG:  logical replication apply worker
for subscription "log_rep_sub" has started
2024-03-20 03:19:56.514 +07 [3433921] ERROR:  could not start WAL streaming:
ERROR:  replication slot "log_rep_sub" does not exist
2024-03-20 03:19:56.517 +07 [1597281] LOG:  background worker "logical
replication worker" (PID 3433921) exited with exit code 1
2024-03-20 03:20:01.525 +07 [3433976] LOG:  logical replication apply worker
for subscription "log_rep_sub" has started
2024-03-20 03:20:01.548 +07 [3433976] ERROR:  could not start WAL streaming:
ERROR:  replication slot "log_rep_sub" does not exist
2024-03-20 03:20:01.551 +07 [1597281] LOG:  background worker "logical
replication worker" (PID 3433976) exited with exit code 1
2024-03-20 03:20:06.559 +07 [3434036] LOG:  logical replication apply worker
for subscription "log_rep_sub" has started
2024-03-20 03:20:06.580 +07 [3434036] ERROR:  could not start WAL streaming:
ERROR:  replication slot "log_rep_sub" does not exist
2024-03-20 03:20:06.583 +07 [1597281] LOG:  background worker "logical
replication worker" (PID 3434036) exited with exit code 1
2024-03-20 03:20:11.591 +07 [3434087] LOG:  logical replication apply worker
for subscription "log_rep_sub" has started
2024-03-20 03:20:11.613 +07 [3434087] ERROR:  could not start WAL streaming:
ERROR:  replication slot "log_rep_sub" does not exist
2024-03-20 03:20:11.616 +07 [1597281] LOG:  background worker "logical
replication worker" (PID 3434087) exited with exit code 1
2024-03-20 03:20:16.624 +07 [3434103] LOG:  logical replication apply worker
for subscription "log_rep_sub" has started
2024-03-20 03:20:16.645 +07 [3434103] ERROR:  could not start WAL streaming:
ERROR:  replication slot "log_rep_sub" does not exist
2024-03-20 03:20:16.648 +07 [1597281] LOG:  background worker "logical
replication worker" (PID 3434103) exited with exit code 1
2024-03-20 03:20:21.656 +07 [3434157] LOG:  logical replication apply worker
for subscription "log_rep_sub" has started
2024-03-20 03:20:21.671 +07 [3434157] ERROR:  could not start WAL streaming:
ERROR:  replication slot "log_rep_sub" does not exist


Re: BUG #18415: Logical replication errors.

От
"David G. Johnston"
Дата:
On Thu, Mar 28, 2024 at 9:59 AM PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:

Bug reference:      18415
Logged by:          Andy Kosheen
Email address:      koderan@yandex.ru
PostgreSQL version: 15.4
Operating system:   Oracle Linux Server 8.4
Description:       

In the process of logical replication from version 13.3 (on Ubuntu 16.04.6
LTS) to version 15.4 (on Oracle Linux Server 8.4), when most of the data has
already been transferred, the following errors begin to occur, appearing
endlessly until you stop the replication process (this situation occurs on
databases larger than at least 50GB, even if there is a small number of
transactions on the source database):
2024-03-20 03:19:41.410 +07 [3433858] LOG:  logical replication apply worker
for subscription "log_rep_sub" has started
2024-03-20 03:19:41.425 +07 [3433858] ERROR:  could not start WAL streaming:
ERROR:  replication slot "log_rep_sub" does not exist


Suggest you check the publisher logs and see why the replication slot went away.

David J.

Re: BUG #18415: Logical replication errors.

От
vignesh C
Дата:
On Fri, 29 Mar 2024 at 05:50, koderan <koderan@yandex.ru> wrote:
>
> There is publisher and subscriber logs, but the logs on both nodes did not give me clarity.
> Master (time when errors start to appear):
>
> 2024-03-28 22:26:08.459 +07 [1004642] ERROR:  could not start WAL streaming: ERROR:  replication slot
"pg_975006_sync_972467_7293549148661379912"does not exist 
> 2024-03-28 22:26:08.462 +07 [1597281] LOG:  background worker "logical replication worker" (PID 1004642) exited with
exitcode 1 
> 2024-03-28 22:26:08.466 +07 [1048261] LOG:  logical replication table synchronization worker for subscription
"log_rep_sub",table "scan_type_i18n" has started 
> 2024-03-28 22:26:08.542 +07 [1048261] LOG:  logical replication table synchronization worker for subscription
"log_rep_sub",table "scan_type_i18n" has finished 
> 2024-03-28 22:26:08.548 +07 [1048263] LOG:  logical replication table synchronization worker for subscription
"log_rep_sub",table "phone_type_i18n" has started 
> 2024-03-28 22:26:08.641 +07 [1048263] LOG:  logical replication table synchronization worker for subscription
"log_rep_sub",table "phone_type_i18n" has finished 
> 2024-03-28 22:26:08.644 +07 [1048264] LOG:  logical replication table synchronization worker for subscription
"log_rep_sub",table "reply_status_type_i18n" has started 
> 2024-03-28 22:26:08.690 +07 [1048264] LOG:  logical replication table synchronization worker for subscription
"log_rep_sub",table "reply_status_type_i18n" has finished 
> 2024-03-28 22:26:08.693 +07 [1048265] LOG:  logical replication table synchronization worker for subscription
"log_rep_sub",table "user" has started 
> 2024-03-28 22:26:09.054 +07 [1048265] LOG:  logical replication table synchronization worker for subscription
"log_rep_sub",table "user" has finished 
> 2024-03-28 22:26:09.059 +07 [1048267] LOG:  logical replication table synchronization worker for subscription
"log_rep_sub",table "queue_event" has started 
> 2024-03-28 22:26:09.141 +07 [1048267] LOG:  logical replication table synchronization worker for subscription
"log_rep_sub",table "queue_event" has finished 
> 2024-03-28 22:26:09.146 +07 [1048274] LOG:  logical replication table synchronization worker for subscription
"log_rep_sub",table "credit" has started 
> 2024-03-28 22:26:09.163 +07 [1048274] ERROR:  could not start WAL streaming: ERROR:  replication slot
"pg_975006_sync_972243_7293549148661379912"does not exist 
> 2024-03-28 22:26:09.165 +07 [1597281] LOG:  background worker "logical replication worker" (PID 1048274) exited with
exitcode 1 
> 2024-03-28 22:26:09.167 +07 [1048281] LOG:  logical replication table synchronization worker for subscription
"log_rep_sub",table "requisite_value_contragent" has started 
> 2024-03-28 22:26:09.184 +07 [1048281] ERROR:  could not start WAL streaming: ERROR:  replication slot
"pg_975006_sync_972563_7293549148661379912"does not exist 
> 2024-03-28 22:26:09.186 +07 [1597281] LOG:  background worker "logical replication worker" (PID 1048281) exited with
exitcode 1 
> 2024-03-28 22:26:09.189 +07 [1048282] LOG:  logical replication table synchronization worker for subscription
"log_rep_sub",table "order_registry_np" has started 
>
> The same time on replica:
>
> 2024-03-28 22:26:07.967 +07 [25424] postgres@claims LOG:  duration: 2406278.201 ms  statement: COPY public.orders
(id,order_number, order_date, order_amount, payment_info_code, payer_code, payer_name, receiver_code, receiver_name,
sender_code,sender_name, cargo_description, cargo_volume_weight, cargo_weight, declared_price, currency_code,
internet_store_number,subdivision_code, cfo_code, active, date_created, date_updated, delay) TO STDOUT 
> 2024-03-28 22:26:08.426 +07 [25424] postgres@claims ERROR:  replication slot
"pg_975006_sync_972467_7293549148661379912"does not exist 
> 2024-03-28 22:26:08.466 +07 [1170] postgres@claims LOG:  logical decoding found initial starting point at
B1C5/BFBA3410
> 2024-03-28 22:26:08.466 +07 [1170] postgres@claims DETAIL:  Waiting for transactions (approximately 1) older than
2998665818to end. 
> 2024-03-28 22:26:08.469 +07 [1165] [unknown]@[unknown] LOG:  could not accept SSL connection: EOF detected
> 2024-03-28 22:26:08.496 +07 [1170] postgres@claims LOG:  logical decoding found consistent point at B1C5/BFBA4580
> 2024-03-28 22:26:08.496 +07 [1170] postgres@claims DETAIL:  There are no running transactions.
> 2024-03-28 22:26:08.507 +07 [1170] postgres@claims LOG:  starting logical decoding for slot
"pg_975006_sync_971138_7293549148661379912"
> 2024-03-28 22:26:08.507 +07 [1170] postgres@claims DETAIL:  Streaming transactions committing after B1C5/BFBA45B8,
readingWAL from B1C5/BFBA3290. 
> 2024-03-28 22:26:08.507 +07 [1170] postgres@claims LOG:  logical decoding found consistent point at B1C5/BFBA3410
> 2024-03-28 22:26:08.507 +07 [1170] postgres@claims DETAIL:  Logical decoding will begin using saved snapshot.
> 2024-03-28 22:26:08.546 +07 [1172] postgres@claims LOG:  logical decoding found initial starting point at
B1C5/BFBAA200
> 2024-03-28 22:26:08.546 +07 [1172] postgres@claims DETAIL:  Waiting for transactions (approximately 2) older than
2998665822to end. 
> 2024-03-28 22:26:08.594 +07 [1172] postgres@claims LOG:  logical decoding found consistent point at B1C5/BFBB5E80
> 2024-03-28 22:26:08.594 +07 [1172] postgres@claims DETAIL:  There are no running transactions.
> 2024-03-28 22:26:08.602 +07 [1172] postgres@claims LOG:  starting logical decoding for slot
"pg_975006_sync_970776_7293549148661379912"
> 2024-03-28 22:26:08.602 +07 [1172] postgres@claims DETAIL:  Streaming transactions committing after B1C5/BFBB5EB8,
readingWAL from B1C5/BFBA5290. 
> 2024-03-28 22:26:08.602 +07 [1172] postgres@claims LOG:  logical decoding found consistent point at B1C5/BFBAA200
> 2024-03-28 22:26:08.602 +07 [1172] postgres@claims DETAIL:  Logical decoding will begin using saved snapshot.
> 2024-03-28 22:26:08.645 +07 [1173] postgres@claims LOG:  logical decoding found consistent point at B1C5/BFBB8E28
> 2024-03-28 22:26:08.645 +07 [1173] postgres@claims DETAIL:  There are no running transactions.
> 2024-03-28 22:26:08.652 +07 [1173] postgres@claims LOG:  starting logical decoding for slot
"pg_975006_sync_971808_7293549148661379912"
> 2024-03-28 22:26:08.652 +07 [1173] postgres@claims DETAIL:  Streaming transactions committing after B1C5/BFBB8E60,
readingWAL from B1C5/BFBB8E28. 
> 2024-03-28 22:26:08.652 +07 [1173] postgres@claims LOG:  logical decoding found consistent point at B1C5/BFBB8E28
> 2024-03-28 22:26:08.652 +07 [1173] postgres@claims DETAIL:  There are no running transactions.
> 2024-03-28 22:26:08.697 +07 [1174] postgres@claims LOG:  logical decoding found consistent point at B1C5/BFBBE588
> 2024-03-28 22:26:08.697 +07 [1174] postgres@claims DETAIL:  There are no running transactions.
> 2024-03-28 22:26:09.019 +07 [1174] postgres@claims LOG:  starting logical decoding for slot
"pg_975006_sync_971995_7293549148661379912"
> 2024-03-28 22:26:09.019 +07 [1174] postgres@claims DETAIL:  Streaming transactions committing after B1C5/BFBBE5C0,
readingWAL from B1C5/BFBBE588. 
> 2024-03-28 22:26:09.019 +07 [1174] postgres@claims LOG:  logical decoding found consistent point at B1C5/BFBBE588
> 2024-03-28 22:26:09.019 +07 [1174] postgres@claims DETAIL:  There are no running transactions.
> 2024-03-28 22:26:09.056 +07 [1180] postgres@claims LOG:  logical decoding found consistent point at B1C5/BFBE0C88
> 2024-03-28 22:26:09.056 +07 [1180] postgres@claims DETAIL:  There are no running transactions.
> 2024-03-28 22:26:09.105 +07 [1180] postgres@claims LOG:  starting logical decoding for slot
"pg_975006_sync_972492_7293549148661379912"
> 2024-03-28 22:26:09.105 +07 [1180] postgres@claims DETAIL:  Streaming transactions committing after B1C5/BFBE0CC0,
readingWAL from B1C5/BFBE0C88. 
> 2024-03-28 22:26:09.105 +07 [1180] postgres@claims LOG:  logical decoding found consistent point at B1C5/BFBE0C88
> 2024-03-28 22:26:09.105 +07 [1180] postgres@claims DETAIL:  There are no running transactions.
> 2024-03-28 22:26:09.129 +07 [1181] postgres@claims ERROR:  replication slot
"pg_975006_sync_972243_7293549148661379912"does not exist 
> 2024-03-28 22:26:09.151 +07 [1182] postgres@claims ERROR:  replication slot
"pg_975006_sync_972563_7293549148661379912"does not exist 
> 2024-03-28 22:26:09.172 +07 [1184] postgres@claims ERROR:  replication slot
"pg_975006_sync_972454_7293549148661379912"does not exist 
> 2024-03-28 22:26:09.196 +07 [1185] postgres@claims ERROR:  replication slot
"pg_975006_sync_972467_7293549148661379912"does not exist 
>
>
> 28.03.2024, 20:03, "David G. Johnston" <david.g.johnston@gmail.com>:
>
> On Thu, Mar 28, 2024 at 9:59 AM PG Bug reporting form <noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      18415
> Logged by:          Andy Kosheen
> Email address:      koderan@yandex.ru
> PostgreSQL version: 15.4
> Operating system:   Oracle Linux Server 8.4
> Description:
>
> In the process of logical replication from version 13.3 (on Ubuntu 16.04.6
> LTS) to version 15.4 (on Oracle Linux Server 8.4), when most of the data has
> already been transferred, the following errors begin to occur, appearing
> endlessly until you stop the replication process (this situation occurs on
> databases larger than at least 50GB, even if there is a small number of
> transactions on the source database):
> 2024-03-20 03:19:41.410 +07 [3433858] LOG:  logical replication apply worker
> for subscription "log_rep_sub" has started
> 2024-03-20 03:19:41.425 +07 [3433858] ERROR:  could not start WAL streaming:
> ERROR:  replication slot "log_rep_sub" does not exist

By any chance do you have a reproducible test case for this. Also are
there any other abnormalities in the log. What was the
max_replication_slots configuration set in the publisher and what are
the replication slots that were being used in the publisher at that
point(select * from pg_replication_slots).

Regards,
Vignesh