Обсуждение: Frosen logical replication
I have a logical replication where I want to replicate only one schema.
This schema has auditing of every customer, so a thousand of tables, one for each customer
CREATE PUBLICATION pub_a FOR TABLES IN SCHEMA audit;
All worked fine, it copied all tables to subscriber, except one. That table has 8GB and it copied 5GB. That copy has been frozen since yesterday, as you can see.
What can I do to solve this ?
This select returns two records, first for processing WAL being generated and the second problematic record.
# select usename, application_name, state_change, wait_event_type, wait_event, state, query, backend_type from pg_stat_activity where usename = 'replication_user';
-[ RECORD 1 ]----+----------------------------------------------------------------------------------------------------------------------------------------------------------------
usename | replication_user
application_name | sub_a
state_change | 2025-12-05 08:37:28.05245-03
wait_event_type | [null]
wait_event | [null]
state | active
query | START_REPLICATION SLOT "sub_a" LOGICAL 3C14/6C6EA898 (proto_version '4', streaming 'parallel', origin 'any', publication_names '"pub_a"')
backend_type | walsender
-[ RECORD 2 ]----+----------------------------------------------------------------------------------------------------------------------------------------------------------------
usename | replication_user
application_name | pg_14944054_sync_29806_7565484157933843059
state_change | 2025-12-04 08:17:14.73826-03
wait_event_type | Client
wait_event | ClientWrite
state | active
query | COPY audit.audit_0749 (audit_id, table_schema, table_name, primary_key, user_id, audit_action, audit_date_time, transaction_number, old, new) TO STDOUT
backend_type | walsender
-[ RECORD 1 ]----+----------------------------------------------------------------------------------------------------------------------------------------------------------------
usename | replication_user
application_name | sub_a
state_change | 2025-12-05 08:37:28.05245-03
wait_event_type | [null]
wait_event | [null]
state | active
query | START_REPLICATION SLOT "sub_a" LOGICAL 3C14/6C6EA898 (proto_version '4', streaming 'parallel', origin 'any', publication_names '"pub_a"')
backend_type | walsender
-[ RECORD 2 ]----+----------------------------------------------------------------------------------------------------------------------------------------------------------------
usename | replication_user
application_name | pg_14944054_sync_29806_7565484157933843059
state_change | 2025-12-04 08:17:14.73826-03
wait_event_type | Client
wait_event | ClientWrite
state | active
query | COPY audit.audit_0749 (audit_id, table_schema, table_name, primary_key, user_id, audit_action, audit_date_time, transaction_number, old, new) TO STDOUT
backend_type | walsender
regards
Marcos
Em sex., 5 de dez. de 2025 às 09:03, Marcos Pegoraro <marcos@f10.com.br> escreveu:
I have a logical replication where I want to replicate only one schema.All worked fine, it copied all tables to subscriber, except one. That table has 8GB and it copied 5GB. That copy has been frozen since yesterday, as you can see.
I tried these steps.
- Tried to restart subscriber, didn't solve and file didn't change its size on subscriber.
Subscriber did not try to restart copying from scratch, just did nothing.
DIDN'T SOLVE
- Tried to restart publisher 2 or 3 times, didn't solve the problem but file changed
some MBs but was copying from what starting point if server was restarted and
file was partially transfered ? I don't know. Then on subscriber side I got some messages like
"could not receive data from WAL stream: SSL error: unexpected eof while reading"
DIDN'T SOLVE
- Restarted publisher and while restarting I truncated that table on subscriber.
How srsubstate was "d" but srsublsn was NULL It had to copy that file entirely, and it did.
Then that file was completely copied from publisher and until now everything seems fine.
SOLVED, APPARENTLY
regards
Marcos
On Fri, Dec 5, 2025 at 11:48 AM Marcos Pegoraro <marcos@f10.com.br> wrote:
Em sex., 5 de dez. de 2025 às 09:03, Marcos Pegoraro <marcos@f10.com.br> escreveu:I have a logical replication where I want to replicate only one schema.All worked fine, it copied all tables to subscriber, except one. That table has 8GB and it copied 5GB. That copy has been frozen since yesterday, as you can see.I tried these steps.- Tried to restart subscriber, didn't solve and file didn't change its size on subscriber.Subscriber did not try to restart copying from scratch, just did nothing.DIDN'T SOLVE- Tried to restart publisher 2 or 3 times, didn't solve the problem but file changedsome MBs but was copying from what starting point if server was restarted andfile was partially transfered ? I don't know. Then on subscriber side I got some messages like"could not receive data from WAL stream: SSL error: unexpected eof while reading"DIDN'T SOLVE- Restarted publisher and while restarting I truncated that table on subscriber.How srsubstate was "d" but srsublsn was NULL It had to copy that file entirely, and it did.Then that file was completely copied from publisher and until now everything seems fine.SOLVED, APPARENTLY
Maybe have multiple replication slots, one per X number of customers? More work for you, but less work required by that single-threaded publisher, and more granular: hopefully most slots will replicate.
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!