BUG #17327: Postgres server does not correctly emit error for max_slot_wal_keep_size being breached

From: PG Bug reporting form
Subject: BUG #17327: Postgres server does not correctly emit error for max_slot_wal_keep_size being breached
Date:
Msg-id: 17327-89d0efa8b9ae6271@postgresql.org
Responses: Re: BUG #17327: Postgres server does not correctly emit error for max_slot_wal_keep_size being breached  (Masahiko Sawada <sawada.mshk@gmail.com>)
List: pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17327
Logged by:          Alex E
Email address:      alex@altmetric.com
PostgreSQL version: 13.5
Operating system:   Ubuntu 18.04
Description:

We have recently run into a situation where our pg_basebackup-based backups
started failing unexpectedly. These use WAL streaming to keep up with
changes (which uses a temporary replication slot server side). 

The only errors logged on the client side were as listed below:

pg_basebackup: error: could not receive data from WAL stream: SSL connection has been closed unexpectedly
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
pg_basebackup: removing contents of data directory "/backups/some/path/"

whilst on the server side we only got:

2021-12-03 16:21:54 UTC [29724-2647] LOG:  terminating process 42601 to release replication slot "pg_basebackup_42601"
2021-12-03 16:21:54 UTC [42601-1] replicator@[unknown] FATAL:  terminating connection due to administrator command
2021-12-03 16:21:54 UTC [42601-2] replicator@[unknown] STATEMENT:  START_REPLICATION SLOT "pg_basebackup_42601" 4721F/45000000 TIMELINE 3

The above was very unhelpful as it made us believe we might be dealing with
either a network interruption or another type of mysterious hardware
error.

We then tried several things to determine the root cause of the problem and
eventually realized (by trial and error and by monitoring various
statistics) that we were breaching our max_slot_wal_keep_size limit for the
temporary replication slot whilst taking the pg_basebackup. We only realized
this after taking the backup with a permanent physical replication slot
instead of a temporary one: in that case the server did issue a relevant
error about max_slot_wal_keep_size being breached.
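
For anyone hitting the same symptom, here is a minimal Python sketch of a log
check that flags the one server line that hints at the real cause. The
pattern is copied from the log excerpt in this report and is an assumption
about the message format, not a stable interface; the function name
find_slot_terminations is made up for illustration.

```python
import re

# Pattern taken from the server log quoted in this report. The exact wording
# is an assumption and may differ between PostgreSQL versions.
SLOT_KILL = re.compile(
    r'terminating process (?P<pid>\d+) to release replication slot "(?P<slot>[^"]+)"'
)


def find_slot_terminations(log_text):
    """Return (pid, slot_name) pairs for processes terminated to free a slot."""
    # Mail clients and pagers may wrap long log lines, so collapse all
    # whitespace runs into single spaces before matching.
    flat = " ".join(log_text.split())
    return [(int(m.group("pid")), m.group("slot")) for m in SLOT_KILL.finditer(flat)]


sample = '''2021-12-03 16:21:54 UTC [29724-2647] LOG:  terminating process 42601 to
release replication slot "pg_basebackup_42601"
2021-12-03 16:21:54 UTC [42601-1] replicator@[unknown] FATAL:  terminating
connection due to administrator command'''

for pid, slot in find_slot_terminations(sample):
    print(f"process {pid} was terminated to release slot {slot!r}; "
          "consider checking max_slot_wal_keep_size")
```

A match is only a hint that max_slot_wal_keep_size may be involved; it does
not prove it.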

The core issue, in our opinion, is that the Postgres server should log an
error when the max_slot_wal_keep_size limit is reached for temporary
replication slots as well as for permanent ones. Otherwise
users/administrators are presented only with nondescript connection
termination errors which do not point to the actual cause of the problem.
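
Until the server reports this itself, one possible workaround is to watch
the pg_replication_slots view from a separate session while the backup runs.
The sketch below is only an illustration: the wal_status and safe_wal_size
columns exist in that view as of PostgreSQL 13, but the helper name
slots_at_risk, the threshold, and the sample rows are invented here, and
running the query through a driver is left out.

```python
# Query to run from a monitoring session (driver code is not shown).
SLOT_QUERY = """
SELECT slot_name, temporary, wal_status, safe_wal_size
FROM pg_replication_slots
"""


def slots_at_risk(rows, min_safe_bytes):
    """Return names of slots that are close to, or past, the WAL keep limit.

    rows: iterable of (slot_name, temporary, wal_status, safe_wal_size)
    tuples, as returned by SLOT_QUERY. safe_wal_size may be None.
    """
    risky = []
    for slot_name, temporary, wal_status, safe_wal_size in rows:
        # 'unreserved' and 'lost' mean required WAL is about to be, or has
        # already been, removed.
        if wal_status in ("unreserved", "lost"):
            risky.append(slot_name)
        elif safe_wal_size is not None and safe_wal_size < min_safe_bytes:
            risky.append(slot_name)
    return risky


# Hypothetical rows, for illustration only.
example_rows = [
    ("pg_basebackup_42601", True, "extended", 512),
    ("standby_1", False, "reserved", 10_000_000),
]
print(slots_at_risk(example_rows, min_safe_bytes=1024))
```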

