Add Information during standby recovery conflicts
От | Drouvot, Bertrand |
---|---|
Тема | Add Information during standby recovery conflicts |
Дата | |
Msg-id | 9a60178c-a853-1440-2cdc-c3af916cff59@amazon.com обсуждение исходный текст |
Ответы |
Re: Add Information during standby recovery conflicts
|
Список | pgsql-hackers |
Hi hackers,
I've attached a patch to add information during standby recovery conflicts.
The motive behind is to be able to get some information:
- On the apply side
- On the blocker(s) side
When a standby recovery conflict occurs it could be useful to get more information to be able to dive deep on the root cause and find a way to avoid/mitigate new occurrences.
Adding this information would make the investigations easier, it could help answering questions like:
- On which LSN was the WAL apply blocked?
- What was the purpose of the bocked WAL record?
- On which relation (if any) was the blocked WAL record related to?
- What was the blocker(s) doing?
- When did the blocker(s) started their queries (if any)?
- What was the blocker(s) waiting for? on which wait event?
There is 2 points in this patch:
- Add the information about the blocked WAL record. This is done in standby.c (ResolveRecoveryConflictWithVirtualXIDs, ResolveRecoveryConflictWithDatabase, StandbyTimeoutHandler)
- Add the information about the blocker(s). This is done in postgres.c (RecoveryConflictInterrupt)
2020-06-15 06:48:23.774 UTC [6971] LOG: wal record apply is blocked by 2 connection(s), reason: User query might have needed to see row versions that must be removed. 2020-06-15 06:48:23.774 UTC [6971] LOG: blocked wal record rmgr: Heap2, lsn: 0/038E2678, received at: 2020-06-15 06:48:23.774098+00, desc: CLEAN, relation: rel 1663/13586/16652 fork main blk 0 2020-06-15 06:48:54.773 UTC [7088] LOG: about to interrupt pid: 7088, backend_type: client backend, state: active, wait_event_type: Timeout, wait_event: PgSleep, query_start: 2020-06-15 06:48:14.87672+00 2020-06-15 06:48:54.773 UTC [7088] DETAIL: statement: select *, pg_sleep(120) from bdt; 2020-06-15 06:48:54.773 UTC [7088] STATEMENT: select *, pg_sleep(120) from bdt; 2020-06-15 06:48:54.773 UTC [7088] ERROR: canceling statement due to conflict with recovery 2020-06-15 06:48:54.773 UTC [7088] DETAIL: User query might have needed to see row versions that must be removed. 2020-06-15 06:48:54.773 UTC [7088] STATEMENT: select *, pg_sleep(120) from bdt; 2020-06-15 06:48:54.778 UTC [7037] LOG: about to interrupt pid: 7037, backend_type: client backend, state: active, wait_event_type: Timeout, wait_event: PgSleep, query_start: 2020-06-15 06:48:13.008427+00 2020-06-15 06:48:54.778 UTC [7037] DETAIL: statement: select *, pg_sleep(300) from bdt; 2020-06-15 06:48:54.778 UTC [7037] STATEMENT: select *, pg_sleep(300) from bdt; 2020-06-15 06:48:54.778 UTC [7037] ERROR: canceling statement due to conflict with recovery 2020-06-15 06:48:54.778 UTC [7037] DETAIL: User query might have needed to see row versions that must be removed. 2020-06-15 06:48:54.778 UTC [7037] STATEMENT: select *, pg_sleep(300) from bdt;
It takes care of the other conflicts reason too.
So, should a standby recovery conflict occurs, we could see:
- information about the blocked WAL record apply
- information about the blocker(s)
Regards,
Bertrand
Вложения
В списке pgsql-hackers по дате отправления: