Re: [PATCH] Full support for index LP_DEAD hint bits on standby
From | Michail Nikolaev |
---|---|
Subject | Re: [PATCH] Full support for index LP_DEAD hint bits on standby |
Date | |
Msg-id | CANtu0ohHu1r1xQfTzEJuxeaOMYncG7xRxUQWdH=cMXZSf+nzvg@mail.gmail.com |
In reply to | Re: [PATCH] Full support for index LP_DEAD hint bits on standby (Michail Nikolaev <michail.nikolaev@gmail.com>) |
Responses | Re: [PATCH] Full support for index LP_DEAD hint bits on standby |
List | pgsql-hackers |
Hello, everyone.

After some correspondence with Peter Geoghegan [1] and building on his ideas, I have reworked the patch substantially. It is now much simpler and performs even better (no new WAL or conflict resolution, hot standby feedback is unrelated).

The idea is pretty simple now: mark a page whose LP_DEAD hints are “standby-safe” with a bit in btpo_flags (BTP_LP_SAFE_ON_STANDBY, and similar flags for gist and hash). If the standby wants to set LP_DEAD, it first checks BTP_LP_SAFE_ON_STANDBY on the page. If the flag is not set, all “primary” hints are removed first, and then the flag is set (with a memory barrier to avoid memory ordering issues in concurrent scans). During a scan, the standby also checks BTP_LP_SAFE_ON_STANDBY to be sure that it may ignore tuples marked LP_DEAD.

Of course, it is not that easy. If a standby was promoted (or a primary was restored from a standby backup), it is still possible to receive an FPI with this flag set in the WAL. So the main problem is still there. But we can simply clear the flag while applying the FPI, because the page remains dirty after that anyway! This should not cause any checksum, consistency, or pg_rewind issues, as explained in [2]. Semantically it is the same as setting a hint bit one millisecond after the FPI was applied (while the page is still dirty from FPI replay), and the standby already does this with *heap* hint bits. Also, the TAP test attached to [2] shows how easy it already is to flush a hint bit set by the standby and end up with a checksum that differs from the primary's.

If a standby was promoted (or restored from a standby backup), it is safe to use LP_DEAD with or without BTP_LP_SAFE_ON_STANDBY on a page. For accuracy, though, the primary clears BTP_LP_SAFE_ON_STANDBY when it finds it.

Also, we should take minRecoveryPoint into account, as described in [3], to avoid consistency issues during crash recovery (see IsIndexLpDeadAllowed). As far as I know, there is no practical reason to keep minRecoveryPoint at a low value.
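
To make the BTP_LP_SAFE_ON_STANDBY rule easier to follow, here is a rough sketch in C. It is a toy model, not code from the patch: the `SketchPage` struct, the `SKETCH_LP_SAFE_ON_STANDBY` constant, and all `sketch_*` functions are hypothetical names invented for illustration, and the memory barrier mentioned above is only marked by a comment because the sketch is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified page model; not PostgreSQL's real page layout. */
#define SKETCH_MAX_ITEMS 8
#define SKETCH_LP_SAFE_ON_STANDBY 0x0001 /* stands in for BTP_LP_SAFE_ON_STANDBY */

typedef struct SketchPage
{
    uint16_t flags;                     /* stands in for btpo_flags */
    size_t   nitems;                    /* number of line pointers in use */
    bool     lp_dead[SKETCH_MAX_ITEMS]; /* one LP_DEAD hint per item */
} SketchPage;

/* Standby wants to set an LP_DEAD hint on one item of the page. */
static void
sketch_standby_mark_dead(SketchPage *page, size_t item)
{
    assert(item < page->nitems);

    if (!(page->flags & SKETCH_LP_SAFE_ON_STANDBY))
    {
        /* Existing hints came from the primary: remove all of them first. */
        for (size_t i = 0; i < page->nitems; i++)
            page->lp_dead[i] = false;

        /* The email describes a memory barrier here; omitted in this sketch. */
        page->flags |= SKETCH_LP_SAFE_ON_STANDBY;
    }
    page->lp_dead[item] = true;
}

/* During a scan on the standby: may this item be ignored as dead? */
static bool
sketch_standby_can_skip(const SketchPage *page, size_t item)
{
    assert(item < page->nitems);
    return page->lp_dead[item] &&
           (page->flags & SKETCH_LP_SAFE_ON_STANDBY) != 0;
}

/* After replaying an FPI: clear the flag, since the page is dirty anyway. */
static void
sketch_after_fpi_replay(SketchPage *page)
{
    page->flags &= (uint16_t) ~SKETCH_LP_SAFE_ON_STANDBY;
}
```

In this model, a page that arrives with hints set by the primary has none of them honored by `sketch_standby_can_skip` until the standby itself has wiped them and set the flag, and a replayed FPI returns the page to the "not safe" state.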
Accordingly, there is an optional patch that moves minRecoveryPoint forward at each xl_running_data (to allow the standby to set hint bits and LP_DEADs more aggressively). This happens about every 15s.

The attachment contains some graphs with performance testing results from my PC (the test is taken from [4]). Each test ran for 10 minutes. The additional primary performance is probably just measurement error, but the standby performance gain is huge.

Feel free to ask if you need more proof of correctness.

Thanks,
Michail.

[1] - https://www.postgresql.org/message-id/flat/CAH2-Wz%3D-BoaKgkN-MnKj6hFwO1BOJSA%2ByLMMO%2BLRZK932fNUXA%40mail.gmail.com#6d7cdebd68069cc493c11b9732fd2040
[2] - https://www.postgresql.org/message-id/flat/CANtu0oiAtteJ%2BMpPonBg6WfEsJCKrxuLK15P6GsaGDcYGjefVQ%40mail.gmail.com#091fca433185504f2818d5364819f7a4
[3] - https://www.postgresql.org/message-id/flat/CANtu0oh28mX5gy5jburH%2Bn1mcczK5_dCQnhbBnCM%3DPfqh-A26Q%40mail.gmail.com#ecfe5a331a3058f895c0cba698fbc4d3
[4] - https://www.postgresql.org/message-id/flat/CANtu0oiP18H31dSaEzn0B0rW6tA_q1G7%3D9Y92%2BUS_WHGOoQevg%40mail.gmail.com