Reuse data from readRecordBuf in XLogDecodeNextRecord
| From | Sonya Valchuk |
|---|---|
| Subject | Reuse data from readRecordBuf in XLogDecodeNextRecord |
| Date | |
| Msg-id | CAJLmdKyC3iq-UjYDG0S0rLQfRxfhcctWNcN-i3=3t6ceaUu1oA@mail.gmail.com |
| List | pgsql-hackers |
Hi,

Our team has previously asked on pgsql-admin/pgsql-general about a standby that never switches to streaming replication while recovering [1].

Our investigation has shown that this happens because an xlog record often crosses a WAL file boundary, which makes a single XLogDecodeNextRecord call fetch pages from different archived WAL files. With prefetching enabled, this causes the following sequence of events:

1. prefetching successfully reads page 1;
2. prefetching fails to read page 2 because the corresponding WAL has not been uploaded to the archive yet, and gets XLREAD_WOULDBLOCK;
3. all of the prefetched records are decoded, and recovery attempts to read the next record;
4. recovery reads page 1 again, reinvoking restore_command.

Because only one WAL file is kept open at a time, this causes PostgreSQL to fetch the same WAL file from the archive twice, which can be a very slow operation if the archive is network-attached; the latency of archive fetches may even be significant enough that recovery never catches up to the primary. Since the only piece of information a restore_command receives is the segment number, it cannot distinguish this situation from the database restarting, so it can't refuse to redownload the WAL either. We use the CloudNativePG operator, which prefetches multiple WALs at a time and makes use of a one-off flag to stop, but the nonmonotonicity of the segment number makes the one-off flag useless.

The attached patch fixes this situation by skipping calls to ReadPageInternal if the required data is already present in the record reassembly buffer. This reduces the number of I/O operations during recovery and ensures that restore_command is only executed with monotonically increasing segment numbers during a single recovery run.

The patch is for the current master branch, but the nonmonotonicity has been present since at least v15. I don't know if it makes sense to backport the patch, since it's technically merely a performance improvement?
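
To make the sequence of events above easier to follow, here is a toy Python model of it. It is not the patch and not PostgreSQL code: `Archive`, `Reader`, `WAL_SEG_PAGES`, and `run` are hypothetical names invented for this illustration, the model uses a tiny segment of four pages, and it assumes the open segment is closed as soon as a page from another segment is requested. It only shows which segment numbers a restore_command would see with and without reuse of already reassembled data.

```python
# Toy model only: all names are hypothetical and do not mirror xlogreader.c.
WAL_SEG_PAGES = 4  # pages per segment in this model (real segments are far larger)


class Archive:
    """Stands in for restore_command; logs every segment it is asked for."""

    def __init__(self, available_segments):
        self.available = set(available_segments)
        self.requests = []

    def restore(self, segno):
        self.requests.append(segno)
        return segno in self.available


class Reader:
    """Keeps at most one segment open at a time."""

    def __init__(self, archive, reuse_buffer):
        self.archive = archive
        self.reuse_buffer = reuse_buffer
        self.open_segno = None
        self.buffered_pages = set()  # pages already copied into the reassembly buffer

    def read_page(self, pageno):
        segno = pageno // WAL_SEG_PAGES
        if segno != self.open_segno:
            self.open_segno = None  # assumption: the old segment is closed first
            if not self.archive.restore(segno):
                return False  # models XLREAD_WOULDBLOCK
            self.open_segno = segno
        return True

    def decode_record(self, pages):
        """Try to assemble a record spanning `pages`; False means retry later."""
        for pageno in pages:
            if self.reuse_buffer and pageno in self.buffered_pages:
                continue  # data already reassembled, skip the page read
            if not self.read_page(pageno):
                return False
            self.buffered_pages.add(pageno)
        self.buffered_pages.clear()  # record complete, buffer can be recycled
        return True


def run(reuse_buffer):
    archive = Archive(available_segments={0})
    reader = Reader(archive, reuse_buffer)
    record_pages = [3, 4]  # starts on the last page of segment 0, ends in segment 1
    assert not reader.decode_record(record_pages)  # segment 1 is not archived yet
    archive.available.add(1)  # segment 1 shows up in the archive
    assert reader.decode_record(record_pages)  # the retry succeeds
    return archive.requests


print(run(reuse_buffer=False))  # [0, 1, 0, 1]
print(run(reuse_buffer=True))   # [0, 1, 1]
```

Without reuse, the retry asks for segment 0 again after having already asked for segment 1, which is the nonmonotonic pattern described above. With reuse, the requested segment numbers never go backwards.
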
I'm not sure how to regression test this either, but the code passes all existing regression tests, and I ran the manual reproduction to confirm that the issue we've observed has been eliminated.

[1] https://postgr.es/m/CANOng2i1G_57nvZ4ip4uKKU87jtt%2BfzqWUFV_ou6L8N3bteSXQ%40mail.gmail.com

// Sonya Valchuk