Re: pg15b3: recovery fails with wal prefetch enabled
| From | Thomas Munro |
|---|---|
| Subject | Re: pg15b3: recovery fails with wal prefetch enabled |
| Date | |
| Msg-id | CA+hUKGL=+0nF8o8xG5DDUepG0ZxgDXusF=Jqtd7FmtFvmR1Gmg@mail.gmail.com |
| In response to | Re: pg15b3: recovery fails with wal prefetch enabled (Thomas Munro <thomas.munro@gmail.com>) |
| Responses | Re: pg15b3: recovery fails with wal prefetch enabled; Re: pg15b3: recovery fails with wal prefetch enabled |
| List | pgsql-hackers |
On Fri, Sep 2, 2022 at 6:20 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> ... The active ingredient here is a setting of
> maintenance_io_concurrency=0, which runs into a dumb accounting problem
> of the fencepost variety and incorrectly concludes it's reached the
> end early. Setting it to 3 or higher allows his system to complete
> recovery. I'm working on a fix ASAP.

The short version is that when tracking the number of IOs in progress, I had two steps in the wrong order in the algorithm for figuring out whether IO is saturated. Internally, the effect of maintenance_io_concurrency is clamped to 2 or more, and that mostly hides the bug until you try to replay a particular sequence like Justin's with such a low setting. Without that clamp, and if you set it to 1, then several of our recovery tests fail.

That clamp was a bad idea. What I think we really want is for maintenance_io_concurrency=0 to disable recovery prefetching exactly as if you'd set recovery_prefetch=off, and any other setting, including 1, to work without clamping.

Here's the patch I'm currently testing. It also fixes a related dangling reference problem with very small maintenance_io_concurrency.

I had this more or less figured out on Friday when I wrote last, but I got stuck on a weird problem with 026_overwrite_contrecord.pl. I think that failure case should report an error, no? I find it strange that we end recovery in silence. That was a problem for the new coding in this patch, because it is confused by XLREAD_FAIL without queuing an error, and then retries, which clobbers the aborted recptr state. I'm still looking into that.
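For readers following along, the proposed semantics (0 disables prefetching, every other value is used as-is with no clamp) could be sketched roughly as below. This is only an illustration, not the patch itself: the struct and function names are hypothetical, and recovery_prefetch is simplified to a boolean here.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the two settings discussed above.
 * These are NOT PostgreSQL's real declarations; recovery_prefetch
 * is modelled as a plain on/off flag for the sake of the example.
 */
typedef struct PrefetchSettings
{
    bool recovery_prefetch;           /* false models recovery_prefetch=off */
    int  maintenance_io_concurrency;  /* 0 or more */
} PrefetchSettings;

/*
 * Prefetching is enabled only when both settings allow it, so
 * maintenance_io_concurrency=0 behaves like recovery_prefetch=off.
 */
static bool
prefetch_enabled(const PrefetchSettings *s)
{
    assert(s->maintenance_io_concurrency >= 0);
    return s->recovery_prefetch && s->maintenance_io_concurrency > 0;
}

/*
 * Upper bound on IOs in flight: the setting itself, with no clamp
 * to 2, so a value of 1 really means 1. Returns 0 when disabled.
 */
static int
max_inflight_ios(const PrefetchSettings *s)
{
    return prefetch_enabled(s) ? s->maintenance_io_concurrency : 0;
}
```

With this shape, a caller would check prefetch_enabled() once and skip the prefetch path entirely when it returns false, rather than running the IO accounting with a limit of zero.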