Re: archive status ".ready" files may be created too early
From | Bossart, Nathan |
---|---|
Subject | Re: archive status ".ready" files may be created too early |
Date | |
Msg-id | EFF40306-8E8A-4259-B181-C84F3F06636C@amazon.com |
In response to | Re: archive status ".ready" files may be created too early (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>) |
Responses | Re: archive status ".ready" files may be created too early |
List | pgsql-hackers |
Apologies for the long delay.

I've spent a good amount of time thinking about this bug and trying out a few different approaches for fixing it. I've attached a work-in-progress patch for my latest attempt.

On 10/13/20, 5:07 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
>           F0        F1
>         AAAAA  F  BBBBB
> |---------|---------|---------|
>    seg X    seg X+1   seg X+2
>
> Matsumura-san has a concern about the case where there are two (or
> more) partially-flushed segment-spanning records at the same time.
>
> This patch remembers only the last cross-segment record. If we were
> going to flush up to F0 after Record-B had been written, we would fail
> to hold-off archiving seg-X. This patch is based on a assumption that
> that case cannot happen because we don't leave a pending page at the
> time of segment switch and no records don't span over three or more
> segments.

I wonder if these are safe assumptions to make. For your example, if we've written record B to the WAL buffers, but neither record A nor B has been written to disk or flushed, aren't we still in trouble? Also, is there actually any limit on WAL record length that means that it is impossible for a record to span over three or more segments? Perhaps these assumptions are true, but it doesn't seem obvious to me that they are, and they might be pretty fragile.

The attached patch doesn't make use of these assumptions. Instead, we track the positions of the records that cross segment boundaries in a small hash map, and we use that to determine when it is safe to mark a segment as ready for archival. I think this approach resembles Matsumura-san's patch from June.

As before, I'm not handling replication, archive_timeout, and persisting latest-marked-ready through crashes yet. For persisting the latest-marked-ready segment through crashes, I was thinking of using a new file that stores the segment number.

Nathan
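
For readers without the attachment, the idea of tracking boundary-crossing records in a map can be illustrated roughly as follows. This is only a sketch under stated assumptions, not the code in the patch: the class `SegmentBoundaryTracker` and its methods are hypothetical names, LSNs are modeled as plain integers, a record's end LSN is taken to be one past its last byte, and segments are numbered from zero.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # PostgreSQL's default segment size; illustrative only


class SegmentBoundaryTracker:
    """Hypothetical helper: decides when a WAL segment may be marked ready."""

    def __init__(self, segment_size=WAL_SEGMENT_SIZE):
        self.segment_size = segment_size
        # segment number -> end LSN of the record that crosses that
        # segment's trailing boundary (assumption: at most one such record)
        self.boundary_records = {}
        # highest segment number already reported as ready (-1 = none yet)
        self.last_ready = -1

    def register_record(self, start_lsn, end_lsn):
        """Remember a record if it spills over one or more segment boundaries."""
        first_seg = start_lsn // self.segment_size
        last_seg = (end_lsn - 1) // self.segment_size
        # Every segment the record leaves must wait for the record's end,
        # so a record spanning three or more segments is handled as well.
        for seg in range(first_seg, last_seg):
            self.boundary_records[seg] = end_lsn

    def advance(self, flushed_lsn):
        """Return the segments that become safe to archive at this flush point."""
        ready = []
        seg = self.last_ready + 1
        while (seg + 1) * self.segment_size <= flushed_lsn:
            segment_end = (seg + 1) * self.segment_size
            required = self.boundary_records.get(seg, segment_end)
            if flushed_lsn < required:
                # The record crossing out of this segment is not fully
                # flushed yet, so hold off on this and all later segments.
                break
            ready.append(seg)
            self.boundary_records.pop(seg, None)
            seg += 1
        self.last_ready = seg - 1
        return ready
```

With a toy segment size of 100, registering record A as (90, 110) and record B as (190, 210) and then flushing to 105 marks nothing ready; segment 0 is released only once the flush position reaches 110, regardless of whether record B has been registered in the meantime.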