Re: [RFC] Lock-free XLog Reservation from WAL
From | Yura Sokolov |
---|---|
Subject | Re: [RFC] Lock-free XLog Reservation from WAL |
Date | |
Msg-id | 7b31f916-2b7d-49c7-b70a-b0342ba6b423@postgrespro.ru |
In reply to | Re: [RFC] Lock-free XLog Reservation from WAL (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
List | pgsql-hackers |
On 10.01.2025 19:53, Matthias van de Meent wrote:
> On Fri, 10 Jan 2025 at 13:42, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
>>
>> BTW, your version could make alike trick for guaranteed atomicity:
>> - change XLogRecord's `XLogRecPtr xl_prev` to `uint32 xl_prev_offset`
>> and store offset to prev record's start.
>
> -1, I don't think that is possible without degrading what our current
> WAL system protects against.
>
> For intra-record torn write protection we have the checksum, but that
> same protection doesn't cover the multiple WAL records on each page.
> That is what the xl_prev pointer is used for - detecting that this
> part of the page doesn't contain the correct data (e.g. the data of a
> previous version of this recycled segment).
> If we replaced xl_prev with just an offset into the segment, then this
> protection would be much less effective, as the previous version of
> the segment realistically used the same segment offsets at the same
> offsets into the file.

Well, to protect against a "torn write" it is enough to have a
"self-lsn" field rather than a "prev-lsn" one. So an 8-byte "self-lsn"
plus an "offset-to-prev" would work. But this way the header grows by
4 bytes compared to the current one instead of shrinking.

Just a thought: if XLogRecord alignment were stricter (for example,
32 bytes), then an LSN could mean a 32-byte offset rather than a byte
offset. Then the low 32 bits of the LSN would cover 128GB of WAL.
For most installations the re-use distance for WAL segments is unlikely
to be longer than 128GB. But I believe there are some where it is
larger, so this is not reliable.

> To protect against torn writes while still only using record segment
> offsets, you'd have zero and then fsync any segment before reusing it,
> which would severely reduce the benefits we get from recycling
> segments.
>
> Note that we can't expect the page header to help here, as write tears
> can happen at nearly any offset into the page - not just 8k intervals
> - and so the page header is not always representative of the origins
> of all bytes on the page - only the first 24 (if even that).

-----
regards,
Yura
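
P.S. The two figures above (the 4-byte header growth and the 128GB coverage) can be checked with a small C sketch. Every name in it (REC_ALIGN, low32_coverage_bytes, the LINK_BYTES_* constants and so on) is hypothetical and is not PostgreSQL code; the 32-byte alignment is only the example value from this message, and the sizes count just the linkage fields, ignoring the rest of the record header and any padding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the stricter record alignment used as an example, in bytes. */
#define REC_ALIGN 32u

/*
 * Sizes of the linkage fields only (not the whole record header):
 * today an 8-byte prev pointer; in the variant discussed, an 8-byte
 * "self-lsn" plus a 4-byte "offset-to-prev".
 */
enum
{
    LINK_BYTES_CURRENT  = sizeof(uint64_t),
    LINK_BYTES_SELF_LSN = sizeof(uint64_t) + sizeof(uint32_t)
};

/*
 * Bytes of WAL that the low 32 bits of an LSN can address when the LSN
 * counts REC_ALIGN-byte units instead of single bytes.
 */
static uint64_t
low32_coverage_bytes(void)
{
    return ((uint64_t) 1 << 32) * REC_ALIGN;
}

/* Byte position -> unit-based LSN; assumes byte_pos is REC_ALIGN-aligned. */
static uint64_t
byte_pos_to_unit_lsn(uint64_t byte_pos)
{
    assert(byte_pos % REC_ALIGN == 0);
    return byte_pos / REC_ALIGN;
}

/* Unit-based LSN -> byte position. */
static uint64_t
unit_lsn_to_byte_pos(uint64_t unit_lsn)
{
    return unit_lsn * REC_ALIGN;
}
```

With these assumptions the coverage is 2^32 * 32 = 2^37 bytes, i.e. 128GB, and the linkage fields grow from 8 to 12 bytes. Whether 128GB exceeds the segment re-use distance depends on the installation, which is exactly why the idea is not reliable.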