Re: BUG #19006: Assert(BufferIsPinned) in BufferGetBlockNumber() is triggered for forwarded buffer
От | Xuneng Zhou |
---|---|
Тема | Re: BUG #19006: Assert(BufferIsPinned) in BufferGetBlockNumber() is triggered for forwarded buffer |
Дата | |
Msg-id | CABPTF7XMsoEUfQ-jn96O7hqKG5C+_1rBRwisRYyDidicOaz-Bw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #19006: Assert(BufferIsPinned) in BufferGetBlockNumber() is triggered for forwarded buffer (Thomas Munro <thomas.munro@gmail.com>) |
Список | pgsql-bugs |
Hi, On Wed, Aug 13, 2025 at 2:02 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Wed, Aug 13, 2025 at 5:29 PM Xuneng Zhou <xunengzhou@gmail.com> wrote: > > > 1. The leading block is BM_VALID, so we choose to give it to you > > > immediately and not look further (we could look further and return > > > more than one consecutive BM_VALID block at a time, but this isn't > > > implemented) > > > > I am curious about why this isn't implemented. It looks helpful. > > Is there any blocking issue or trade-off for not doing so? > > stream->distance tends to be 1 for well cached data (it grows when IO > is needed and it is worth looking far ahead, and shrinks when data is > found in cache). So we won't currently have a distance > 1 for well > cached data, so we wouldn't benefit much. That may not always be > true: if the buffer mapping table is changed to a tree data structure > (like the way kernels manage virtual memory pages associated with a > file or other VM object), then we might have an incentive to look up > lots of cached blocks at the same time, and then we might want to > change the distance tuning algorithm, and then we might want > StartReadBuffers() to be able to return more than one cached block at > a time. We've focused mainly on I/O so far, and just tried not to > make the well cached cases worse. Understood. I came across Alexander’s article on the topic. It appears to require a major re-architecture, which could be challenging to land HEAD. https://www.orioledb.com/blog/buffer-management > > > The format of this part is not aligned well in gmail, so I copy it into vs code. > > Is this layout right? > > Yes. > > > I found second illustration somewhat hard to follow, especially > > the "do nothing" trick and the movement of next_buffef_index in the second queue. > > Maybe I need to read the corresponding code. > > Suppose you have pending_read_blocknum = 100, pending_read_nblock = 5, > next_buffer_index = 200. Now you decide to start that read, because > the next block the caller wants can't be combined, so you call > StartReadBuffers(blocknum = 100, *nblocks = 4, buffers = > &buffers[200]). StartReadBuffers() pins 5 buffers and starts an IO, > but finds a reason to split it at size 2. It sets *nblocks to 2, and > returns. Now read_stream.c adds 2 to pending_read_blocknum, subtracts > 2 from pending_read_nblocks, so that it represents the continuation of > the previous read. It also adds 2 to next_buffer_index with modulo > arithmetic (so it wraps around the queue), because that is the > location of the buffers for the next read, and sets forwarded_buffers > to 3, because there are 3 buffers already pinned (in that patch ^ it > is renamed to pending_read_npinned). The three buffers themselves are > already in the queue at the right position for StartReadBuffers() to > receive them and know that it doesn't have to pin them again. What I > was referring to with "doing nothing" is the way that if you keep > sliding StartReadBuffers() call along the queue, the extra buffers it > spits out become its input next time, without having to be copied > anywhere. In that picture there are two forwarded buffers, labeled 9 > and A. Got it. Elegant technique. Patch 2 also reads as a robustness improvement now. :-) I'll try to map the high-level design to code and review patch 2. Thanks for the great guidance! Best, Xuneng
В списке pgsql-bugs по дате отправления: