Re: BitmapHeapScan streaming read user and prelim refactoring

Поиск
Список
Период
Сортировка
От Thomas Munro
Тема Re: BitmapHeapScan streaming read user and prelim refactoring
Дата
Msg-id CA+hUKGJtm_gkmW_h_02-Q9ZRcG3yOx2uzVqbCTfz7YPnTfs+DA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BitmapHeapScan streaming read user and prelim refactoring  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы Re: BitmapHeapScan streaming read user and prelim refactoring  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Список pgsql-hackers
On Fri, Mar 29, 2024 at 10:43 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> I think there's some sort of bug, triggering this assert in heapam
>
>   Assert(BufferGetBlockNumber(hscan->rs_cbuf) == tbmres->blockno);

Thanks for the repro.  I can't seem to reproduce it (still trying) but
I assume this is with Melanie's v11 patch set which had
v11-0016-v10-Read-Stream-API.patch.

Would you mind removing that commit and instead applying the v13
stream_read.c patches[1]?  v10 stream_read.c was a little confused
about random I/O combining, which I fixed with a small adjustment to
the conditions for the "if" statement right at the end of
read_stream_look_ahead().  Sorry about that.  The fixed version, with
eic=4, with your test query using WHERE a < a, ends its scan with:

...
posix_fadvise(32,0x28aee000,0x4000,POSIX_FADV_WILLNEED) = 0 (0x0)
pread(32,"\0\0\0\0@4\M-5:\0\0\^D\0\M-x\^A"...,40960,0x28acc000) = 40960 (0xa000)
posix_fadvise(32,0x28af4000,0x4000,POSIX_FADV_WILLNEED) = 0 (0x0)
pread(32,"\0\0\0\0\^XC\M-6:\0\0\^D\0\M-x"...,32768,0x28ad8000) = 32768 (0x8000)
posix_fadvise(32,0x28afc000,0x4000,POSIX_FADV_WILLNEED) = 0 (0x0)
pread(32,"\0\0\0\0\M-XQ\M-7:\0\0\^D\0\M-x"...,24576,0x28ae4000) = 24576 (0x6000)
posix_fadvise(32,0x28b02000,0x8000,POSIX_FADV_WILLNEED) = 0 (0x0)
pread(32,"\0\0\0\0\M^@3\M-8:\0\0\^D\0\M-x"...,16384,0x28aee000) = 16384 (0x4000)
pread(32,"\0\0\0\0\M-`\M-:\M-8:\0\0\^D\0"...,16384,0x28af4000) = 16384 (0x4000)
pread(32,"\0\0\0\0po\M-9:\0\0\^D\0\M-x\^A"...,16384,0x28afc000) = 16384 (0x4000)
pread(32,"\0\0\0\0\M-P\M-v\M-9:\0\0\^D\0"...,32768,0x28b02000) = 32768 (0x8000)

In other words it's able to coalesce, but v10 was a bit b0rked in that
respect and wouldn't do as well at that.  Then if you set
io_combine_limit = 1, it looks more like master, eg lots of little
reads, but not as many fadvises as master because of sequential
access:

...
posix_fadvise(32,0x28af4000,0x2000,POSIX_FADV_WILLNEED) = 0 (0x0) -+
pread(32,...,8192,0x28ae8000) = 8192 (0x2000)                      |
pread(32,...,8192,0x28aee000) = 8192 (0x2000)                      |
posix_fadvise(32,0x28afc000,0x2000,POSIX_FADV_WILLNEED) = 0 (0x0) ---+
pread(32,...,8192,0x28af0000) = 8192 (0x2000)                      | |
pread(32,...,8192,0x28af4000) = 8192 (0x2000) <--------------------+ |
posix_fadvise(32,0x28b02000,0x2000,POSIX_FADV_WILLNEED) = 0 (0x0) -----+
pread(32,...,8192,0x28af6000) = 8192 (0x2000)                        | |
pread(32,...,8192,0x28afc000) = 8192 (0x2000) <----------------------+ |
pread(32,...,8192,0x28afe000) = 8192 (0x2000)  }-- no advice           |
pread(32,...,8192,0x28b02000) = 8192 (0x2000) <------------------------+
pread(32,...,8192,0x28b04000) = 8192 (0x2000)  }
pread(32,...,8192,0x28b06000) = 8192 (0x2000)  }-- no advice
pread(32,...,8192,0x28b08000) = 8192 (0x2000)  }

It becomes slightly less eager to start I/Os as soon as
io_combine_limit > 1, because when it has hit max_ios, if ... <thinks>
yeah if the average block that it can combine is bigger than 4, an
arbitrary number from:

    max_pinned_buffers = Max(max_ios * 4, io_combine_limit);

.... then it can run out of look ahead window before it can reach
max_ios (aka eic), so that's a kind of arbitrary/bogus I/O depth
constraint, which is another way of saying what I was saying earlier:
maybe it just needs more distance.  So let's see the average combined
I/O length in your test query... for me it works out to 27,169 bytes.
But I think there must be times when it runs out of window due to
clustering.  So you could also try increasing that 4->8 to see what
happens to performance.

[1] https://www.postgresql.org/message-id/CA%2BhUKG%2B5UofvseJWv6YqKmuc_%3Drguc7VqKcNEG1eawKh3MzHXQ%40mail.gmail.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Zhijie Hou (Fujitsu)"
Дата:
Сообщение: RE: Synchronizing slots from primary to standby
Следующее
От: "David G. Johnston"
Дата:
Сообщение: Re: CREATE TABLE creates a composite type corresponding to the table row, which is and is not there