Re: BitmapHeapScan streaming read user and prelim refactoring
From | Thomas Munro |
---|---|
Subject | Re: BitmapHeapScan streaming read user and prelim refactoring |
Date | |
Msg-id | CA+hUKGJtm_gkmW_h_02-Q9ZRcG3yOx2uzVqbCTfz7YPnTfs+DA@mail.gmail.com |
In response to | Re: BitmapHeapScan streaming read user and prelim refactoring (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses | Re: BitmapHeapScan streaming read user and prelim refactoring (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
List | pgsql-hackers |
On Fri, Mar 29, 2024 at 10:43 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> I think there's some sort of bug, triggering this assert in heapam
>
> Assert(BufferGetBlockNumber(hscan->rs_cbuf) == tbmres->blockno);

Thanks for the repro. I can't seem to reproduce it (still trying) but I assume this is with Melanie's v11 patch set which had v11-0016-v10-Read-Stream-API.patch.

Would you mind removing that commit and instead applying the v13 stream_read.c patches[1]? v10 stream_read.c was a little confused about random I/O combining, which I fixed with a small adjustment to the conditions for the "if" statement right at the end of read_stream_look_ahead(). Sorry about that. The fixed version, with eic=4, with your test query using WHERE a < a, ends its scan with:

    ...
    posix_fadvise(32,0x28aee000,0x4000,POSIX_FADV_WILLNEED) = 0 (0x0)
    pread(32,"\0\0\0\0@4\M-5:\0\0\^D\0\M-x\^A"...,40960,0x28acc000) = 40960 (0xa000)
    posix_fadvise(32,0x28af4000,0x4000,POSIX_FADV_WILLNEED) = 0 (0x0)
    pread(32,"\0\0\0\0\^XC\M-6:\0\0\^D\0\M-x"...,32768,0x28ad8000) = 32768 (0x8000)
    posix_fadvise(32,0x28afc000,0x4000,POSIX_FADV_WILLNEED) = 0 (0x0)
    pread(32,"\0\0\0\0\M-XQ\M-7:\0\0\^D\0\M-x"...,24576,0x28ae4000) = 24576 (0x6000)
    posix_fadvise(32,0x28b02000,0x8000,POSIX_FADV_WILLNEED) = 0 (0x0)
    pread(32,"\0\0\0\0\M^@3\M-8:\0\0\^D\0\M-x"...,16384,0x28aee000) = 16384 (0x4000)
    pread(32,"\0\0\0\0\M-`\M-:\M-8:\0\0\^D\0"...,16384,0x28af4000) = 16384 (0x4000)
    pread(32,"\0\0\0\0po\M-9:\0\0\^D\0\M-x\^A"...,16384,0x28afc000) = 16384 (0x4000)
    pread(32,"\0\0\0\0\M-P\M-v\M-9:\0\0\^D\0"...,32768,0x28b02000) = 32768 (0x8000)

In other words it's able to coalesce, but v10 was a bit b0rked in that respect and wouldn't do as well at that. Then if you set io_combine_limit = 1, it looks more like master, eg lots of little reads, but not as many fadvises as master because of sequential access:

    ...
    posix_fadvise(32,0x28af4000,0x2000,POSIX_FADV_WILLNEED) = 0 (0x0) -+
    pread(32,...,8192,0x28ae8000) = 8192 (0x2000)                      |
    pread(32,...,8192,0x28aee000) = 8192 (0x2000)                      |
    posix_fadvise(32,0x28afc000,0x2000,POSIX_FADV_WILLNEED) = 0 (0x0) ---+
    pread(32,...,8192,0x28af0000) = 8192 (0x2000)                      | |
    pread(32,...,8192,0x28af4000) = 8192 (0x2000) <--------------------+ |
    posix_fadvise(32,0x28b02000,0x2000,POSIX_FADV_WILLNEED) = 0 (0x0) -----+
    pread(32,...,8192,0x28af6000) = 8192 (0x2000)                        | |
    pread(32,...,8192,0x28afc000) = 8192 (0x2000) <----------------------+ |
    pread(32,...,8192,0x28afe000) = 8192 (0x2000) }-- no advice            |
    pread(32,...,8192,0x28b02000) = 8192 (0x2000) <------------------------+
    pread(32,...,8192,0x28b04000) = 8192 (0x2000) }
    pread(32,...,8192,0x28b06000) = 8192 (0x2000) }-- no advice
    pread(32,...,8192,0x28b08000) = 8192 (0x2000) }

It becomes slightly less eager to start I/Os as soon as io_combine_limit > 1, because when it has hit max_ios, if ... <thinks> yeah if the average block that it can combine is bigger than 4, an arbitrary number from:

    max_pinned_buffers = Max(max_ios * 4, io_combine_limit);

.... then it can run out of look ahead window before it can reach max_ios (aka eic), so that's a kind of arbitrary/bogus I/O depth constraint, which is another way of saying what I was saying earlier: maybe it just needs more distance. So let's see the average combined I/O length in your test query... for me it works out to 27,169 bytes. But I think there must be times when it runs out of window due to clustering. So you could also try increasing that 4->8 to see what happens to performance.

[1] https://www.postgresql.org/message-id/CA%2BhUKG%2B5UofvseJWv6YqKmuc_%3Drguc7VqKcNEG1eawKh3MzHXQ%40mail.gmail.com
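To make the look-ahead window argument concrete, here is a rough back-of-the-envelope sketch in C. It is not code from the patch: the function names `sketch_max_pinned_buffers` and `sketch_reachable_ios` are hypothetical, the `multiplier` parameter stands in for the hard-coded 4 so the suggested 4->8 experiment can be tried on paper, and the model (window size divided by average combined I/O size, capped at max_ios) is a deliberate simplification of what the read stream really does. The 8192-byte block size is taken from the pread() sizes in the traces; an io_combine_limit of 16 blocks is an assumption.

```c
/*
 * Hypothetical helper mirroring the expression quoted in the email:
 *     max_pinned_buffers = Max(max_ios * 4, io_combine_limit);
 * with the constant 4 exposed as "multiplier".
 */
int
sketch_max_pinned_buffers(int max_ios, int multiplier, int io_combine_limit)
{
    int by_ios = max_ios * multiplier;

    return by_ios > io_combine_limit ? by_ios : io_combine_limit;
}

/*
 * Simplified model (an assumption, not the real algorithm): the number of
 * I/Os that can be in flight is limited by how many average-sized combined
 * I/Os fit in the look-ahead window, and never exceeds max_ios.
 */
int
sketch_reachable_ios(int max_ios, int window_blocks, double avg_blocks_per_io)
{
    int fit;

    if (avg_blocks_per_io < 1.0)
        avg_blocks_per_io = 1.0;    /* an I/O covers at least one block */

    fit = (int) (window_blocks / avg_blocks_per_io);

    return fit < max_ios ? fit : max_ios;
}
```

Under this model, eic=4 gives a 16-block window. The measured average of 27,169 bytes is about 3.3 blocks, so four I/Os still fit on average, but any stretch where combined I/Os average 5 blocks only fits three of them. Doubling the multiplier to 8 gives a 32-block window, which fits four such I/Os again.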