Re: BitmapHeapScan streaming read user and prelim refactoring

From	Thomas Munro
Subject	Re: BitmapHeapScan streaming read user and prelim refactoring
Date	13 March 2024, 22:38:38
Msg-id	CA+hUKG+a1NSHa-=7znx1EhmGXo+BFJH3mk3xJJLY3SPgJ0L2Bw@mail.gmail.com
In response to	Re: BitmapHeapScan streaming read user and prelim refactoring (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses	Re: BitmapHeapScan streaming read user and prelim refactoring
List	pgsql-hackers

On Sun, Mar 3, 2024 at 11:41 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> On 3/2/24 23:28, Melanie Plageman wrote:
> > On Sat, Mar 2, 2024 at 10:05 AM Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
> >> With the current "master" code, eic=1 means we'll issue a prefetch for B
> >> and then read+process A. And then issue prefetch for C and read+process
> >> B, and so on. It's always one page ahead.
> >
> > Yes, that is what I mean for eic = 1

I spent quite a few days thinking about the meaning of eic=0 and eic=1
for streaming_read.c v7[1], to make it agree with the above and with
master.  Here's why I was confused:

Both eic=0 and eic=1 are expected to generate at most 1 physical I/O
at a time, or I/O queue depth 1 if you want to put it that way.  But
this isn't just about concurrency of I/O, it's also about computation.
Duh.

eic=0 means that the I/O is not concurrent with executor computation.
So, to annotate an excerpt from [1]'s random.txt, we have:

effective_io_concurrency = 0, range size = 1
unpatched                              patched
==============================================================================
pread(43,...,8192,0x58000) = 8192      pread(82,...,8192,0x58000) = 8192
             *** executor now has page at 0x58000 to work on ***
pread(43,...,8192,0xb0000) = 8192      pread(82,...,8192,0xb0000) = 8192
             *** executor now has page at 0xb0000 to work on ***

eic=1 means that a single I/O is started and then control is returned
to the executor code to do useful work concurrently with the
background read that we assume is happening:

effective_io_concurrency = 1, range size = 1
unpatched                              patched
==============================================================================
pread(43,...,8192,0x58000) = 8192      pread(82,...,8192,0x58000) = 8192
posix_fadvise(43,0xb0000,0x2000,...)   posix_fadvise(82,0xb0000,0x2000,...)
             *** executor now has page at 0x58000 to work on ***
pread(43,...,8192,0xb0000) = 8192      pread(82,...,8192,0xb0000) = 8192
posix_fadvise(43,0x108000,0x2000,...)  posix_fadvise(82,0x108000,0x2000,...)
             *** executor now has page at 0xb0000 to work on ***
pread(43,...,8192,0x108000) = 8192     pread(82,...,8192,0x108000) = 8192
posix_fadvise(43,0x160000,0x2000,...)  posix_fadvise(82,0x160000,0x2000,...)

In other words, 'concurrency' doesn't mean 'number of I/Os running
concurrently with each other', it means 'number of I/Os running
concurrently with computation', and when you put it that way, 0 and 1
are different.
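
To make the ordering in the two single-block traces above concrete,
here is a toy model of them.  It is only a sketch: trace() and the
event strings are made up for illustration, it only handles eic=0 and
eic=1 with range size 1, and it is not how streaming_read.c is
structured.

```python
def trace(offsets, eic):
    """Return the simulated event order for single-block reads.

    Hypothetical helper, not PostgreSQL code.  Only eic=0 and eic=1
    are modelled, mirroring the two traces in this message.
    """
    if eic not in (0, 1):
        raise ValueError("this sketch only models eic=0 and eic=1")
    events = []
    for i, off in enumerate(offsets):
        # The page the executor wants next is always read synchronously.
        events.append(f"pread({off:#x})")
        # With eic=1, advise the kernel about the following block before
        # handing the current page to the executor, so that the background
        # read overlaps with computation.
        if eic == 1 and i + 1 < len(offsets):
            events.append(f"posix_fadvise({offsets[i + 1]:#x})")
        events.append(f"executor({off:#x})")
    return events


if __name__ == "__main__":
    for eic in (0, 1):
        print(f"eic={eic}")
        for event in trace([0x58000, 0xB0000, 0x108000], eic):
            print("  " + event)
```

With eic=0 every pread is followed directly by executor work; with
eic=1 exactly one piece of advice is outstanding while the executor
works, which is the sense in which 0 and 1 differ.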

Note that the first read is a bit special: if the consumer is ready to
pull a buffer out of the stream and we don't have one ready yet, it is
too late to issue useful advice, so we don't bother.  FWIW I think even
in the AIO future we would have a synchronous read in that specific
place, at least when using io_method=worker, because it would be stupid
to ask another process to read a block for us that we want right now
and then wait for it to wake us up when it's done.

Note that even when we aren't issuing any advice because eic=0 or
because we detected sequential access and we believe the kernel can do
a better job than us, we still 'look ahead' (= call the callback to
see which block numbers are coming down the pipe), but only as far as
we need to coalesce neighbouring blocks.  (I deliberately avoid using
the word "prefetch" except in very general discussions because it
means different things to different layers of the code, hence talk of
"look ahead" and "advice".)  That's how we get this change:

effective_io_concurrency = 0, range size = 4
unpatched                              patched
==============================================================================
pread(43,...,8192,0x58000) = 8192      pread(82,...,8192,0x58000) = 8192
pread(43,...,8192,0x5a000) = 8192      preadv(82,...,2,0x5a000) = 16384
pread(43,...,8192,0x5c000) = 8192      pread(82,...,8192,0x5e000) = 8192
pread(43,...,8192,0x5e000) = 8192      preadv(82,...,4,0xb0000) = 32768
pread(43,...,8192,0xb0000) = 8192      preadv(82,...,4,0x108000) = 32768
pread(43,...,8192,0xb2000) = 8192      preadv(82,...,4,0x160000) = 32768
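
The "look ahead only as far as needed to coalesce" idea can be
sketched roughly like this.  Everything here is an assumption for
illustration: coalesced_reads(), the next_block callback and the
max_blocks cap are hypothetical names, not the streaming_read.c API.
It also doesn't try to reproduce the way the patched trace above
splits the first range into reads of 1, 2 and 1 blocks.

```python
BLOCK_SIZE = 8192  # block size seen in the traces (8192-byte reads)


def coalesced_reads(next_block, max_blocks=4):
    """Yield (first_block, nblocks) reads from a block-number callback.

    next_block is a hypothetical callback returning the next block
    number, or None when the stream is exhausted.  We pull block numbers
    only until one fails to extend the current run (or the run reaches
    max_blocks); that block is held back as the start of the next read.
    """
    pending = next_block()
    while pending is not None:
        first, n = pending, 1
        pending = next_block()
        # Keep looking ahead while the next block is a direct neighbour.
        while pending is not None and pending == first + n and n < max_blocks:
            n += 1
            pending = next_block()
        yield first, n


if __name__ == "__main__":
    # Two ranges of four neighbouring blocks, at the offsets in the trace.
    blocks = iter([44, 45, 46, 47, 88, 89, 90, 91])
    for first, n in coalesced_reads(lambda: next(blocks, None)):
        print(f"read {n} block(s) at offset {first * BLOCK_SIZE:#x}")
```

The point is only that the callback gets consulted one block past the
end of each run, regardless of whether any advice is issued.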

And then once we introduce eic > 0 to the picture with neighbouring
blocks that can be coalesced, "patched" starts to diverge even more
from "unpatched" because it tracks the number of wide I/Os in
progress, not the number of single blocks.

[1] https://www.postgresql.org/message-id/CA+hUKGLJi+c5jB3j6UvkgMYHky-qu+LPCsiNahUGSa5Z4DvyVA@mail.gmail.com
