Re: BitmapHeapScan streaming read user and prelim refactoring

From: Tomas Vondra
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring
Msg-id: 04ff688a-3bd4-4e6b-a1b6-d9e69001daaf@enterprisedb.com
In reply to: Re: BitmapHeapScan streaming read user and prelim refactoring  (Thomas Munro <thomas.munro@gmail.com>)
Responses: Re: BitmapHeapScan streaming read user and prelim refactoring  (Thomas Munro <thomas.munro@gmail.com>)
List: pgsql-hackers

On 3/13/24 23:38, Thomas Munro wrote:
> On Sun, Mar 3, 2024 at 11:41 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> On 3/2/24 23:28, Melanie Plageman wrote:
>>> On Sat, Mar 2, 2024 at 10:05 AM Tomas Vondra
>>> <tomas.vondra@enterprisedb.com> wrote:
>>>> With the current "master" code, eic=1 means we'll issue a prefetch for B
>>>> and then read+process A. And then issue prefetch for C and read+process
>>>> B, and so on. It's always one page ahead.
>>>
>>> Yes, that is what I mean for eic = 1
> 
> I spent quite a few days thinking about the meaning of eic=0 and eic=1
> for streaming_read.c v7[1], to make it agree with the above and with
> master.  Here's why I was confused:
> 
> Both eic=0 and eic=1 are expected to generate at most 1 physical I/O
> at a time, or I/O queue depth 1 if you want to put it that way.  But
> this isn't just about concurrency of I/O, it's also about computation.
> Duh.
> 
> eic=0 means that the I/O is not concurrent with executor computation.
> So, to annotate an excerpt from [1]'s random.txt, we have:
> 
> effective_io_concurrency = 0, range size = 1
> unpatched                              patched
> ==============================================================================
> pread(43,...,8192,0x58000) = 8192      pread(82,...,8192,0x58000) = 8192
>              *** executor now has page at 0x58000 to work on ***
> pread(43,...,8192,0xb0000) = 8192      pread(82,...,8192,0xb0000) = 8192
>              *** executor now has page at 0xb0000 to work on ***
> 
> eic=1 means that a single I/O is started and then control is returned
> to the executor code to do useful work concurrently with the
> background read that we assume is happening:
> 
> effective_io_concurrency = 1, range size = 1
> unpatched                              patched
> ==============================================================================
> pread(43,...,8192,0x58000) = 8192      pread(82,...,8192,0x58000) = 8192
> posix_fadvise(43,0xb0000,0x2000,...)   posix_fadvise(82,0xb0000,0x2000,...)
>              *** executor now has page at 0x58000 to work on ***
> pread(43,...,8192,0xb0000) = 8192      pread(82,...,8192,0xb0000) = 8192
> posix_fadvise(43,0x108000,0x2000,...)  posix_fadvise(82,0x108000,0x2000,...)
>              *** executor now has page at 0xb0000 to work on ***
> pread(43,...,8192,0x108000) = 8192     pread(82,...,8192,0x108000) = 8192
> posix_fadvise(43,0x160000,0x2000,...)  posix_fadvise(82,0x160000,0x2000,...)
> 
> In other words, 'concurrency' doesn't mean 'number of I/Os running
> concurrently with each other', it means 'number of I/Os running
> concurrently with computation', and when you put it that way, 0 and 1
> are different.
> 

Interesting. For some reason I thought that with eic=1 we'd issue the
fadvise for page #2 before the pread of page #1, so that there'd be two
I/O requests in flight at the same time for a while. That would give the
fadvise more time to actually get the data into the page cache.
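
To make the difference between the two orderings concrete, here is a
minimal sketch that prints the sequence of calls for each. It is only a
simulation: the function name trace() and the advise_first flag are
hypothetical, the offsets are taken from the quoted excerpt, and nothing
here calls the real pread() or posix_fadvise().

```python
def trace(offsets, eic, advise_first=False):
    """Return the simulated call sequence for reading `offsets` in order.

    eic=0: plain synchronous reads, no advice.
    eic=1: advice for the next block only, issued either after the read
           of the current block (as in the quoted excerpt) or before it
           (advise_first=True, the ordering described just above).
    """
    events = []
    for i, off in enumerate(offsets):
        nxt = offsets[i + 1] if i + 1 < len(offsets) else None
        # Advice is only issued when eic >= 1 and there is a next block.
        advise = f"posix_fadvise({nxt:#x})" if (eic >= 1 and nxt is not None) else None
        read = f"pread({off:#x})"
        if advise and advise_first:
            events += [advise, read]
        elif advise:
            events += [read, advise]
        else:
            events.append(read)
        # The executor works on the page it has just read.
        events.append(f"process({off:#x})")
    return events


if __name__ == "__main__":
    offsets = [0x58000, 0xb0000, 0x108000]
    for label, kwargs in [
        ("eic=0", dict(eic=0)),
        ("eic=1, advice after read", dict(eic=1)),
        ("eic=1, advice before read", dict(eic=1, advise_first=True)),
    ]:
        print(label)
        for event in trace(offsets, **kwargs):
            print("   ", event)
```

In the "advice before read" variant the advice for the next page is
already outstanding while the current page is being read, which is the
short window with two requests in flight mentioned above.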

> Note that the first read is a bit special: by the time the consumer is
> ready to pull a buffer out of the stream when we don't have a buffer
> ready yet, it is too late to issue useful advice, so we don't bother.
> FWIW I think even in the AIO future we would have a synchronous read
> in that specific place, at least when using io_method=worker, because
> it would be stupid to ask another process to read a block for us that
> we want right now and then wait for it to wake us up when it's done.
> 
> Note that even when we aren't issuing any advice because eic=0 or
> because we detected sequential access and we believe the kernel can do
> a better job than us, we still 'look ahead' (= call the callback to
> see which block numbers are coming down the pipe), but only as far as
> we need to coalesce neighbouring blocks.  (I deliberately avoid using
> the word "prefetch" except in very general discussions because it
> means different things to different layers of the code, hence talk of
> "look ahead" and "advice".)  That's how we get this change:
> 
> effective_io_concurrency = 0, range size = 4
> unpatched                              patched
> ==============================================================================
> pread(43,...,8192,0x58000) = 8192      pread(82,...,8192,0x58000) = 8192
> pread(43,...,8192,0x5a000) = 8192      preadv(82,...,2,0x5a000) = 16384
> pread(43,...,8192,0x5c000) = 8192      pread(82,...,8192,0x5e000) = 8192
> pread(43,...,8192,0x5e000) = 8192      preadv(82,...,4,0xb0000) = 32768
> pread(43,...,8192,0xb0000) = 8192      preadv(82,...,4,0x108000) = 32768
> pread(43,...,8192,0xb2000) = 8192      preadv(82,...,4,0x160000) = 32768
> 
> And then once we introduce eic > 0 to the picture with neighbouring
> blocks that can be coalesced, "patched" starts to diverge even more
> from "unpatched" because it tracks the number of wide I/Os in
> progress, not the number of single blocks.
> 

So, IIUC this means (1) the patched code is more aggressive wrt
prefetching, because we prefetch more data overall: master would
prefetch N pages, while patched prefetches N ranges, each of which may
be multiple pages. And (2) it's not easy to quantify how much more
aggressive it is, because it depends on how we happen to coalesce the
pages into ranges.

Do I understand this correctly?
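
A rough sketch of point (2), under stated assumptions: coalesce() and
blocks_in_flight() are hypothetical helpers, max_range is an assumed cap
on range length, and the grouping rule (consecutive block numbers only)
is a simplification. The real streaming read code may form ranges
differently, as the quoted "range size = 4" excerpt suggests. The block
numbers are the offsets from that excerpt divided by 8192.

```python
def coalesce(blocks, max_range):
    """Group ordered block numbers into (start, length) runs of consecutive
    blocks, each at most max_range blocks long."""
    ranges = []
    for b in blocks:
        last = ranges[-1] if ranges else None
        if last and b == last[0] + last[1] and last[1] < max_range:
            last[1] += 1           # extend the current run
        else:
            ranges.append([b, 1])  # start a new run
    return [tuple(r) for r in ranges]


def blocks_in_flight(blocks, eic, max_range):
    """Blocks covered by the first `eic` look-ahead units when a unit is a
    single page (master) versus a coalesced range (patched)."""
    per_page = min(eic, len(blocks))
    per_range = sum(n for _, n in coalesce(blocks, max_range)[:eic])
    return per_page, per_range


if __name__ == "__main__":
    cases = {
        "scattered": [44, 88, 132, 176],
        "clustered": [44, 45, 46, 47, 88, 89, 90, 91],
        "mixed": [44, 45, 88, 132, 133, 134],
    }
    for name, blocks in cases.items():
        pages, ranged = blocks_in_flight(blocks, eic=2, max_range=4)
        print(f"{name}: {pages} blocks as pages, {ranged} blocks as ranges")
```

With the same eic, the page-based count stays fixed while the
range-based count varies with how the blocks happen to cluster, which is
why the extra aggressiveness is hard to express as a single number.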


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


