Re: BitmapHeapScan streaming read user and prelim refactoring

From: Thomas Munro
Msg-id: CA+hUKGLvKtf6sGo-YbW8cOf+SP6w90202Ut2ZVz10V+Fj0+KTw@mail.gmail.com
In reply to: Re: BitmapHeapScan streaming read user and prelim refactoring (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List: pgsql-hackers

On Fri, Mar 15, 2024 at 3:18 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> So, IIUC this means (1) the patched code is more aggressive wrt
> prefetching (because we prefetch more data overall, because master would
> prefetch N pages and patched prefetches N ranges, each of which may be
> multiple pages). And (2) it's not easy to quantify how much more
> aggressive it is, because it depends on how we happen to coalesce the
> pages into ranges.
>
> Do I understand this correctly?

Yes.

Parallelism must prevent coalescing here, though.  Any parallel-aware
executor node that allocates block numbers to workers without trying
to preserve ranges will prevent it.  That not only hides the opportunity to
coalesce reads, it also makes (globally) sequential scans look random
(i.e. locally they are more random), so that our logic to avoid issuing
advice for sequential scan won't work, and we'll inject extra useless
or harmful (?) fadvise calls.  I don't know what to do about that yet,
but it seems like a subject for future research.  Should we recognise
sequential scans with a window (like Linux does), instead of strict
next-block detection (like some other OSes do)?  Maybe a shared
streaming read that all workers pull blocks from, so it can see what's
going on?  I think the latter would be strictly more like what the ad
hoc BHS prefetching code in master is doing, but I don't know if it'd
be over-engineering, or hard to do for some reason.

Another aspect of per-backend streaming reads in one parallel query
that don't know about each other is that they will all have their own
effective_io_concurrency limit.  That is a version of a problem that
comes up again and again in parallel query, to be solved by the grand
unified resource control system of the future.


