Re: Streaming read-ready sequential scan code

Поиск
Список
Период
Сортировка
От Melanie Plageman
Тема Re: Streaming read-ready sequential scan code
Дата
Msg-id CAAKRu_arLQjAknvDOa+DQkOPcZRtS257dETFLgrhEwAVdv+WBQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Streaming read-ready sequential scan code  (Thomas Munro <thomas.munro@gmail.com>)
Ответы Re: Streaming read-ready sequential scan code  (Thomas Munro <thomas.munro@gmail.com>)
Список pgsql-hackers
On Thu, Apr 4, 2024 at 7:45 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Thu, Apr 4, 2024 at 8:02 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > 3a4a3537a
> > latency average = 34.497 ms
> > latency average = 34.538 ms
> >
> > 3a4a3537a + read_stream_for_seqscans.patch
> > latency average = 40.923 ms
> > latency average = 41.415 ms
> >
> > i.e. no meaningful change from the refactor, but a regression from a
> > cached workload that changes the page often without doing much work in
> > between with the read stread patch.
>
> I ran Heikki's test except I ran the "insert" 4 times to get a table
> of 4376MB according to \d+.  On my random cloud ARM server (SB=8GB,
> huge pages, parallelism disabled), I see a speedup 1290ms -> 1046ms
> when the data is in LInux cache and PG is not prewarmed, roughly as he
> reported.  Good.
>
> If I pg_prewarm first, I see that slowdown 260ms -> 294ms.  Trying
> things out to see what works, I got that down to 243ms (ie beat
> master) by inserting a memory prefetch:
>
> --- a/src/backend/storage/aio/read_stream.c
> +++ b/src/backend/storage/aio/read_stream.c
> @@ -757,6 +757,8 @@ read_stream_next_buffer(ReadStream *stream, void
> **per_buffer_data)
>         /* Prepare for the next call. */
>         read_stream_look_ahead(stream, false);
>
> +       __builtin_prefetch(BufferGetPage(stream->buffers[stream->oldest_buffer_index]));
>
> Maybe that's a solution to a different problem that just happens to
> more than make up the difference in this case, and it may be
> questionable whether that cache line will survive long enough to help
> you, but this one-tuple-per-page test likes it...  Hmm, to get a more
> realistic table than the one-tuple-per-page on, I tried doubling a
> tenk1 table until it reached 2759MB and then I got a post-prewarm
> regression 702ms -> 721ms, and again I can beat master by memory
> prefetching: 689ms.
>
> Annoyingly, with the same table I see no difference between the actual
> pg_prewarm('giga') time: around 155ms for both.  pg_prewarm is able to
> use the 'fast path' I made pretty much just to be able to minimise
> regression in that (probably stupid) all-cached tes that doesn't even
> look at the page contents.  Unfortunately seq scan can't use it
> because it has per-buffer data, which is one of the things it can't do
> (because of a space management issue).  Maybe I should try to find a
> way to fix that.

So, sequential scan does not have per-buffer data. I did some logging
and the reason most fully-in-SB sequential scans don't use the fast
path is because read_stream->pending_read_nblocks is always 0.

When when read_stream->distance stays 1 (expected for all-in-SB as it
is initialized to 1 and we don't want distance to ramp up),
read_stream_look_ahead() never increments
read_stream->pending_read_nblocks because it sets it to 1 the first
time it is called and then the conditions of the while loop are not
met again

    while (stream->ios_in_progress < stream->max_ios &&
           stream->pinned_buffers + stream->pending_read_nblocks <
stream->distance)

distance is 1, pending_read_nblocks is 1, thus we only loop once and
don't increment pending_read_nblocks.

prewarm is only able to use the fast path because it passes
READ_STREAM_FULL and thus initializes read_stream->distance to a
higher initial value.

I added some logging to see if any of the sequential scans in the
regression suite used the fast path. The one example I see of the fast
path being used is a temp table IO stats test in
src/test/regress/sql/stats.sql. I didn't check exactly what conditions
led it to do this. But we probably want seq scans which are all in SB
to use the fast path.

- Melanie



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: Psql meta-command conninfo+
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: Add trim_trailing_whitespace to editorconfig file