Re: Performance degradation on concurrent COPY into a single relation in PG16.

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Performance degradation on concurrent COPY into a single relation in PG16.
Дата
Msg-id CAApHDvo3ARuEYGk1hMxD+2oOjUFADdxrfWfF17BRWDGugeHqJQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Performance degradation on concurrent COPY into a single relation in PG16.  (Andres Freund <andres@anarazel.de>)
Ответы Re: Performance degradation on concurrent COPY into a single relation in PG16.  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Список pgsql-hackers
On Tue, 1 Aug 2023 at 13:25, Andres Freund <andres@anarazel.de> wrote:
> There's a lot of larger numbers in the file, which likely reduces the impact
> some. And there's the overhead of actually inserting the rows into the table,
> making the difference appear smaller than it is.

It might be worth special casing the first digit as we can save doing
the multiplication by 10 and the overflow checks on the first digit.
That should make it slightly faster to parse smaller numbers.

> COPY (SELECT generate_series(1, 1000) a, 10 b, 20 c, 30 d, 40 e, 50 f FROM generate_series(1, 10000)) TO
'/tmp/lotsaints_wide.copy';
>
> psql -c 'DROP TABLE IF EXISTS lotsaints_wide; CREATE UNLOGGED TABLE lotsaints_wide(a int, b int, c int, d int, e int,
fint);' && \
 
>   pgbench -n -P1 -f <( echo "COPY lotsaints_wide FROM '/tmp/lotsaints_wide.copy' WHERE false") -t 5
>
> 15:             2992.605
> HEAD:           3325.201
> fastpath1.patch 2932.606
> fastpath2.patch 2783.915
>
> Interestingly fastpath1 is slower now, even though it wasn't with earlier
> patches (which still is repeatable). I do not have the foggiest as to why.

I'm glad to see that.

I've adjusted the patch to add the fast path for the 16 and 64-bit
versions of the function.  I also added the special case for
processing the first digit, which looks like:

/* process the first digit */
digit = (*ptr - '0');

if (likely(digit < 10))
{
    ptr++;
    tmp = digit;
}

/* process remaining digits */
for (;;)

I tried adding the "at least 1 digit check" by adding an else { goto
slow; } in the above code, but it seems to generate slower code than
just checking if (unlikely(ptr == s)) { goto slow; } after the loop.

I also noticed that I wasn't getting the same performance after
adjusting the 16 and 64 bit versions. I assume that's down to code
alignment, but unsure of that.  I ended up adjusting all the "while
(*ptr)" loops into "for (;;)" loops
since the NUL char check is handled by the "else break;".  I also
removed the needless NUL char check in the isspace loops. It can't be
isspace and '\0'. I also replaced the isdigit() function call and
replaced it for manually checking the digit range.  I see my compiler
(gcc12.2) effectively generates the same code as the unsigned char
fast path version checking if (digit < 10). Once I did that, I got the
performance back again.

With your new test with the small-sized ints, I get:

REL_15_STABLE:
latency average = 1696.390 ms

master @ d3a38318a
latency average = 1928.803 ms

master + fastpath1.patch:
latency average = 1634.736 ms

master + fastpath2.patch:
latency average = 1628.704 ms

master + fastpath3.patch
latency average = 1590.417 ms

I see no particular reason not to go ahead with the attached patch and
get this thread closed off. Any objections?

David

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: Performance degradation on concurrent COPY into a single relation in PG16.
Следующее
От: Aleksander Alekseev
Дата:
Сообщение: Re: Incorrect handling of OOM in WAL replay leading to data loss