Re: More speedups for tuple deformation

From David Rowley
Subject Re: More speedups for tuple deformation
Date
Msg-id CAApHDvrF6DG7=xD8JGo2HoQKN0LRFNF0ysVt6cKSNPiqbdQOSA@mail.gmail.com
In response to More speedups for tuple deformation  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On Sun, 28 Dec 2025 at 22:04, David Rowley <dgrowleyml@gmail.com> wrote:
> Things still to do:
>
> * More benchmarking is needed. I've not yet completed the benchmarks
> on my Zen4 machine.  No Intel hardware has been tested at all. I don't
> really have any good Intel hardware to test with. Maybe someone else
> would like to help? Script is attached.

Please find attached an updated set of patches. A rebase was needed,
plus 0003 had a problem with an Assert not handling the bitmap being a
NULL pointer.

I've done some more performance tests after upgrading my Zen2 machine
to use newer versions of gcc and clang. I've also tested on an Intel
machine now. All the results are attached in spreadsheet form in the
bzip file. There's also a pg_dump of the results and
analysis_schema.sql, which has an SQL function to extract the data in
a form that's compatible with the spreadsheet's format.

I'd say things are looking generally good for 0001 without the
OPTIMIZE_BYVAL stuff, but the results I got from clang on the
AMD7945hx don't look good at all. I'll run the tests on that again
tonight. The machine is a laptop and I did run the benchmarks on
master first to establish the baseline. I want to ensure there's no
thermal throttling going on. Aside from clang on the 7945hx, there are
a few cases where there's a slight regression in the 0 extra column
tests when a NULL is present. I wonder how much we should care about
this as 1) the regression is small; and, 2) IMO, there's less chance
of there being a NULL in a table with very few columns (in this case,
the table has 3 columns).

The "AMD3990x clang 20.1.8" results in the spreadsheet also look
strange for 0001. It looks good up to 20 columns, then the performance
trend breaks for 30 and 40 columns. I don't have an explanation for
this yet.

I've also attached an updated script to run the tests and output the
results in csv format so that it can be easily imported into Postgres
for analysis or processing.
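
As a rough illustration of the kind of processing the csv output
allows, here is a minimal Python sketch that computes the percentage
change of a patched build relative to master for each test. The column
names (test, extra_columns, branch, tps) and the branch labels
("master", "patched") are assumptions made for this example; they are
not the script's actual output format.

```python
import csv
import io
from collections import defaultdict


def percent_change(csv_text):
    """Return {(test, extra_columns): % change of patched vs master}.

    Assumes hypothetical csv columns: test, extra_columns, branch, tps.
    """
    # Group the measured tps by test case, keyed by branch name.
    by_case = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["test"], int(row["extra_columns"]))
        by_case[key][row["branch"]] = float(row["tps"])

    # Only compare cases that have both a baseline and a patched run.
    result = {}
    for key, tps in by_case.items():
        if "master" in tps and "patched" in tps:
            base = tps["master"]
            result[key] = (tps["patched"] - base) / base * 100.0
    return result
```

A negative value for a case would indicate a regression against the
master baseline; cases missing either run are skipped rather than
guessed at.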

> * I've not looked at the JIT deforming code. At the moment the code
> won't even compile with LLVM enabled because I've removed the
> TTS_FLAG_SLOW flag. It's possible I'll have to adjust the JIT
> deforming code or consider keeping TTS_FLAG_SLOW.

This part turned out to be easy. The JIT deformer does not pay
attention to the TTS_FLAG_SLOW flag; it just unconditionally turns it
on to force the non-JIT deformer into using slow mode. I've deleted
the code that was setting it since slow mode no longer exists in the
patched code.

David
