Re: Rework of collation code, extensibility

From	Jeff Davis
Subject	Re: Rework of collation code, extensibility
Date	26 January 2023, 23:47:13
Msg-id	64039a2dbcba6f42ed2f32bb5f0371870a70afda.camel@j-davis.com
In reply to	Re: Rework of collation code, extensibility (Jeff Davis <pgsql@j-davis.com>)
Responses	Re: Rework of collation code, extensibility
List	pgsql-hackers

Attached v9 and added perf numbers below.

I'm hoping to commit 0002 and 0003 soon-ish, maybe in a week or two;
please let me know if you want me to hold off. (I won't commit the GUCs
unless others find them generally useful; they are included here to
more easily reproduce my performance tests.)

My primary motivation is still related to
https://commitfest.postgresql.org/41/3956/ but the combination of
cleaner code and a performance boost seems like reasonable
justification for this patch set independently.

There aren't any clear open items on this patch. Peter Eisentraut asked
me to focus this thread on the refactoring, which I've done by reducing
it to 2 patches, and I left multilib ICU up to the other thread. He
also questioned the increased line count, but I think the currently-low
line count is due to bad style. PeterG provided some review comments,
in particular when to do the tiebreaking, which I addressed.

This patch has been around for a while, so I'll take a fresh look for
risk areas and re-run a few sanity checks. Of course more
feedback would also be welcome.

PERFORMANCE:

======
Setup:
======

base: master with v9-0001 applied (GUCs only)
refactor: master with v9-0001, v9-0002, v9-0003 applied

Note that I wasn't able to see any performance difference between the
base and master; v9-0001 just adds some GUCs to make testing easier.

glibc  2.35      ICU 70.1
gcc    11.3.0    LLVM 14.0.0

built with meson (uses -O3)

$ perl text_generator.pl 10000000 10 > /tmp/strings.utf8.txt

CREATE TABLE s (t TEXT);
COPY s FROM '/tmp/strings.utf8.txt';
VACUUM FREEZE s;
CHECKPOINT;
SET work_mem='10GB';
SET max_parallel_workers = 0;
SET max_parallel_workers_per_gather = 0;

=============
Test queries:
=============

EXPLAIN ANALYZE SELECT t FROM s ORDER BY t COLLATE "C";
EXPLAIN ANALYZE SELECT t FROM s ORDER BY t COLLATE "en_US";
EXPLAIN ANALYZE SELECT t FROM s ORDER BY t COLLATE "en-US-x-icu";

Timings are measured as the milliseconds to return the first tuple from
the Sort operator (as reported in EXPLAIN ANALYZE). Median of three
runs.
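
For anyone reproducing the measurement, here is a minimal sketch of how
the "first tuple from the Sort operator" number could be pulled out of
EXPLAIN ANALYZE text output and reduced to a median. It relies on the
standard "(actual time=startup..total ...)" annotation on plan nodes,
where the first value is the time to return the first row. The function
names and the sample plan lines are hypothetical, made up for
illustration; they are not measured output and not part of the patch
set.

```python
import re
from statistics import median

# Matches a Sort plan node line and captures its startup time, i.e. the
# first number in "(actual time=STARTUP..TOTAL rows=... loops=...)".
# "Sort Key:" and "Sort Method:" lines carry no "actual time" and are skipped.
_SORT_NODE = re.compile(r"\bSort\b.*?\(actual time=(\d+(?:\.\d+)?)\.\.")


def first_tuple_ms(plan_text: str) -> float:
    """Return the Sort node's time-to-first-tuple in milliseconds."""
    for line in plan_text.splitlines():
        match = _SORT_NODE.search(line)
        if match:
            return float(match.group(1))
    raise ValueError("no Sort node with timing information found")


def median_first_tuple_ms(plans: list[str]) -> float:
    """Median time-to-first-tuple across several EXPLAIN ANALYZE runs."""
    return median(first_tuple_ms(plan) for plan in plans)


if __name__ == "__main__":
    # Illustrative, made-up plan output (NOT real measurements).
    template = (
        "Sort  (cost=1.00..2.00 rows=3 width=4) "
        "(actual time={start}..{total} rows=3 loops=1)\n"
        "  Sort Key: t COLLATE \"C\"\n"
        "  Sort Method: quicksort  Memory: 25kB\n"
    )
    runs = [
        template.format(start="100.500", total="150.000"),
        template.format(start="300.000", total="350.000"),
        template.format(start="200.000", total="250.000"),
    ]
    print(median_first_tuple_ms(runs))
```

With three runs the median is simply the middle value, which keeps a
single slow outlier run from skewing the reported number.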

========
Results:
========

                              base    refactor   speedup

sort_abbreviated_keys=false:
  C                           7377        7273      1.4%
  en_US                      35081       35090      0.0%
  en-US-x-icu                20520       19465      5.4%

sort_abbreviated_keys=true:
  C                           8105        8008      1.2%
  en_US                      35067       34850      0.6%
  en-US-x-icu                22626       21507      5.2%

===========
Conclusion:
===========

These numbers can move +/-1 percentage point, so I'd interpret anything
less than that as noise. This happens to be the first run where all the
numbers favored the refactoring patch, but it is generally consistent
with what I had seen before.
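
As a cross-check on the table, the sketch below recomputes the speedup
column from the base and refactor timings and flags entries that fall
inside the +/-1 percentage point noise band. The formula
base / refactor - 1 is an assumption: it is not stated in this message,
but it reproduces every value in the speedup column to one decimal
place. The function and variable names are hypothetical.

```python
# Timings in milliseconds, copied from the results table:
# (setting, collation) -> (base, refactor)
RESULTS = {
    ("sort_abbreviated_keys=false", "C"): (7377, 7273),
    ("sort_abbreviated_keys=false", "en_US"): (35081, 35090),
    ("sort_abbreviated_keys=false", "en-US-x-icu"): (20520, 19465),
    ("sort_abbreviated_keys=true", "C"): (8105, 8008),
    ("sort_abbreviated_keys=true", "en_US"): (35067, 34850),
    ("sort_abbreviated_keys=true", "en-US-x-icu"): (22626, 21507),
}

# From the "+/-1 percentage point" remark in the conclusion.
NOISE_THRESHOLD_PCT = 1.0


def speedup_pct(base_ms: float, refactor_ms: float) -> float:
    """Speedup of refactor over base in percent (assumed: base/refactor - 1)."""
    return (base_ms / refactor_ms - 1.0) * 100.0


def summarize(results):
    """Return (setting, collation, speedup rounded to 0.1, is_noise) rows."""
    rows = []
    for (setting, collation), (base_ms, refactor_ms) in results.items():
        pct = speedup_pct(base_ms, refactor_ms)
        rows.append(
            (setting, collation, round(pct, 1), abs(pct) < NOISE_THRESHOLD_PCT)
        )
    return rows


if __name__ == "__main__":
    for setting, collation, pct, is_noise in summarize(RESULTS):
        label = "noise" if is_noise else "signal"
        print(f"{setting:30} {collation:12} {pct:5.1f}%  {label}")
```

By that threshold both en_US rows count as noise, while the ICU rows
sit well outside the band under either setting.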

The important part is that, for ICU, it appears to be a substantial
speedup when using meson (-O3).

Also, when/if the multilib ICU support goes in, that may lose some of
these gains due to an extra indirect function call.

--
Jeff Davis
PostgreSQL Contributor Team - AWS
