Re: Rework of collation code, extensibility
From | Jeff Davis |
---|---|
Subject | Re: Rework of collation code, extensibility |
Date | |
Msg-id | 64039a2dbcba6f42ed2f32bb5f0371870a70afda.camel@j-davis.com |
In response to | Re: Rework of collation code, extensibility (Jeff Davis <pgsql@j-davis.com>) |
Responses | Re: Rework of collation code, extensibility |
List | pgsql-hackers |
Attached v9 and added perf numbers below.

I'm hoping to commit 0002 and 0003 soon-ish, maybe a week or two; please let me know if you want me to hold off. (I won't commit the GUCs unless others find them generally useful; they are included here to more easily reproduce my performance tests.)

My primary motivation is still related to https://commitfest.postgresql.org/41/3956/ but the combination of cleaner code and a performance boost seems like reasonable justification for this patch set independently.

There aren't any clear open items on this patch. Peter Eisentraut asked me to focus this thread on the refactoring, which I've done by reducing it to 2 patches, and I left multilib ICU up to the other thread. He also questioned the increased line count, but I think the currently-low line count is due to bad style. PeterG provided some review comments, in particular when to do the tiebreaking, which I addressed.

This patch has been around for a while, so I'll take a fresh look for risk areas and re-run a few sanity checks. Of course more feedback would also be welcome.

PERFORMANCE:

======
Setup:
======

base: master with v9-0001 applied (GUCs only)
refactor: master with v9-0001, v9-0002, v9-0003 applied

Note that I wasn't able to see any performance difference between the base and master; v9-0001 just adds some GUCs to make testing easier.
glibc 2.35
ICU 70.1
gcc 11.3.0
LLVM 14.0.0
built with meson (uses -O3)

$ perl text_generator.pl 10000000 10 > /tmp/strings.utf8.txt

CREATE TABLE s (t TEXT);
COPY s FROM '/tmp/strings.utf8.txt';
VACUUM FREEZE s;
CHECKPOINT;
SET work_mem='10GB';
SET max_parallel_workers = 0;
SET max_parallel_workers_per_gather = 0;

=============
Test queries:
=============

EXPLAIN ANALYZE SELECT t FROM s ORDER BY t COLLATE "C";
EXPLAIN ANALYZE SELECT t FROM s ORDER BY t COLLATE "en_US";
EXPLAIN ANALYZE SELECT t FROM s ORDER BY t COLLATE "en-US-x-icu";

Timings are measured as the milliseconds to return the first tuple from the Sort operator (as reported in EXPLAIN ANALYZE). Median of three runs.

========
Results:
========

                    base    refactor    speedup

sort_abbreviated_keys=false:
  C                 7377        7273       1.4%
  en_US            35081       35090       0.0%
  en-US-x-icu      20520       19465       5.4%

sort_abbreviated_keys=true:
  C                 8105        8008       1.2%
  en_US            35067       34850       0.6%
  en-US-x-icu      22626       21507       5.2%

===========
Conclusion:
===========

These numbers can move +/-1 percentage point, so I'd interpret anything less than that as noise. This happens to be the first run where all the numbers favored the refactoring patch, but it is generally consistent with what I had seen before.

The important part is that, for ICU, it appears to be a substantial speedup when using meson (-O3).

Also, when/if the multilib ICU support goes in, that may lose some of these gains due to an extra indirect function call.

--
Jeff Davis
PostgreSQL Contributor Team - AWS
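
The message does not say how the speedup column was derived. The sketch below assumes speedup = base / refactor - 1, which reproduces the six reported percentages to one decimal place. The function names speedup_pct and median_of_runs are hypothetical helpers, not part of the patch set. The individual run timings behind each median are not given in the message, so the median helper only shows the aggregation step.

```python
from statistics import median


def median_of_runs(runs_ms):
    """Median of the per-run timings (the message uses three runs per query)."""
    return median(runs_ms)


def speedup_pct(base_ms, refactor_ms):
    """Percent speedup, ASSUMING speedup = base / refactor - 1."""
    return (base_ms / refactor_ms - 1.0) * 100.0


# (base, refactor) median timings in milliseconds, copied from the results table.
RESULTS = {
    ("false", "C"): (7377, 7273),
    ("false", "en_US"): (35081, 35090),
    ("false", "en-US-x-icu"): (20520, 19465),
    ("true", "C"): (8105, 8008),
    ("true", "en_US"): (35067, 34850),
    ("true", "en-US-x-icu"): (22626, 21507),
}

if __name__ == "__main__":
    for (abbrev, collation), (base, refactor) in RESULTS.items():
        pct = speedup_pct(base, refactor)
        print(f"sort_abbreviated_keys={abbrev:5s} {collation:12s} {pct:5.1f}%")
```

Under this assumption the en_US row with sort_abbreviated_keys=false is a very small negative value that rounds to 0.0%, well inside the +/-1 percentage point noise band mentioned in the conclusion.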