Have nodeSort.c use datum sorts single-value byref types
От | David Rowley |
---|---|
Тема | Have nodeSort.c use datum sorts single-value byref types |
Дата | |
Msg-id | CAApHDvqHonfe9G1cVaKeHbDx70R_zCrM3qP2AGXpGrieSKGnhA@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: Have nodeSort.c use datum sorts single-value byref types
|
Список | pgsql-hackers |
We originally did this in 91e9e89dc, but a memory leak was discovered as I neglected to pfree the datum which is freshly allocated in tuplesort_getdatum. Because that was discovered late in the PG15 cycle, we opted to just disable the datum sort optimisation for byref types in 3a5817695. As was mentioned in [1], it looks like we could really use a version of tuplesort_getdatum which does not palloc a new Datum. nodeSort.c, when calling tuplesort_gettupleslot passes copy==false, so it would make sense if the datum sort variation didn't do any copying either. In the attached patch, I've added a function named tuplesort_getdatum_nocopy() which is the same as tuplesort_getdatum() only without the datumCopy(). I opted for the new function rather than a new parameter in the existing function just to reduce branching and additional needless overhead. I also looked at the tuplesort_getdatum() call inside process_ordered_aggregate_single() and made a few changes there so we don't needlessly perform a datumCopy() when we skip a Datum due to finding it the same as the previous Datum in a DISTINCT aggregate situation. I was also looking at mode_final(). Perhaps that could do with the same treatment, I just didn't touch it in the attached patch. A quick performance test with: create table t1 (a varchar(32) not null, b varchar(32) not null); insert into t1 select md5((x%10)::text),md5((x%10)::text) from generate_Series(1,1000000)x; vacuum freeze t1; create index on t1(a); Yields a small speedup for the DISTINCT aggregate case. work_mem = 256MB query = select max(distinct a), max(distinct b) from t1; Master: latency average = 313.197 ms Patched: latency average = 304.335 ms (about 3% faster) The Datum sort in nodeSort.c is more impressive. query = select b from t1 order by b offset 1000000; Master: latency average = 344.763 ms Patched: latency average = 268.374 ms (about 28% faster) I'll add this to the November CF David [1] https://www.postgresql.org/message-id/CAApHDvqS6wC5U==k9Hd26E4EQXH3QR67-T4=Q1rQ36NGvjfVSg@mail.gmail.com
Вложения
В списке pgsql-hackers по дате отправления: