Re: Yet another fast GiST build
От | Andrey M. Borodin |
---|---|
Тема | Re: Yet another fast GiST build |
Дата | |
Msg-id | 8516B6A8-11B2-41A8-B7EB-D0C38ADD06C3@yandex-team.ru обсуждение исходный текст |
Ответ на | Re: Yet another fast GiST build (Heikki Linnakangas <hlinnaka@iki.fi>) |
Ответы |
Re: Yet another fast GiST build
|
Список | pgsql-hackers |
> 9 сент. 2020 г., в 20:39, Heikki Linnakangas <hlinnaka@iki.fi> написал(а): > > On 09/09/2020 15:20, Darafei "Komяpa" Praliaskouski wrote: >> On Wed, Sep 9, 2020 at 3:09 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >>> Come to think of it, the point z-order comparator could benefit a lot >>> from key abbreviation, too. You could do the point -> zorder conversion >>> in the abbreviation routine. >> That's how it works in PostGIS, only that we moved to more >> effecient Hilbert curve: >> https://github.com/postgis/postgis/blob/54399b9f6b0f02e8db9444f9f042b8d4ca6d4fa4/postgis/lwgeom_btree.c#L171 > > Thanks, that's interesting. > > I implemented the abbreviated keys for the point opclass, too, and noticed that the patch as it was never used it. I reworkedthe patch so that tuplesort_begin_index_gist() is responsible for looking up the sortsupport function, like tuplesort_begin_index_btree()does, and uses abbreviation when possible. Wow, abbreviated sort made gist for points construction even 1.5x faster! btw there is small typo in arg names in gist_bbox_zorder_cmp_abbrev(); z1,z2 -> a,b > do we have regression test coverage for this? Yes, sorting build for points is tested in point.sql, but with small dataset. index_including_gist.sql seems to be workingwith boxes, but triggers point paths too. > , also on a SIZEOF_DATUM==4 system since the abbreviation works differently with that, and push if nothing new comes up.And clarify the documentation and/or comments that the sortsupport function sees "compressed" values. > > I wonder if we could use sorting to also speed up building tsvector indexes? The values stored there are bit signatures,what would be a good sort order for those? We need an order so that nearby values have a lot of bits in common. What is the length of this signature? For each 4 bytes we can compute number of 1s in it's binary representation. Then z-order these dwords as values 0-32. This will be very inefficient grouping, but it will tend to keep empty and dense 4-byte regions apart. Thanks for working on this! Best regards, Andrey Borodin.
В списке pgsql-hackers по дате отправления: