Re: Yet another fast GiST build
From | Darafei "Komяpa" Praliaskouski |
---|---|
Subject | Re: Yet another fast GiST build |
Date | |
Msg-id | CAC8Q8t+0sHUWZTpk23QnW5UFAn1kuNq+YhM6encWxZXWz32e4Q@mail.gmail.com |
In response to | Re: Yet another fast GiST build ("Andrey M. Borodin" <x4mmm@yandex-team.ru>) |
Responses | Re: Yet another fast GiST build |
List | pgsql-hackers |
Hi,
On Wed, Sep 9, 2020 at 9:43 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> On 9 Sep 2020, at 00:05, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> >
> > I've been reviewing the patch today. The biggest changes I've made have been in restructuring the code in gistbuild.c for readability, but there are a bunch of smaller changes throughout. Attached is what I've got so far, squashed into one patch.
>
> Thanks!
>
> > I'm continuing to review it, but a couple of questions so far:
> >
> > In the gistBuildCallback(), you're skipping the tuple if 'tupleIsAlive == false'. That seems fishy, surely we need to index recently-dead tuples, too. The normal index build path isn't skipping them either.
>
> That's an oversight.
>
> > How does the 'sortsupport' routine interact with 'compress'/'decompress'? Which representation is passed to the comparator routine: the original value from the table, the compressed representation, or the decompressed representation? Do the comparetup_index_btree() and readtup_index() routines agree with that?
>
> Currently we pass compressed values, which seems not very good.
> But there was a request from PostGIS maintainers to pass values before decompression.
> Darafei, please correct me if I'm wrong. Also, can you please provide a link to the PostGIS B-tree sorting functions?

We were expecting to reuse the btree opclass for this. That way the btree_gist extension would become a lot thinner. :)

The core routine of the current sorting implementation is the Hilbert curve, computed from the 2D center of a box; it is also used for the abbreviated sort:
https://github.com/postgis/postgis/blob/2a7ebd0111b02aed3aa24752aad0ba89aef5d431/liblwgeom/gbox.c#L893
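
To make the idea concrete, here is a rough Python sketch of a Hilbert key taken from the center of a box. It is not the PostGIS code linked above: it uses the textbook iterative xy-to-index conversion, and the names hilbert_index and box_center_key, the extent parameter, and the 16-bit grid resolution are all assumptions made for illustration.

```python
def hilbert_index(order, x, y):
    """Map grid cell (x, y) on a 2**order by 2**order grid to its position
    along a Hilbert curve (classic iterative xy-to-index conversion)."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the next, smaller quadrant has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def box_center_key(box, extent, order=16):
    """Hilbert key of the center of box = (xmin, ymin, xmax, ymax).

    `extent` is a hypothetical bounding box of the whole data set; it is
    only used to scale the center onto the integer grid."""
    xmin, ymin, xmax, ymax = box
    ex0, ey0, ex1, ey1 = extent
    n = 1 << order
    cx = (xmin + xmax) / 2.0
    cy = (ymin + ymax) / 2.0
    # Scale to grid coordinates and clamp to the valid range [0, n - 1].
    gx = min(n - 1, max(0, int((cx - ex0) / (ex1 - ex0) * n)))
    gy = min(n - 1, max(0, int((cy - ey0) / (ey1 - ey0) * n)))
    return hilbert_index(order, gx, gy)
```

Sorting boxes with key=lambda b: box_center_key(b, extent) places boxes whose centers are close on the curve next to each other, which is the property a sorted build relies on. Note that only the center matters here, so boxes of different sizes with the same center get the same key.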
All the btree functions are wrappers around gserialized_cmp, which just adds a bunch of tiebreakers that don't matter in practice:
https://github.com/postgis/postgis/blob/2a7ebd0111b02aed3aa24752aad0ba89aef5d431/liblwgeom/gserialized.c#L313
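
The shape of such a comparator can be sketched as follows. This is only an illustration of "primary key first, tiebreakers after", not a port of gserialized_cmp: the tiebreak order (xmin, ymin, xmax, ymax), the helper names, and the center_cell stand-in key (which is not a Hilbert key) are assumptions.

```python
from functools import cmp_to_key


def cmp_values(a, b):
    """Three-way comparison returning -1, 0 or 1."""
    return (a > b) - (a < b)


def make_box_cmp(primary_key):
    """Build a comparator for boxes given as (xmin, ymin, xmax, ymax)."""
    def box_cmp(a, b):
        # The primary sort key decides almost every comparison.
        r = cmp_values(primary_key(a), primary_key(b))
        if r != 0:
            return r
        # Tiebreakers: only reached when the primary keys are equal.
        for ca, cb in zip(a, b):
            r = cmp_values(ca, cb)
            if r != 0:
                return r
        return 0
    return box_cmp


def center_cell(box):
    """Stand-in primary key: integer cell of the box center (not Hilbert)."""
    xmin, ymin, xmax, ymax = box
    return (int((ymin + ymax) / 2.0), int((xmin + xmax) / 2.0))
```

It can be used as sorted(boxes, key=cmp_to_key(make_box_cmp(center_cell))). The tiebreakers only make the order total and deterministic; the locality of the result comes from the primary key.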
The base representation of the compressed datatype in the index is GIDX, which is also a box. We can make the sorting work on top of GIDX instead of the original representation.
Unfortunately, there is no such thing as a "decompressed representation", because the compression is lossy.
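
A small Python sketch of why the compression cannot be undone, under simplifying assumptions: a geometry is a plain list of 2D points and "compress" keeps only its bounding box. The names compress_to_box and box_center are made up for this example, and a real GIDX is not limited to two dimensions.

```python
def compress_to_box(points):
    """Lossy 'compress': keep only the 2D bounding box of a point list."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def box_center(box):
    """Center of a box given as (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2.0, (ymin + ymax) / 2.0)


triangle = [(0, 0), (4, 0), (0, 4)]
square = [(0, 0), (4, 0), (4, 4), (0, 4)]

# Both shapes compress to the same box, so the box alone cannot tell
# which geometry it came from: there is nothing to "decompress" to.
same_box = compress_to_box(triangle) == compress_to_box(square)
```

A sort key that depends only on the box, such as its center, can still be computed from the compressed value, which is why sorting on top of the box representation is workable.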