Re: Yet another fast GiST build

From	Darafei "Komяpa" Praliaskouski
Subject	Re: Yet another fast GiST build
Date	September 9, 2020, 07:05:04
Msg-id	CAC8Q8t+0sHUWZTpk23QnW5UFAn1kuNq+YhM6encWxZXWz32e4Q@mail.gmail.com
In response to	Re: Yet another fast GiST build ("Andrey M. Borodin" <x4mmm@yandex-team.ru>)
Responses	Re: Yet another fast GiST build
List	pgsql-hackers

Hi,

On Wed, Sep 9, 2020 at 9:43 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

> On Sep 9, 2020, at 00:05, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> I've been reviewing the patch today. The biggest changes I've made have been in restructuring the code in gistbuild.c for readability, but there are a bunch of smaller changes throughout. Attached is what I've got so far, squashed into one patch.
Thanks!

> I'm continuing to review it, but a couple of questions so far:
>
> In the gistBuildCallback(), you're skipping the tuple if 'tupleIsAlive == false'. That seems fishy, surely we need to index recently-dead tuples, too. The normal index build path isn't skipping them either.
That's an oversight.
>
> How does the 'sortsupport' routine interact with 'compress'/'decompress'? Which representation is passed to the comparator routine: the original value from the table, the compressed representation, or the decompressed representation? Do the comparetup_index_btree() and readtup_index() routines agree with that?

Currently we pass compressed values, which does not seem ideal.
However, the PostGIS maintainers asked that values be passed before decompression.
Darafei, please correct me if I'm wrong. Also, could you please provide a link to the PostGIS B-tree sorting functions?

We were expecting to reuse the btree opclass for this. That way the btree_gist extension would become a lot thinner. :)

The core routine of the current sorting implementation is a Hilbert curve computed from the 2D center of a box; it is also used for the abbreviated sort:
https://github.com/postgis/postgis/blob/2a7ebd0111b02aed3aa24752aad0ba89aef5d431/liblwgeom/gbox.c#L893
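To make the idea concrete, here is a minimal Python sketch of a Hilbert-curve sort key derived from a box center. It is not the PostGIS code linked above: the function names, the 16-bit grid resolution, and the fixed lon/lat extent used for normalization are all assumptions made for illustration.

```python
def hilbert_index(x: int, y: int, order: int = 16) -> int:
    """Distance along a Hilbert curve of grid cell (x, y), 0 <= x, y < 2**order.

    Textbook xy-to-distance conversion; not copied from PostGIS.
    """
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def box_sort_key(xmin, ymin, xmax, ymax,
                 extent=(-180.0, -90.0, 180.0, 90.0), order=16):
    """Hypothetical sort key: Hilbert index of the box center.

    `extent` is an assumed normalization range, not something PostGIS defines.
    """
    ex0, ey0, ex1, ey1 = extent
    n = 1 << order
    cx = (xmin + xmax) / 2.0
    cy = (ymin + ymax) / 2.0
    # Scale the center onto the integer grid and clamp to its bounds.
    gx = min(n - 1, max(0, int((cx - ex0) / (ex1 - ex0) * n)))
    gy = min(n - 1, max(0, int((cy - ey0) / (ey1 - ey0) * n)))
    return hilbert_index(gx, gy, order)
```

Because the key depends only on the center, boxes of different sizes that share a center get the same key, so a full comparator still needs tiebreakers.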

All the btree functions are wrappers around gserialized_cmp, which just adds a bunch of tiebreakers that don't matter in practice:
https://github.com/postgis/postgis/blob/2a7ebd0111b02aed3aa24752aad0ba89aef5d431/liblwgeom/gserialized.c#L313
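The "wrappers around one comparison function" structure can be sketched as follows. This is a Python illustration only: the `Geom` type and the specific tiebreakers (serialized length, then raw bytes) are made up here and are not taken from gserialized_cmp. The point is just that a primary spatial key plus deterministic tiebreakers gives the total order a B-tree needs.

```python
from functools import cmp_to_key
from typing import NamedTuple


class Geom(NamedTuple):
    sort_key: int   # primary key, e.g. a Hilbert index of the box center
    raw: bytes      # stand-in for the serialized value (hypothetical)


def _cmp(a, b) -> int:
    return (a > b) - (a < b)


def geom_cmp(a: Geom, b: Geom) -> int:
    """Three-way comparison: primary key first, then illustrative tiebreakers."""
    c = _cmp(a.sort_key, b.sort_key)
    if c != 0:
        return c
    c = _cmp(len(a.raw), len(b.raw))   # assumed tiebreaker, for illustration
    if c != 0:
        return c
    return _cmp(a.raw, b.raw)          # assumed final tiebreaker


# The btree-style operators are thin wrappers over the single comparator.
def geom_lt(a, b): return geom_cmp(a, b) < 0
def geom_le(a, b): return geom_cmp(a, b) <= 0
def geom_eq(a, b): return geom_cmp(a, b) == 0
def geom_ge(a, b): return geom_cmp(a, b) >= 0
def geom_gt(a, b): return geom_cmp(a, b) > 0


def sort_geoms(items):
    return sorted(items, key=cmp_to_key(geom_cmp))
```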

The base representation of the compressed index datatype is GIDX, which is also a box. We can make the sorting work on top of GIDX instead of the original representation.
Unfortunately, there is no such thing as a "decompressed representation", because the compression is lossy.
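A small Python sketch of why lossy compression rules out a decompressed representation. The `Box` type and `compress` function are hypothetical stand-ins for GIDX and the opclass compress method, reduced to 2D: two different shapes collapse to the same box, so nothing can recover the original from it, while a center-based sort key is still fully computable from the box alone.

```python
from typing import NamedTuple, Sequence, Tuple


class Box(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def compress(points: Sequence[Tuple[float, float]]) -> Box:
    """Reduce a shape (list of vertices) to its bounding box; vertices are lost."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return Box(min(xs), min(ys), max(xs), max(ys))


def center(box: Box) -> Tuple[float, float]:
    """The box center, which is all a center-based sort key needs."""
    return ((box.xmin + box.xmax) / 2.0, (box.ymin + box.ymax) / 2.0)


triangle = [(0, 0), (4, 0), (0, 4)]
square = [(0, 0), (4, 0), (4, 4), (0, 4)]

# Different inputs, identical compressed form: there is no inverse mapping.
same_box = compress(triangle) == compress(square)
```

Here `same_box` is true, and `center(compress(triangle))` is `(2.0, 2.0)` for both shapes, so sorting on the compressed form gives the same order as sorting on the box of the original value.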
