Re: Yet another fast GiST build
From | Heikki Linnakangas |
---|---|
Subject | Re: Yet another fast GiST build |
Date | |
Msg-id | 7386285b-0e2f-e89e-81f4-f63775becb2e@iki.fi |
In response to | Re: Yet another fast GiST build (Andrey Borodin <x4mmm@yandex-team.ru>) |
Responses | Re: Yet another fast GiST build |
List | pgsql-hackers |
On 07/04/2021 15:12, Andrey Borodin wrote:
>> On 7 Apr 2021, at 14:56, Heikki Linnakangas <hlinnaka@iki.fi>
>> wrote:
>>
>> Ok, I think I understand that now. In btree_gist, the *_cmp()
>> function operates on non-leaf values, and *_lt(), *_gt() et al
>> operate on leaf values. For all other datatypes, the leaf and
>> non-leaf representation is the same, but for bit/varbit, the
>> non-leaf representation is different. The leaf representation is
>> VarBit, and non-leaf is just the bits without the 'bit_len' field.
>> That's why it is indeed correct for gbt_bitcmp() to just use
>> byteacmp(), whereas gbt_bitlt() et al compares the 'bit_len' field
>> separately. That's subtle, and 100% uncommented.
>>
>> What that means for this patch is that gbt_bit_sort_build_cmp()
>> should *not* call byteacmp(), but bitcmp(). Because it operates on
>> the original datatype stored in the table.
>
> +1
> Thanks for investigating this. If I understand things right,
> adding test values with different lengths of bit sequences would not
> uncover the problem anyway?

That's right, the only consequence of a "wrong" sort order is that the
quality of the tree suffers, and scans need to scan more pages
unnecessarily.

I tried to investigate this by creating a varbit index with and without
sorting, and compared them with pageinspect, but in quick testing, I
wasn't able to find cases where the sorted version was badly ordered. I
guess I didn't find the right data set yet.

- Heikki
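
Editorial sketch, not part of the original message: the point about byteacmp() versus bitcmp() can be illustrated with a toy model in Python. It assumes a leaf value is a little-endian 4-byte bit length followed by the zero-padded bit bytes, which simplifies the real VarBit layout. The helper names (pack_varbit, bytewise_cmp, bitwise_cmp, sort_bits) are hypothetical and only loosely mirror the PostgreSQL functions named above.

```python
import struct
from functools import cmp_to_key


def pack_varbit(bits: str) -> bytes:
    """Toy leaf value: 4-byte little-endian bit length, then padded bit bytes.

    Assumption: this only approximates the VarBit layout discussed above.
    """
    nbytes = (len(bits) + 7) // 8
    padded = bits.ljust(nbytes * 8, "0")
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return struct.pack("<i", len(bits)) + data


def bytewise_cmp(a: bytes, b: bytes) -> int:
    """Compare whole values as raw bytes, length header included."""
    n = min(len(a), len(b))
    if a[:n] != b[:n]:
        return -1 if a[:n] < b[:n] else 1
    return (len(a) > len(b)) - (len(a) < len(b))


def bitwise_cmp(a: bytes, b: bytes) -> int:
    """Compare the bit bytes first, then break ties on the bit length."""
    bits_a, bits_b = a[4:], b[4:]
    n = min(len(bits_a), len(bits_b))
    if bits_a[:n] != bits_b[:n]:
        return -1 if bits_a[:n] < bits_b[:n] else 1
    len_a = struct.unpack("<i", a[:4])[0]
    len_b = struct.unpack("<i", b[:4])[0]
    return (len_a > len_b) - (len_a < len_b)


def sort_bits(values, cmp):
    """Sort bit strings by comparing their packed toy representation."""
    key = cmp_to_key(lambda x, y: cmp(pack_varbit(x), pack_varbit(y)))
    return sorted(values, key=key)


values = ["1", "01", "001"]
print(sort_bits(values, bitwise_cmp))   # ordered by the bits themselves
print(sort_bits(values, bytewise_cmp))  # ordered mostly by the length header
```

In this model the two comparators order the same three values in opposite directions, because the bytewise comparison sees the length header before any bit data. Both results are valid sort orders, which fits the observation in the message that a "wrong" order only degrades tree quality rather than producing incorrect results.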