Re: [HACKERS] Multicolumn hash indexes
От | Tomasz Ostrowski |
---|---|
Тема | Re: [HACKERS] Multicolumn hash indexes |
Дата | |
Msg-id | 3d44a329-7ced-6d43-06c3-4a2d0bb62f38@ato.waw.pl обсуждение исходный текст |
Ответ на | Re: [HACKERS] Multicolumn hash indexes (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: [HACKERS] Multicolumn hash indexes
|
Список | pgsql-hackers |
On 09/27/2017 05:57 PM, Tom Lane wrote: > If we follow GIST's lead that the leading column is "most important", > the idea could be to require a search constraint on the first column, > which produces the hash that determines the bucket assignment. Hashes > for additional columns would just be payload data in the index entry. > If you have search constraint(s) on low-order column(s), you can check > for hash matches before visiting the heap, but they don't reduce how > much of the index you have to search. Even btree works that way for > many combinations of incomplete index constraints. I feel that this would eliminate a large amount of potential gains from such an index. This would be usable only when a sufficiently variable column exists, in which case a simple hash index on the column wouldn't be much worse. But I have an idea. What if there was a requirement for the search criteria to use tuple equality comparison:where (a,b,c)=(?,?,?) orwhere (a,b,c) in ((?,?,?),(?,?,?),(?,?,?),(?,?,?)) Wouldn't it eliminate problems with disappearing conditions? Actually maybe this could be implemented just like a functional index. So it would implement reasonably something that otherwise would be a terribly hackish and slow solution like: create or replace function hashhack(a bytea, b bytea)returns bigintlanguage sqlimmutableas $$ -- uses 'x1e' (record separator) -- to ensure hashhack('a','')!=hashhack('','a') select ( 'x' || substr( md5($1||'\x1e'::bytea||$2), 1, 16 ) )::bit(64)::bigint;$$; create index t_hashhack_a_b_idx on t( hashhack(a::bytea,b::bytea) ); select * from t where a='a' and b='b' and hashhack(a::bytea, b::bytea) = hashhack('a'::bytea,'b'::bytea); If if was automatic man could avoid the overhead of converting data to bytea/string, concatenating, truncating, converting back to bigint, rechecking condition etc. that make this kind of hack not very sane. Even providing a specially crafted function or operator for queries specifically targeted for the index would be quite sufficient:where pg_equal( (a,b,c), (?,?,?) ); -- Tomasz "Tometzky" Ostrowski -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
В списке pgsql-hackers по дате отправления: