Re: hash index work
От | Tom Lane |
---|---|
Тема | Re: hash index work |
Дата | |
Msg-id | 9723.1117296218@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: hash index work (Neil Conway <neilc@samurai.com>) |
Ответы |
Re: hash index work
|
Список | pgsql-patches |
Neil Conway <neilc@samurai.com> writes: > The patch does two things: (1) change hash indexes to only store the > key's hash value, not the entire key (2) store index elements within a > hash bucket in order of hash key and search for matches via binary > search. #1 is definitely a win in some in some circumstances (e.g. > indexing large fields or types with expensive equality operators), but > those aren't the common case. I'm surprised that #2 is not a more > noticeable improvement... It occurs to me that change #1 makes it cheaper to skip over index entries that are in the same bucket but don't have the exact same hash code; but it makes it significantly more expensive to skip over entries that have the same hash code but aren't really equal to the key being sought (since you have to visit the heap to find out they aren't equal). Maybe your test workload had enough occurrences of the latter case to balance out the win from the former. I think it would be worth investigating a variant in which the index stores both the hash code and the key value. This allows cheap elimination of non-matching hash codes, and binary sorting of index entries, without adding any extra trips to the heap. The downside is that it makes the index larger so you incur more I/O there --- so this might not be a win either. > One possibility would be to provide an optional implementation of #1, > perhaps via an alternate index operator class. That way people could > select it in those situations in which it is worth using. I think it would definitely be a good idea to make the lossy behavior optional. regards, tom lane
В списке pgsql-patches по дате отправления: