Re: Hash Indexes. (Was: planner complaints)
От | Tom Lane |
---|---|
Тема | Re: Hash Indexes. (Was: planner complaints) |
Дата | |
Msg-id | 26085.954797589@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: Hash Indexes. (Was: planner complaints) (Mark Dalphin <mdalphin@amgen.com>) |
Список | pgsql-sql |
Mark Dalphin <mdalphin@amgen.com> writes: > Tom Lane wrote: >> Why would you do that? The hash index method doesn't have any advantage >> over btree that I can see, and it's got a lot of disadvantages. > Tom, I have heard this stated several times in this list and yet it > contradicts what I was taught in my course on databases. It was > explained that using a HASH index could be faster than a BTREE index > for direct lookup of an item, however, the tradeoff was that you > couldn't do "unequal" comparisons (ie COLUMN < SomeValue). The speed > gain was because the HASH index could go directly to the page > containing the data while the btree index might need to load several > pages to get to the final data, especially for large BTREE indexes. > Is this simply not true for PostgreSQL, or do you think it isn't true > in general (for most implementations of HASH and BTREE)? Hmm. I haven't actually timed hash versus btree lookups in Postgres; it's possible that hash is faster (at least for some tables). I'm not sure I believe that a hash "can go directly to the right page", though. You could go directly to the right hash bucket, but how many index entries will be in the bucket? You might still have to follow a chain of overflow pages. As Professor Knuth remarked about hashing, when you use it you are absolutely depending on the laws of probability: the average case is really good, but the worst case is awful. Btrees have more predictable performance. When I commented that hashes have disadvantages, I was thinking of other issues that are Postgres-specific: PG's hash indexes support fewer data types than our btrees do, and our btrees support concurrent index updates while hashes don't. If you use a hash index you are pretty much back to the bad old pre-MVCC days: you can have N readers *or* one writer at any instant. (I imagine this could be fixed if anyone cared to invest the work.) The btree code is also a lot more heavily used by most people, so I'd expect it to have fewer bugs. And, as you say, btrees support order-related queries while hashes only support equality. IMHO this alone is sufficient to justify choosing btree, unless you are quite certain that you know all the kinds of query that you will be throwing at the table and none of them involve ordering. regards, tom lane
В списке pgsql-sql по дате отправления: