Re: Hash index todo list item
От | Kenneth Marshall |
---|---|
Тема | Re: Hash index todo list item |
Дата | |
Msg-id | 20070905200702.GG14336@it.is.rice.edu обсуждение исходный текст |
Ответ на | Hash index todo list item (Kenneth Marshall <ktm@rice.edu>) |
Ответы |
Re: Hash index todo list item
|
Список | pgsql-hackers |
On Sun, Sep 02, 2007 at 01:04:04PM -0500, Kenneth Marshall wrote: > Dear PostgreSQL Hackers: > > After following the hackers mailing list for quite a while, > I am going to start investigating what will need to be done > to improve hash index performance. Below are the pieces of > this project that I am currently considering: > > 1. Characterize the current hash index implementation against > the BTree index, with a focus on space utilization and > lookup performance against a collection of test data. This > will give a baseline performance test to evaluate the impact > of changes. I initially do not plan to bench the hash creation > process since my initial focus will be on lookup performance. > Here are very basic results for a table with 1.6m entries: postgres=# CREATE TABLE dict (word varchar(100)); CREATE TABLE postgres=# COPY dict FROM '/tmp/words'; COPY 1648379 postgres=# select count(*) from dict; count ---------1648379 (1 row) Time: 11187.418 ms postgres=# select count(*) from dict; count ---------1648379 (1 row) Time: 6040.912 ms postgres=# CREATE INDEX wordhash ON dict USING hash (word); CREATE INDEX Time: 11108707.160 ms postgres=# select * from dict where word = 'avatar'; word --------avatar (1 row) Time: 79.823 ms postgres=# select * from dict where word = 'zebra';word -------zebra (1 row) Time: 9.864 ms postgres=# select * from dict where word = 'turkey'; word --------turkey (1 row) Time: 18.418 ms Time: 1.045 ms Time: 1.257 ms Time: 1.080 ms postgres=# CREATE INDEX wordbtree ON dict USING btree (word); CREATE INDEX Time: 25438.884 ms postgres=# select * from dict where word = 'avatar'; word --------avatar (1 row) Time: 13.400 ms postgres=# select * from dict where word = 'zebra';word -------zebra (1 row) Time: 1.173 ms postgres=# select * from dict where word = 'turkey'; word --------turkey (1 row) Time: 1.186 ms Time: 1.103 ms Time: 1.099 ms Time: 1.108 ms ------------------------------ Size of table = 87556096 Size of hash index = 268451840 Size of btree index = 53510144 From my very small sample on an unloaded machine, a hash index lookup took the least amount of time. It had a much larger initial time which could be attributable to cache population effects. The size is 5X that of the Btree index. I will continue to improve the test suite as more granularity is needed. If anyone has a good data generator, please let me know. Otherwise I will just roll my own. Regards, Ken
В списке pgsql-hackers по дате отправления: