Re: GIN index build speed
От | Teodor Sigaev |
---|---|
Тема | Re: GIN index build speed |
Дата | |
Msg-id | 49352C18.30702@sigaev.ru обсуждение исходный текст |
Ответ на | GIN index build speed (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Список | pgsql-hackers |
> The issue is that the GIN index build code accumulates the lexemes into > a binary tree, but there's nothing to keep the tree balanced. My test > case with almost monotonically increasing keys, happens to be a > worst-case scenario, and the tree degenerates into almost linked list > that every insertion has to grovel through. Agree, but in most cases it works well. Because lexemes in documents aren't ordered. > > The obvious fix is to use a balanced tree algorithm. I wrote a quick > patch to turn the tree into a splay tree. That fixed the degenerative > behavior, and the runtime of CREATE INDEX for the above test case fell > from 40s to 1.5s. BTW, your patch helps to GIN's btree emulation. With typical scenarios of usage of btree emulation scalar column will be more or less ordered. > > Magnus kindly gave me a dump of the full-text-search tables from > search.postgresql.org, for some real-world testing. Quick testing with > that suggests that the patch unfortunately makes the index build 5-10% > slower with that data set. Do you see ways to improve that? > > We're in commitfest, not supposed to be submitting new features, so I'm > not going to pursue this further right now. Patch attached, however, > which seems to work fine. Personally, I don't object to improve that. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
В списке pgsql-hackers по дате отправления: