Re: WIP: store additional info in GIN index
От | Tomas Vondra |
---|---|
Тема | Re: WIP: store additional info in GIN index |
Дата | |
Msg-id | 50BFF89A.7080908@fuzzy.cz обсуждение исходный текст |
Ответ на | Re: WIP: store additional info in GIN index (Alexander Korotkov <aekorotkov@gmail.com>) |
Ответы |
Re: WIP: store additional info in GIN index
|
Список | pgsql-hackers |
On 5.12.2012 09:10, Alexander Korotkov wrote: > On Wed, Dec 5, 2012 at 1:56 AM, Tomas Vondra <tv@fuzzy.cz > <mailto:tv@fuzzy.cz>> wrote: > > Thanks for bug report. It is fixed in the attached patch. Hi, I gave it another try and this time it went fine - I didn't get any segfault when loading the data, which is a good news. Then I've run a simple benchmarking script, and the results are not as good as I expected, actually I'm getting much worse performance than with the original GIN index. The following table contains the time of loading the data (not a big difference), and number of queries per minute for various number of words in the query. The queries looks like this SELECT id FROM messagesWHERE body_tsvector @@ plainto_tsquery('english', 'word1 word2 ...') so it's really the simplest form of FTS query possible. without patch | with patch -------------------------------------------- loading 750 sec | 770 sec 1 word 1500 | 1100 2 words 23000 | 9800 3 words 24000 | 9700 4 words 16000 | 7200 -------------------------------------------- I'm not saying this is a perfect benchmark, but the differences (of querying) are pretty huge. Not sure where this difference comes from, but it seems to be quite consistent (I usually get +-10% results, which is negligible considering the huge difference). Is this an expected behaviour that will be fixed by another patch? The database contains ~680k messages from the mailing list archives, i.e. about 900 MB of data (in the table), and the GIN index on tsvector is about 900MB too. So the whole dataset nicely fits into memory (8GB RAM), and it seems to be completely CPU bound (no I/O activity at all). The configuration was exactly the same in both cases shared buffers = 1GB work mem = 64 MB maintenance work mem = 256 MB I can either upload the database somewhere, or provide the benchmarking script if needed. Tomas
В списке pgsql-hackers по дате отправления: