Re: GIN improvements part 1: additional information
From | Tomas Vondra |
---|---|
Subject | Re: GIN improvements part 1: additional information |
Date | |
Msg-id | 51D84912.2080000@fuzzy.cz |
In reply to | Re: GIN improvements part 1: additional information (Alexander Korotkov <aekorotkov@gmail.com>) |
List | pgsql-hackers |
Hi,

I've done a fair amount of testing by loading the pgsql-general archives into a database and running a bunch of simple ts queries that use a GIN index.

I've tested this as well as the two other patches, but as I was able to get meaningful results only from this patch, I'll post the results here and send the info about segfaults and other observed errors to the other threads.

First of all - update the commitfest page whenever you submit a new patch version, please. I've spent two or three hours testing and debugging patches linked from those pages only to find out that there are newer versions. I should have checked that initially, but let's keep that updated.

I wasn't able to apply the patches to the current head, so I've used b8fd1a09 (from 17/06) as a base commit.

The following table shows these metrics:

* data load
  - how long it took to import ~200k messages from the list archive
  - includes a lot of time spent in Python (parsing), checking FKs ...
  - so unless this is significantly higher, it's probably OK

* index size
  - size of the main GIN index on message body

* 1/2/3-word(s)
  - number of queries in the form

      SELECT id FROM messages
       WHERE body_tsvector @@ plainto_tsquery('english', 'w1 w2')
       LIMIT 100

    (executed over 60 seconds, and 'per second' speed)

All the scripts are available at https://bitbucket.org/tvondra/archie

Now, the results:

no patches:
  data load:  710 s
  index size: 545 MB
  1 word:     37500 (630/s)
  2 words:    49800 (800/s)
  3 words:    40000 (660/s)

additional info (ginaddinfo.7.patch):
  data load:  693 s
  index size: 448 MB
  1 word:     135000 (2250/s)
  2 words:    85000 (1430/s)
  3 words:    54000 ( 900/s)

additional info + fast scan (gin_fast_scan.4.patch):
  data load:  720 s
  index size: 455 MB
  1 word:     FAIL
  2 words:    FAIL
  3 words:    FAIL

additional info + fast scan + ordering (gin_ordering.4.patch):
  data load:  FAIL
  index size: N/A
  1 word:     N/A
  2 words:    N/A
  3 words:    N/A

So the speedup after adding info into GIN seems very promising, although I don't quite understand why searching for two words is so much slower. Also the index size seems to decrease significantly.

After applying 'fast scan' things started to break down, so I wasn't able to run the queries, and then even the load failed consistently.

I'll post the info into the appropriate threads.

Tomas
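
For anyone who wants to compare the two working configurations directly, here is a minimal sketch that turns the query counts reported above into per-second rates and speedup ratios. The counts, the 60-second duration, and the index sizes come from this message; the function and variable names are made up for illustration and are not part of the archie scripts.

```python
# Query counts reported in the message, each collected over a 60-second run.
DURATION_S = 60

baseline = {"1 word": 37500, "2 words": 49800, "3 words": 40000}
patched = {"1 word": 135000, "2 words": 85000, "3 words": 54000}

# Index sizes in MB, as reported for "no patches" and ginaddinfo.7.patch.
INDEX_MB = {"baseline": 545, "patched": 448}


def rate(count, duration=DURATION_S):
    """Queries per second for a given total count."""
    return count / duration


def speedups(before, after):
    """Ratio of patched to baseline query counts, per query type."""
    return {label: after[label] / before[label] for label in before}


def index_ratio(sizes=INDEX_MB):
    """Patched index size as a fraction of the baseline index size."""
    return sizes["patched"] / sizes["baseline"]


if __name__ == "__main__":
    for label, ratio in speedups(baseline, patched).items():
        print(
            f"{label}: {rate(baseline[label]):.0f}/s -> "
            f"{rate(patched[label]):.0f}/s ({ratio:.2f}x)"
        )
    print(f"index size: {index_ratio():.0%} of baseline")
```

The ratios make the uneven gain visible: the improvement is largest for single-word queries and shrinks as more words are added.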