Re: high-dimensional knn-GIST tests (was Re: Cube extension kNN support)
From | Gordon Mohr |
---|---|
Subject | Re: high-dimensional knn-GIST tests (was Re: Cube extension kNN support) |
Date | |
Msg-id | 526C48FF.4000704@xavvy.com |
In response to | Re: high-dimensional knn-GIST tests (was Re: Cube extension kNN support) (Alvaro Herrera <alvherre@2ndquadrant.com>) |
List | pgsql-hackers |
On 10/23/13 9:05 PM, Alvaro Herrera wrote:
> Gordon Mohr wrote:
>
>> Thanks for this! I decided to give the patch a try at the bleeding
>> edge with some high-dimensional vectors, specifically the 1.4
>> million 1000-dimensional Freebase entity vectors from the Google
>> 'word2vec' project:
>>
>> https://code.google.com/p/word2vec/#Pre-trained_entity_vectors_with_Freebase_naming
>>
>> Unfortunately, here's what I found:
>
> I wonder if these results would improve with this patch:
> http://www.postgresql.org/message-id/EFEDC2BF-AB35-4E2C-911F-FC88DA6473D7@gmail.com

Thanks for the pointer; I'd missed that relevant update from Stas Kelvich. I applied that patch, and reindexed.

On the 100-dimension, 850K vector set:

indexing: 1137s (vs. 1344s)
data size: 4.7G (vs. 5.0G)
top-11-nearest-neighbor query: 32s (vs. ~57s)

On the 500-dimension, 100K vector set:

indexing: 756s (vs. 977s)
data size: 4.5G (vs. 4.8G)
top-11-nearest-neighbor query: 18s (vs. ~46s)

So, moderate (5-20%) improvements in indexing time and size, and larger (40-60%) speedups in index-assisted (<->) queries... but those index-assisted queries are still ~10X+ slower than the sequential-scan (distance_euclid()) queries, so the existence of the knn-GIST index is still harming rather than helping performance.

Will update if my understanding changes; still interested to hear if I've missed a key factor/switch needed for these indexes to work well.

- Gordon Mohr
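For anyone wanting to recheck the percentage ranges quoted in the message, here is a minimal Python sketch that recomputes the relative reductions from the reported figures. The names `RESULTS`, `reduction_pct`, and `summarize` are made up for illustration and are not part of the patch or the cube extension. The unpatched query times were given only approximately (~57s, ~46s), so the query percentages are rough.

```python
# Figures copied from the message as (with patch, without patch).
# Query times without the patch were reported as approximate values.
RESULTS = {
    "100-dim, 850K vectors": {
        "indexing_s": (1137, 1344),
        "data_size_gb": (4.7, 5.0),
        "knn_query_s": (32, 57),
    },
    "500-dim, 100K vectors": {
        "indexing_s": (756, 977),
        "data_size_gb": (4.5, 4.8),
        "knn_query_s": (18, 46),
    },
}


def reduction_pct(after, before):
    """Relative reduction of `after` compared with `before`, in percent."""
    return 100.0 * (before - after) / before


def summarize(results):
    """Map each dataset and metric to its percentage reduction."""
    return {
        dataset: {
            metric: reduction_pct(after, before)
            for metric, (after, before) in metrics.items()
        }
        for dataset, metrics in results.items()
    }


if __name__ == "__main__":
    for dataset, metrics in summarize(RESULTS).items():
        print(dataset)
        for metric, pct in metrics.items():
            print(f"  {metric}: {pct:.1f}% lower with the patch")
```

This only restates the arithmetic behind the reported numbers; it does not measure anything itself, and it says nothing about the separate comparison against sequential scans.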