Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds
From | Tom Lane |
---|---|
Subject | Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds |
Date | |
Msg-id | 2266571.1630611849@sss.pgh.pa.us |
In response to | Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds (Michael Lewis <mlewis@entrata.com>) |
Responses | Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds |
List | pgsql-general |
Michael Lewis <mlewis@entrata.com> writes:
> This is showing many false positives from the index scan that get removed
> when the actual values are examined. With such a long search parameter,
> that does not seem surprising. I would expect a search on "raj nagar
> ghaziabad 201017" or something like that to yield far fewer results from
> the index scan. I don't know GIN indexes super well, but I would guess that
> including words that are very common will yield false positives that get
> filtered out later.

Yeah, the huge "Rows Removed" number shows that this index is very poorly adapted to the query. I don't think the problem is with GIN per se, but with a poor choice of how to use it.

The given example looks like what the OP really wants to do is full text search. If so, a GIN index should be fine as long as you put tsvector/tsquery filtering in front of it. If that's not a good characterization of the goal, it'd help to tell us what the goal is. (Just saying "I want to use jaccard similarity" sounds a lot like a man whose only tool is a hammer, therefore his problem must be a nail, despite evidence to the contrary.)

regards, tom lane
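
For illustration, the tsvector/tsquery filtering suggested above might look roughly like the following. This is only a sketch: the table and column names (addresses, address) and the choice of the 'simple' text search configuration are assumptions, since the actual schema is not shown in this message.

```sql
-- Hypothetical table; names are placeholders, not taken from the thread.
CREATE TABLE addresses (
    id      bigserial PRIMARY KEY,
    address text NOT NULL
);

-- Expression GIN index over the tsvector form of the text.
-- 'simple' does no stemming and drops no stop words; whether that suits
-- address data is an assumption to verify against the real workload.
CREATE INDEX addresses_address_fts_idx
    ON addresses USING gin (to_tsvector('simple', address));

-- The WHERE clause repeats the indexed expression so the planner can
-- match it to the index. plainto_tsquery ANDs the words together, so
-- only rows containing every term are returned.
SELECT id, address
FROM addresses
WHERE to_tsvector('simple', address)
      @@ plainto_tsquery('simple', 'raj nagar ghaziabad 201017');
```

Running the query under EXPLAIN (ANALYZE, BUFFERS) should show whether the index is used and how many rows the recheck removes. If a similarity ranking is still wanted, it could be computed over the rows that survive this filter instead of driving the index scan itself.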