Re: Fuzzy substring searching with the pg_trgm extension
From | Teodor Sigaev |
---|---|
Subject | Re: Fuzzy substring searching with the pg_trgm extension |
Date | |
Msg-id | 56BC7EF4.2030903@sigaev.ru |
In response to | Re: Fuzzy substring searching with the pg_trgm extension (Alvaro Herrera <alvherre@2ndquadrant.com>) |
List | pgsql-hackers |
>>> The behavior of this function is surprising to me.
>>>
>>> select substring_similarity('dog' , 'hotdogpound') ;
>>>
>>>  substring_similarity
>>> ----------------------
>>>                  0.25
>>>
>> Substring search was designed to search for a similar word in a string:
>> contrib_regression=# select substring_similarity('dog' , 'hot dogpound') ;
>>  substring_similarity
>> ----------------------
>>                  0.75
>>
>> contrib_regression=# select substring_similarity('dog' , 'hot dog pound') ;
>>  substring_similarity
>> ----------------------
>>                     1
>
> Hmm, this behavior looks too much like magic to me.  I mean, a substring
> is a substring -- why are we treating the space as a special character
> here?

Because it isn't a regex for substring search. Since it was first
implemented, pg_trgm has operated on the words in a string:

contrib_regression=# select similarity('block hole', 'hole black');
 similarity
------------
   0.571429

contrib_regression=# select similarity('block hole', 'black hole');
 similarity
------------
   0.571429

It ignores the spaces between words and the order of the words.

I agree that substring_similarity is a confusing name, but what it actually
does is find the word in the second argument that is most similar to the
first argument and return their similarity.

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                   WWW: http://www.sigaev.ru/
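To make the word-based behaviour concrete, here is a minimal Python sketch of pg_trgm-style matching. It assumes the documented pg_trgm convention that each word is padded with two leading spaces and one trailing space before trigrams are taken, and it approximates pg_trgm's notion of a "word" as a run of letters or digits after lowercasing. `similarity` is the shared-trigram ratio (shared unique trigrams over all unique trigrams). `word_similarity` is a hypothetical brute-force reading of "most similar word in the second argument": it scores every contiguous run of the second string's ordered trigrams and keeps the best. It is not the patch's actual C implementation.

```python
"""Simplified sketch of pg_trgm-style trigram similarity (assumptions noted)."""
import re


def word_trigrams(word: str) -> list[str]:
    # Pad each word with two leading blanks and one trailing blank,
    # then take every 3-character window.
    padded = "  " + word + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def ordered_trigrams(text: str) -> list[str]:
    # Assumption: a word is a run of letters/digits; spaces and punctuation
    # only separate words, so they never appear inside a trigram.
    words = re.findall(r"[^\W_]+", text.lower())
    result: list[str] = []
    for w in words:
        result.extend(word_trigrams(w))
    return result


def similarity(a: str, b: str) -> float:
    # Shared unique trigrams divided by all unique trigrams.
    # Word order is lost because both strings become plain sets.
    ta, tb = set(ordered_trigrams(a)), set(ordered_trigrams(b))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def word_similarity(needle: str, haystack: str) -> float:
    # Hypothetical brute force: score each contiguous run ("extent") of the
    # haystack's ordered trigrams as |A & E| / |A | E| and keep the best.
    a = set(ordered_trigrams(needle))
    seq = ordered_trigrams(haystack)
    best = 0.0
    for start in range(len(seq)):
        extent: set[str] = set()
        for end in range(start, len(seq)):
            extent.add(seq[end])
            best = max(best, len(a & extent) / len(a | extent))
    return best


if __name__ == "__main__":
    print(round(similarity("block hole", "hole black"), 6))  # 0.571429
    print(round(similarity("block hole", "black hole"), 6))  # 0.571429
    for hay in ("hotdogpound", "hot dogpound", "hot dog pound"):
        print(hay, word_similarity("dog", hay))  # 0.25, 0.75, 1.0
```

Under these assumptions the sketch reproduces the numbers in the thread. 'dog' yields the trigrams `'  d'`, `' do'`, `'dog'` and `'og '`. Inside 'hotdogpound' only `'dog'` survives (1/4 = 0.25). A space before 'dogpound' starts a new word and restores the two leading trigrams (3/4 = 0.75). A space on both sides restores all four (1.0). For the 'block hole' pairs, 8 of 14 unique trigrams are shared either way, which gives 0.571429 regardless of word order.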