Re: strange behavior of pg_trgm's similarity function
От | Heikki Linnakangas |
---|---|
Тема | Re: strange behavior of pg_trgm's similarity function |
Дата | |
Msg-id | 525699BE.9040508@vmware.com обсуждение исходный текст |
Ответ на | strange behavior of pg_trgm's similarity function (Fujii Masao <masao.fujii@gmail.com>) |
Ответы |
Re: strange behavior of pg_trgm's similarity function
|
Список | pgsql-hackers |
On 10.10.2013 15:03, Fujii Masao wrote: > Hi, > > The behavior of pg_trgm's similarity function seems strange. Is this > intentional? > > I was thinking that the following three calls of the similarity function return > the same number because the second argument is just the three characters > contained in the first argument in every calls. > > =# SELECT similarity('12345', '123'); > =# SELECT similarity('12345', '234'); > =# SELECT similarity('12345', '345'); > > But that's not true. Each returns the different number. > > =# SELECT similarity('12345', '123'); > similarity > ------------ > 0.428571 > (1 row) > > =# SELECT similarity('12345', '234'); > similarity > ------------ > 0.111111 > (1 row) > > =# SELECT similarity('12345', '345'); > similarity > ------------ > 0.25 > (1 row) > > This happens because, for example, similarity('12345', '123') returns > the similarity number of '**12345*' and '**123*' (* means the blank character), > NOT '12345' and '123'. IOW, two and one blank characters are added into > the heading and tailing of each argument, respectively. I wonder why > pg_trgm's similarity function works in this way. We should change this > so that no blank characters are added into the arguments? Well, you could also argue that "111111" and "222222" are quite similar, even though pg_trgm's similarity will not think so. It comes down to the definition of similarity, and how well that definition matches your intuition. FWIW, it feels right to me that a match in the beginning of a word is worth more than one in the middle of a string. -1 on changing that. - Heikki
В списке pgsql-hackers по дате отправления: