Re: Fuzzy string matching of product names
От | Peter Geoghegan |
---|---|
Тема | Re: Fuzzy string matching of product names |
Дата | |
Msg-id | y2kdb471ace1004060302qcaa5788fjae649dd93206858f@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Fuzzy string matching of product names (Leif Biberg Kristensen <leif@solumslekt.org>) |
Список | pgsql-general |
> I've got a similar problem in my 18th century research, when clerks usually > took pride in being able to spell a name in any number of ways. I've landed on > a solution where I'm sending search strings to SIMILAR TO. I usually get far > too many hits, but it's much easier to browse through 100 hits than the entire > dataset which is approaching 60,000 records. > > Optimizing the search strings is based upon a lot of experience. That sounds like an interesting problem - mine sounds mundane in comparison. I now seem to be getting reasonable results with pg_trgm coupled with ILIKE. I ORDER BY description ILIKE '%%usr_str%%' DESC, similarity(description, 'usr_str') DESC , prioritising products that actually contain the user specified string. I have tweaked the "is greater than similarity" of my queries, to include a bit of rubbish to be on the safe side, but not too much. I've done this with what I believe to be a representative dataset. You can create a GiST or GIN index on text fields for these queries, which is nice. Have you tried using pg_trgm with your dataset? You might have more success. It isn't biased towards a particular natural language. Also, I suggest you avoid SQL regular expressions (which are supported with SIMILAR TO), and use the ~ operator instead, which gives you the more powerful POSIX regular expressions, unless portability is a major concern. Regards, Peter Geoghegan
В списке pgsql-general по дате отправления: