Re: Full text search on partial URLs
От | Zev Benjamin |
---|---|
Тема | Re: Full text search on partial URLs |
Дата | |
Msg-id | 527A937C.9000703@strangersgate.com обсуждение исходный текст |
Ответ на | Re: Full text search on partial URLs (bricklen <bricklen@gmail.com>) |
Список | pgsql-general |
On 11/06/2013 02:04 PM, bricklen wrote: > > On Wed, Nov 6, 2013 at 10:53 AM, Zev Benjamin > <zev-pgsql@strangersgate.com <mailto:zev-pgsql@strangersgate.com>> wrote: > > On 11/06/2013 01:47 PM, bricklen wrote: > > > On Wed, Nov 6, 2013 at 10:45 AM, Zev Benjamin > <zev-pgsql@strangersgate.com > <mailto:zev-pgsql@strangersgate.com> > <mailto:zev-pgsql@__strangersgate.com > <mailto:zev-pgsql@strangersgate.com>>> wrote: > > Hi, > > I have Postgres full text search set up for my application > and it's > been working great! However, my users would like their > searches to > turn up parts of URLs. For example, they would like a > search for > "foobar" to turn up a document that contains the string > "http://example.com/foobar/____blah > <http://example.com/foobar/__blah> > <http://example.com/foobar/__blah <http://example.com/foobar/blah>>" > (and similarly for queries like "example" and "blah). With the > default dictionaries for host, url, and url_path, the > search query > would have to contain the complete host or url path. > > What is the best way to accomplish this? Should I be > looking at > building a custom dictionary that breaks down hosts and > urls or is > there something simpler I can do? > > > Have you looked into trigrams? > http://www.postgresql.org/__docs/current/static/pgtrgm.__html > <http://www.postgresql.org/docs/current/static/pgtrgm.html> > > > I've looked at it in the context of adding fuzzy search. But my > understanding is that doing a fuzzy search here would only work if > the query were a significant fraction of, say, the url path. For > example, I would expect a fuzzy search of "foobar" on "/foobar/x" to > return a high similarity, but a fuzzy search of "foobar" on > "/foobar/some/very/long/path/__x" to have a low similarity. > > Or are you suggesting using trigrams in a different way? > > > Yeah, I was thinking more along the lines of allowing wildcard > searching, not similarity. > > Eg. > CREATE INDEX yourtable_yourcol_gist_fbi ON yourtable using GIST ( > yourcol gist_trgm_ops ); > select * from yourtable where yourcol ~~ '%foobar%'; > Hrm. That might work. So the application-level search functionality would be the union of tsearch and trigram wildcard matching. If anyone else has other ideas, I'd be interested in hearing them as well. Thanks, Zev
В списке pgsql-general по дате отправления: