pg_trgm vs. Solr ngram
| From | Chris |
|---|---|
| Subject | pg_trgm vs. Solr ngram |
| Date | |
| Msg-id | 4628c3f6-e2c5-1484-71cf-62446cec984d@networkz.ch |
| Responses | Re: pg_trgm vs. Solr ngram; Re: pg_trgm vs. Solr ngram; Re: pg_trgm vs. Solr ngram |
| List | pgsql-general |
Hello list,

I'm pondering migrating an FTS application from Solr to Postgres, simply because we use Postgres for everything else. The application is basically fgrep with a web frontend. However, the indexed documents are very computer-network specific and contain a lot of hyphenated hostnames with dot-separated domains, as well as IPv4 and IPv6 addresses.

In Solr I was using ngrams and customized the TokenizerFactories until more or less only whitespace acted as a separator, while [.:-_\d] remained part of the ngrams. This allows searching for ".12.255/32" or "xzy-eth5.example.org" without any false positives.

It looks like a straight conversion of this method is not possible, since the tokenization in pg_trgm is not configurable, afaict. Is there some other good method to search for an arbitrary substring, including all the punctuation, using an index? Or a pg_trgm-style module that is more flexible, like the Solr/Lucene variant? Or maybe hacking my own pg_trgm wouldn't be so hard and could even be fun: do I pretty much just need to change the emitted tokens, or would that lead to significant complications in the operators, indexes, etc.?

Thanks for any hints & cheers
Christian
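P.S. To make the tokenization point concrete, here is a minimal sketch against stock pg_trgm; the table docs(body text) is just an illustrative stand-in:

    -- pg_trgm treats punctuation as a word separator, so the '-', '.'
    -- and '/' never become part of any trigram:
    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    SELECT show_trgm('xzy-eth5.example.org');
    -- returns only the trigrams of "xzy", "eth5", "example" and "org"

    -- As far as I understand, a trigram GIN index can still back a plain
    -- LIKE: the index narrows candidates using the alphanumeric trigrams
    -- and the recheck applies the full pattern, punctuation included; I'm
    -- just not sure the candidate set stays small when most of the pattern
    -- is punctuation.
    CREATE INDEX docs_body_trgm ON docs USING gin (body gin_trgm_ops);
    SELECT * FROM docs WHERE body LIKE '%.12.255/32%';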