Re: Weird problem concerning tsearch functions built into postgres 8.3, assistance requested
From | Teodor Sigaev |
---|---|
Subject | Re: Weird problem concerning tsearch functions built into postgres 8.3, assistance requested |
Date | |
Msg-id | 4909B8A4.4050706@sigaev.ru |
In response to | Weird problem concerning tsearch functions built into postgres 8.3, assistance requested (Andrew Edson <cheighlund@yahoo.com>) |
List | pgsql-general |
> One of the tables we're using in the 8.1.3 setups currently running
> includes phone numbers as a searchable field (fti_phone), with the
> results of a select on the field generally looking like this: 'MMM':2
> 'NNNN':3 'MMM-NNNN':1. MMM is the first three digits, NNNN is the
> fourth-seventh.
>
> The weird part is this: On the old systems running 8.1.3, I can look up
> a record by
> fti_phone using any of the three above items; first three, last four, or
> entire number including dash. On the new system running 8.3.1, I can do
> a lookup by the first three or the last four and get the results I'm
> after, but any attempt to do a lookup by the entire MMM-NNNN version
> returns no records.

The parser was changed:

postgres=# select * from ts_debug('123-4567');
 alias | description      | token | dictionaries | dictionary | lexemes
-------+------------------+-------+--------------+------------+---------
 uint  | Unsigned integer | 123   | {simple}     | simple     | {123}
 int   | Signed integer   | -4567 | {simple}     | simple     | {-4567}
(2 rows)

postgres=# select * from ts_debug('abc-defj');
 alias           | description                     | token    | dictionaries   | dictionary   | lexemes
-----------------+---------------------------------+----------+----------------+--------------+------------
 asciihword      | Hyphenated word, all ASCII      | abc-defj | {english_stem} | english_stem | {abc-defj}
 hword_asciipart | Hyphenated word part, all ASCII | abc      | {english_stem} | english_stem | {abc}
 blank           | Space symbols                   | -        | {}             |              |
 hword_asciipart | Hyphenated word part, all ASCII | defj     | {english_stem} | english_stem | {defj}

The parser in 8.1 treats any [alnum]+-[alnum]+ as a hyphenated word, but 8.3 treats [digit]+-[digit]+ as two separate numbers.

So you can either pre-process the text before indexing, or have a look at the regex dictionary (http://vo.astronet.ru/arxiv/dict_regex.html).

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                   WWW: http://www.sigaev.ru/
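As a rough illustration of the "pre-process the text before indexing" route, here is a minimal Python sketch. It is only one possible approach and is not taken from the post: the function name expand_hyphenated_numbers and the choice to emit both the separate digit groups and a joined, dash-free form are assumptions made for this example. The idea is that the 8.3 parser never sees a digits-dash-digits token, so the parts and the whole number can each end up as their own lexeme.

```python
import re

# Hypothetical helper, not part of PostgreSQL or tsearch.
# Matches runs of digit groups joined by hyphens (e.g. 123-4567 or
# 555-123-4567) that are not embedded in a larger word or hyphen chain.
_HYPHENATED_DIGITS = re.compile(r"(?<![\w-])\d+(?:-\d+)+(?![\w-])")


def expand_hyphenated_numbers(text: str) -> str:
    """Rewrite each hyphenated number as its parts plus a joined form.

    "123-4567" becomes "123 4567 1234567", so the text handed to the
    indexer contains no digits-dash-digits token at all.
    """
    def _expand(match: re.Match) -> str:
        parts = match.group(0).split("-")
        # Emit every digit group, then the whole number without dashes.
        return " ".join(parts + ["".join(parts)])

    return _HYPHENATED_DIGITS.sub(_expand, text)
```

With this kind of approach the same function would presumably have to be applied to the search string as well, so that a lookup for the full number with the dash is rewritten the same way as the indexed text. Hyphenated words such as abc-defj are left untouched, since 8.3 still handles them as hyphenated words.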