Re: Latin vs non-Latin words in text search parsing
From | Tom Lane |
---|---|
Subject | Re: Latin vs non-Latin words in text search parsing |
Date | |
Msg-id | 11092.1193150561@sss.pgh.pa.us |
In response to | Re: Latin vs non-Latin words in text search parsing (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Latin vs non-Latin words in text search parsing; Re: Latin vs non-Latin words in text search parsing; Re: Latin vs non-Latin words in text search parsing |
List | pgsql-hackers |
I wrote:
> Maybe "aword", "word", and "numword"?

Does the lack of response mean people are satisfied with that?

Fleshing the proposal out to include the hyphenated-word categories:

    aword           All ASCII letters
    word            All letters according to iswalpha()
    numword         Mixed letters and digits (all iswalnum())
    ahword          Hyphenated word, all ASCII letters
    hword           Hyphenated word, all letters
    numhword        Hyphenated word, mixed letters and digits
    apart_hword     Part of hyphenated word, all ASCII letters
    part_hword      Part of hyphenated word, all letters
    numpart_hword   Part of hyphenated word, mixed letters and digits

(As an example, "foo-beta1" is a numhword, with component tokens
"foo" an aword and "beta1" a numword. This is how it works now,
modulo the redefinition of the base categories.)

I'm not totally thrilled with these short names for the hyphenation
categories, but they will seem at least somewhat familiar to users of
contrib/tsearch2, and it's probably not worth changing them just to
make them look prettier.

			regards, tom lane
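
For readers who want to see the proposed categories in action, here is a minimal Python sketch of the classification rules listed in the message. It is not the text search parser's code. The function names `classify_base` and `classify` and the `PART_NAME` table are hypothetical; Python's `str.isalpha()` and `str.isalnum()` stand in for `iswalpha()` and `iswalnum()`, which in C depend on the locale; and labelling each component with the matching `*part_hword` name is one reading of the table. The sketch also makes no attempt to treat pure numbers specially.

```python
import string

ASCII_LETTERS = frozenset(string.ascii_letters)

# Hypothetical lookup: base category of a component -> its "part of
# hyphenated word" category. Only the category names come from the message.
PART_NAME = {
    "aword": "apart_hword",
    "word": "part_hword",
    "numword": "numpart_hword",
}


def classify_base(token):
    """Return 'aword', 'word', or 'numword' for a token without hyphens."""
    # str.isalnum() stands in for "all iswalnum()"; it is False for "".
    if not token.isalnum():
        raise ValueError(f"not a letters-and-digits token: {token!r}")
    if all(ch in ASCII_LETTERS for ch in token):
        return "aword"      # all ASCII letters
    if token.isalpha():
        return "word"       # all letters, at least one non-ASCII
    return "numword"        # simplification: anything else that is alphanumeric


def classify(token):
    """Return (category, components) for a plain or hyphenated token."""
    parts = token.split("-")
    if len(parts) == 1:
        return classify_base(token), []
    kinds = [classify_base(p) for p in parts]
    if all(k == "aword" for k in kinds):
        whole = "ahword"
    elif "numword" not in kinds:
        whole = "hword"
    else:
        whole = "numhword"
    return whole, [(p, PART_NAME[k]) for p, k in zip(parts, kinds)]
```

Under these assumptions, `classify("foo-beta1")` reports the whole token as a numhword, while `classify_base("foo")` and `classify_base("beta1")` give aword and numword, matching the example in the message.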