Re: Latin vs non-Latin words in text search parsing
From | Gregory Stark |
---|---|
Subject | Re: Latin vs non-Latin words in text search parsing |
Date | |
Msg-id | 87k5pdq2o8.fsf@oxford.xeocode.com |
In reply to | Re: Latin vs non-Latin words in text search parsing (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Latin vs non-Latin words in text search parsing
|
List | pgsql-hackers |
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> I wrote:
>> Maybe "aword", "word", and "numword"?
>
> Does the lack of response mean people are satisfied with that?

Sorry, I had a couple responses partially written but never finished.

If we were doing it from scratch I would suggest using longer names. At the
least I would still suggest using "ascii" or "asciiword" instead of "aword".

> Fleshing the proposal out to include the hyphenated-word categories:
>
> aword      All ASCII letters
> word       All letters according to iswalpha()
> numword    Mixed letters and digits (all iswalnum())

This does bring up another idea. Using the ctype names. They could be named
asciiword, alphaword, alnumword. Frankly I don't think this is any nicer than
numword anyways.

> I'm not totally thrilled with these short names for the hyphenation
> categories, but they will seem at least somewhat familiar to users
> of contrib/tsearch2, and it's probably not worth changing them just
> to make them look prettier.

I tried thinking of better words for this and couldn't think of any. The only
other word for a hyphenated word I could think of is probably "compound" and
the word for parts of a compound word is "lexeme", but that's certainly not
going to be clearer (and technically it's not quite right anyway).

So in short I would still suggest using "ascii" instead of just "a" but
otherwise I think your suggestion is best.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
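
For readers following the naming discussion, the three base categories quoted above can be illustrated with a small C sketch. This is not the text search parser's actual code. The enum `TokClass`, its members, and the function `classify_token` are hypothetical names chosen for illustration, and the rule that a "numword" must contain at least one letter is an assumption made here to keep pure digit strings out of that category. Only `iswalpha()` and `iswalnum()` come from the message itself.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical category codes; the names follow those floated in the thread. */
typedef enum { TOK_NONE, TOK_ASCIIWORD, TOK_WORD, TOK_NUMWORD } TokClass;

/*
 * Classify one token (hypothetical helper, not PostgreSQL code).
 * Results for non-ASCII input depend on the current LC_CTYPE locale,
 * because iswalpha() and iswalnum() are locale-sensitive.
 */
static TokClass classify_token(const wchar_t *tok)
{
    int all_ascii_alpha = 1;  /* every character is an ASCII letter */
    int all_alpha = 1;        /* every character satisfies iswalpha() */
    int all_alnum = 1;        /* every character satisfies iswalnum() */
    int has_alpha = 0;        /* at least one letter was seen */

    if (tok == NULL || tok[0] == L'\0')
        return TOK_NONE;

    for (size_t i = 0; tok[i] != L'\0'; i++) {
        wchar_t c = tok[i];
        int alpha = iswalpha((wint_t) c) != 0;
        int ascii_alpha = (c >= L'a' && c <= L'z') || (c >= L'A' && c <= L'Z');

        if (!ascii_alpha)
            all_ascii_alpha = 0;
        if (!alpha)
            all_alpha = 0;
        if (!iswalnum((wint_t) c))
            all_alnum = 0;
        if (alpha)
            has_alpha = 1;
    }

    if (all_ascii_alpha)
        return TOK_ASCIIWORD;  /* "aword" / "asciiword": all ASCII letters */
    if (all_alpha)
        return TOK_WORD;       /* "word": all letters per iswalpha() */
    if (all_alnum && has_alpha)
        return TOK_NUMWORD;    /* "numword": mixed letters and digits */
    return TOK_NONE;           /* anything else, e.g. digits only or punctuation */
}
```

Under these assumptions, `classify_token(L"hello")` yields `TOK_ASCIIWORD` and `classify_token(L"abc123")` yields `TOK_NUMWORD`. A token containing non-ASCII letters would fall into `TOK_WORD` only if the active locale (set with `setlocale(LC_CTYPE, ...)`) recognizes those characters as alphabetic.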