Re: Initial ugly reverse-translator
From | Oleg Bartunov |
---|---|
Subject | Re: Initial ugly reverse-translator |
Date | |
Msg-id | Pine.LNX.4.64.0804192110060.21547@sn.sai.msu.ru |
In reply to | Re: Initial ugly reverse-translator (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Initial ugly reverse-translator |
List | pgsql-general |
On Sat, 19 Apr 2008, Tom Lane wrote:

> Craig Ringer <craig@postnewspapers.com.au> writes:
>> Tom Lane wrote:
>>> I don't really see the problem. I assume from your reference to pg_trgm
>>> that you're using trigram similarity as the prefilter for potential
>>> matches
>
>> It turns out that's no good anyway, as it appears to ignore characters
>> outside the ASCII range. Rather less than useful for searching a
>> database of translated strings ;-)
>
> A quick look at the pg_trgm code suggests that it is only prepared to
> deal with single-byte encodings; if you're working in UTF8, which I
> suppose you'd have to be, it's dead in the water :-(. Perhaps fixing
> that should be on the TODO list.

As well as ltree. They are on our TODO list:
http://www.sai.msu.su/~megera/wiki/TODO

>
> But in any case maybe the full-text-search stuff would be more useful
> as a prefilter? Although honestly, for the speed we need here, I'm
> not sure a prefilter is needed at all. Full text might be useful
> if a LIKE-based match fails, though.
>
>>> (And besides, speed doesn't seem like the be-all and end-all here.)
>
>> True. It's not so much the speed as the fragility when faced with small
>> changes to formatting. In addition to whitespace, some clients mangle
>> punctuation with features like automatic "curly"-quoting.
>
> Yeah. I was wondering whether encoding differences wouldn't be a huge
> problem in practice, as well.
>
> regards, tom lane
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
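
The two problems raised in the thread (trigram matching that only copes with single-byte text, and exact matching that breaks on whitespace or "curly" quotes) can be illustrated outside the database. The following is only a sketch in Python, not the pg_trgm implementation: the function names, the punctuation table, and the padding scheme (two leading spaces and one trailing space per word) are assumptions made for illustration. It builds trigrams over Unicode characters rather than bytes, so non-ASCII text is not ignored.

```python
import re
import unicodedata

# Hypothetical table folding "smart" punctuation to ASCII equivalents.
PUNCT_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
})


def normalize(text):
    """Fold formatting differences before an exact or LIKE-style comparison."""
    text = unicodedata.normalize("NFC", text)   # one form for composed characters
    text = text.translate(PUNCT_MAP)            # undo automatic curly-quoting
    text = re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace
    return text.casefold()                      # case-insensitive comparison


def trigrams(text):
    """Return the set of character (not byte) trigrams of each word."""
    grams = set()
    for word in re.findall(r"\w+", normalize(text)):
        padded = "  " + word + " "              # assumed padding scheme
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

Under these assumptions, `normalize()` would be applied to both the stored string and the search string before an exact comparison, and `similarity()` could serve as a fallback or prefilter when the exact comparison fails.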