Re: [HACKERS] indexable and locale
From | Bruce Momjian |
---|---|
Subject | Re: [HACKERS] indexable and locale |
Date | |
Msg-id | 199911300152.UAA20942@candle.pha.pa.us |
In reply to | Re: [HACKERS] indexable and locale (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Here is Tom's comment on the patch.

> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> >> Attached is a patch to the old problem discussed feverishly before 6.5.
>
> > ... I think your patches break
> > non-ascii multi-byte character sets data and should be surrounded by
> > #ifdef LOCALE rather than replacing current codes surrounded by
> > #ifndef LOCALE.
>
> I am worried about this patch too. Under MULTIBYTE could it
> generate invalid characters? Also, do all non-ASCII locales sort
> codes 0-126 in the same order as ASCII? I didn't think they do,
> but I'm not an expert.
>
> The approach I was considering for fixing the problem was to use a
> loop that would repeatedly try to generate a string greater than the
> prefix string. The basic loop step would increment the rightmost
> byte as Goran has done (or, if it's already up to the limit, chop
> it off and increment the next character position). Then test to
> see whether the '<' operator actually believes the result is
> greater than the given prefix, and repeat if not. This avoids making
> any strong assumptions about the sort order of different character
> codes. However, there are two significant issues that would have
> to be surmounted to make it work reliably:
>
> 1. In MULTIBYTE mode incrementing the rightmost byte might yield
> an illegal multibyte character. Some way to prevent or detect this
> would be needed, lest it confuse the comparison operator. I think
> we have some multibyte routines that could be used to check for
> a valid result, but I haven't looked into it.
>
> 2. I think there are some locales out there that have context-
> sensitive sorting rules, i.e., a given character string may sort
> differently than you'd expect from considering the characters in
> isolation. For example, in German isn't "ss" treated specially?
> If "pqrss" does not sort between "pqrs" and "pqrt" then the entire
> premise of *both* sides of the LIKE optimization falls apart,
> because you can't be sure what will happen when comparing a prefix
> string like "pqrs" against longer strings from the database.
> I do not know if this is really a problem, nor what we could do
> to avoid it if it is.
>
> regards, tom lane
>
> ************

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
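
For readers following the thread, the loop Tom describes (increment the rightmost byte, chop it off once it hits the limit, and accept a candidate only when the comparison agrees it is greater than the prefix) might look roughly like the sketch below. This is not code from the patch or from the PostgreSQL tree: the function name make_greater_string is hypothetical here, and strcoll() is used only as a stand-in for the locale-aware '<' operator. The sketch deliberately ignores both open issues from the message, so it makes no attempt to reject illegal multibyte sequences or to cope with context-sensitive collation.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (name and signature are assumptions for this sketch).
 * Tries to build a string that the current locale sorts strictly after
 * `prefix`. Returns a malloc'd string the caller must free, or NULL if
 * no such string could be produced.
 */
char *make_greater_string(const char *prefix)
{
    size_t len = strlen(prefix);
    char *work = malloc(len + 1);

    if (work == NULL)
        return NULL;
    memcpy(work, prefix, len + 1);

    while (len > 0)
    {
        unsigned char *last = (unsigned char *) &work[len - 1];

        /* Increment the rightmost byte until the comparison accepts it. */
        while (*last < 255)
        {
            (*last)++;
            /* strcoll() stands in for the locale-aware '<' operator. */
            if (strcoll(work, prefix) > 0)
                return work;
        }

        /* Byte is at its limit: chop it off and retry one position left. */
        work[--len] = '\0';
    }

    free(work);
    return NULL;                /* nothing greater could be generated */
}
```

In the default "C" locale, where strcoll() behaves like a plain byte comparison, a prefix of "abc" gives "abd", and a prefix ending in byte 0xFF loses that byte before the previous one is incremented. Testing each candidate against the original prefix, rather than assuming byte order matches sort order, is the point of the approach; whether it holds up under locales with context-sensitive rules is exactly the question the message leaves open.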