Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem
Дата
Msg-id 29189.928944532@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem  (Bruce Momjian <maillist@candle.pha.pa.us>)
Ответы Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem  (Bruce Momjian <maillist@candle.pha.pa.us>)
Список pgsql-hackers
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> This certainly explains it.  With locale enabled, LIKE does not use
> indexes because we can't figure out how to do the indexing trick with
> non-ASCII character sets because we can't figure out the maximum
> character value for a particular encoding.

We don't actually need the *maximum* character value, what we need is
to be able to generate a *slightly larger* character value.

For example, what the parser is doing now:fld LIKE 'abc%' ==> fld <= 'abc\377'
is not even really right in ASCII locale, because it will reject a
data value like 'abc\377x'.

I think what we really want is to generate the "next value of the
same length" and use a < comparison.  In ASCII locale this meansfld LIKE 'abc%' ==> fld < 'abd'
which is reliable regardless of what comes after abc in the data.

The trick is to figure out a "next" value without assuming a lot
about the local character set and collation sequence.  I had
been thinking about a brute force method: generate a string and
check to see whether strcmp claims it's greater than the original
or not; if not, increment the last byte and try again.  You'd
also have to be able to back up and increment earlier bytes if
you maxed out the last one.  For example, in French locale,fld LIKE 'ab\376%'
you'd first produce 'ab\377' but discover that it's less than
'ab\376' (because \377 is y-dieresis which sorts like 'y').
Your next try must be 'ac\377' which will succeed.

But I am worried whether this trick will work in multibyte locales ---
incrementing the last byte might generate an invalid character sequence
and produce unpredictable results from strcmp.  So we need some help
from someone who knows a lot about collation orders and multibyte
character representations.
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Oleg Bartunov
Дата:
Сообщение: Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem
Следующее
От: The Hermit Hacker
Дата:
Сообщение: Re: [HACKERS] Re: [PORTS] Anyone working on linux Alpha?