Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

Поиск
Список
Период
Сортировка
От Bruce Momjian
Тема Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem
Дата
Msg-id 199906100044.UAA14419@candle.pha.pa.us
обсуждение исходный текст
Ответ на Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Yes. This pretty much sums up the problem.


> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > This certainly explains it.  With locale enabled, LIKE does not use
> > indexes because we can't figure out how to do the indexing trick with
> > non-ASCII character sets because we can't figure out the maximum
> > character value for a particular encoding.
> 
> We don't actually need the *maximum* character value, what we need is
> to be able to generate a *slightly larger* character value.
> 
> For example, what the parser is doing now:
>     fld LIKE 'abc%' ==> fld <= 'abc\377'
> is not even really right in ASCII locale, because it will reject a
> data value like 'abc\377x'.
> 
> I think what we really want is to generate the "next value of the
> same length" and use a < comparison.  In ASCII locale this means
>     fld LIKE 'abc%' ==> fld < 'abd'
> which is reliable regardless of what comes after abc in the data.
> 
> The trick is to figure out a "next" value without assuming a lot
> about the local character set and collation sequence.  I had
> been thinking about a brute force method: generate a string and
> check to see whether strcmp claims it's greater than the original
> or not; if not, increment the last byte and try again.  You'd
> also have to be able to back up and increment earlier bytes if
> you maxed out the last one.  For example, in French locale,
>     fld LIKE 'ab\376%'
> you'd first produce 'ab\377' but discover that it's less than
> 'ab\376' (because \377 is y-dieresis which sorts like 'y').
> Your next try must be 'ac\377' which will succeed.
> 
> But I am worried whether this trick will work in multibyte locales ---
> incrementing the last byte might generate an invalid character sequence
> and produce unpredictable results from strcmp.  So we need some help
> from someone who knows a lot about collation orders and multibyte
> character representations.
> 
>             regards, tom lane
> 


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem