Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem
От | Tom Lane |
---|---|
Тема | Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem |
Дата | |
Msg-id | 29189.928944532@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem (Bruce Momjian <maillist@candle.pha.pa.us>) |
Ответы |
Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem
|
Список | pgsql-hackers |
Bruce Momjian <maillist@candle.pha.pa.us> writes: > This certainly explains it. With locale enabled, LIKE does not use > indexes because we can't figure out how to do the indexing trick with > non-ASCII character sets because we can't figure out the maximum > character value for a particular encoding. We don't actually need the *maximum* character value, what we need is to be able to generate a *slightly larger* character value. For example, what the parser is doing now:fld LIKE 'abc%' ==> fld <= 'abc\377' is not even really right in ASCII locale, because it will reject a data value like 'abc\377x'. I think what we really want is to generate the "next value of the same length" and use a < comparison. In ASCII locale this meansfld LIKE 'abc%' ==> fld < 'abd' which is reliable regardless of what comes after abc in the data. The trick is to figure out a "next" value without assuming a lot about the local character set and collation sequence. I had been thinking about a brute force method: generate a string and check to see whether strcmp claims it's greater than the original or not; if not, increment the last byte and try again. You'd also have to be able to back up and increment earlier bytes if you maxed out the last one. For example, in French locale,fld LIKE 'ab\376%' you'd first produce 'ab\377' but discover that it's less than 'ab\376' (because \377 is y-dieresis which sorts like 'y'). Your next try must be 'ac\377' which will succeed. But I am worried whether this trick will work in multibyte locales --- incrementing the last byte might generate an invalid character sequence and produce unpredictable results from strcmp. So we need some help from someone who knows a lot about collation orders and multibyte character representations. regards, tom lane
В списке pgsql-hackers по дате отправления: