Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem
От | Bruce Momjian |
---|---|
Тема | Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem |
Дата | |
Msg-id | 199906100044.UAA14419@candle.pha.pa.us обсуждение исходный текст |
Ответ на | Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-hackers |
Yes. This pretty much sums up the problem. > Bruce Momjian <maillist@candle.pha.pa.us> writes: > > This certainly explains it. With locale enabled, LIKE does not use > > indexes because we can't figure out how to do the indexing trick with > > non-ASCII character sets because we can't figure out the maximum > > character value for a particular encoding. > > We don't actually need the *maximum* character value, what we need is > to be able to generate a *slightly larger* character value. > > For example, what the parser is doing now: > fld LIKE 'abc%' ==> fld <= 'abc\377' > is not even really right in ASCII locale, because it will reject a > data value like 'abc\377x'. > > I think what we really want is to generate the "next value of the > same length" and use a < comparison. In ASCII locale this means > fld LIKE 'abc%' ==> fld < 'abd' > which is reliable regardless of what comes after abc in the data. > > The trick is to figure out a "next" value without assuming a lot > about the local character set and collation sequence. I had > been thinking about a brute force method: generate a string and > check to see whether strcmp claims it's greater than the original > or not; if not, increment the last byte and try again. You'd > also have to be able to back up and increment earlier bytes if > you maxed out the last one. For example, in French locale, > fld LIKE 'ab\376%' > you'd first produce 'ab\377' but discover that it's less than > 'ab\376' (because \377 is y-dieresis which sorts like 'y'). > Your next try must be 'ac\377' which will succeed. > > But I am worried whether this trick will work in multibyte locales --- > incrementing the last byte might generate an invalid character sequence > and produce unpredictable results from strcmp. So we need some help > from someone who knows a lot about collation orders and multibyte > character representations. > > regards, tom lane > -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
В списке pgsql-hackers по дате отправления: