Yet another LIKE-indexing scheme
От | Tom Lane |
---|---|
Тема | Yet another LIKE-indexing scheme |
Дата | |
Msg-id | 1429.967916387@sss.pgh.pa.us обсуждение исходный текст |
Ответы |
Re: Yet another LIKE-indexing scheme
Re: Yet another LIKE-indexing scheme |
Список | pgsql-hackers |
I had another thought about fixing our problems with deriving index bounds for LIKE patterns in non-ASCII locales. (If you don't remember the gory details here, please re-read threadSigh, LIKE indexing is *still* broken in foreign locales from pgsql-hackers archives of 7 to 10 June, 2000; there are also many previous go-rounds about this long-standing issue.) The problems that I've been told about seem to center around one- and two-character patterns that have special sort rules in some locales. Could we work around these problems by dropping one or perhaps two characters from the end of the given LIKE prefix? For example, givenWHERE name LIKE 'foobar%' drop the last fixed character ('r') and generate index bounds from what remains, using the same algorithm as in 7.0. So the index bounds would becomeWHERE name >= 'fooba' AND name < 'foobb' (at least in ASCII locale --- to make the upper bound, we'd search for a string considered greater than 'fooba' by the local strcmp()). The truncation would need to be multibyte-aware, of course. This would, for example, fix the example given by Erich Stamberger: > Another interresting feature of Czech collation is: > > H < "CH" < I > > and: > > B < C < C + CARON < D .. < H < "CH" < I > > So what happens with "WHERE name like 'Czec%`" ? Our existing code fails because it generates WHERE name >= 'Czec' AND name < 'Czed'; it will therefore not find names beginning 'Czech' because those are in another part of the index, between 'Czeh' and 'Czei'. But WHERE name >= 'Cze' AND name < 'Czf' would work. Are there examples where this still doesn't work? (Funny sort rules for trigraphs would break it, I'm sure, unless we drop two characters instead of just one.) Obviously we could still keep the last character in ASCII locale. That would be a good thing since it'd reduce the number of tuples scanned. Is there a portable way to determine whether it's safe to do so in other locales? (Some inquiry function about whether the sort ordering has any digraph or two-to-one rules might help, but I don't know if there is one.) regards, tom lane
В списке pgsql-hackers по дате отправления: