Re: LIKE optimization in UTF-8 and locale-C
From | Andrew - Supernews |
---|---|
Subject | Re: LIKE optimization in UTF-8 and locale-C |
Date | |
Msg-id | slrnf06r7k.7me.andrew+nonews@atlantis.supernews.net |
In reply to | LIKE optimization in UTF-8 and locale-C (ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>) |
List | pgsql-hackers |
On 2007-03-22, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
>> I found LIKE operators are slower on multi-byte encoding databases
>> than single-byte encoding ones. It comes from difference between
>> MatchText() and MBMatchText().
>
>> We've had an optimization for single-byte encodings using
>> pg_database_encoding_max_length() == 1 test. I'll propose to extend it
>> in UTF-8 with locale-C case.
>
> If this works for UTF8, won't it work for all the backend-legal
> encodings?

It works for UTF8 only because UTF8 has special properties which are not
shared by, for example, EUC_*. Specifically, in UTF8 the octet sequence
for a multibyte character will never appear as a subsequence of the octet
sequence of a string of other multibyte characters. i.e. given a string
of two two-octet characters AB, the second octet of A plus the first octet
of B is not a valid UTF8 character (and likewise for longer characters).

(And while I haven't tested it, it looks like the patch posted doesn't
account properly for the use of _, so it needs a bit more work.)

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
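
The property described above can be checked with a short Python sketch. It is only an illustration: the helper names and the sample strings "éü" and "あい" are chosen here and do not come from the patch or from PostgreSQL's MatchText()/MBMatchText(). It takes the last octet of one character plus the first octet of the next and asks whether that pair decodes as a character in the same encoding.

```python
def straddling_bytes(pair: str, encoding: str) -> bytes:
    """Last octet of the first character plus first octet of the second."""
    first = pair[0].encode(encoding)
    second = pair[1].encode(encoding)
    return first[-1:] + second[:1]


def decodes_as_one_char(octets: bytes, encoding: str) -> bool:
    """True if the octets form exactly one valid character in the encoding."""
    try:
        return len(octets.decode(encoding)) == 1
    except UnicodeDecodeError:
        return False


# Two two-octet characters in each encoding (illustrative choices).
utf8_straddle = straddling_bytes("éü", "utf-8")
eucjp_straddle = straddling_bytes("あい", "euc_jp")

# UTF-8: the straddling pair starts with a continuation octet, so it
# cannot be decoded as a character.
print("UTF-8 straddle valid: ", decodes_as_one_char(utf8_straddle, "utf-8"))

# EUC-JP: both octets of a two-octet character share one range, so the
# straddling pair is itself a character.
print("EUC-JP straddle valid:", decodes_as_one_char(eucjp_straddle, "euc_jp"))

# Consequence for byte-wise matching in EUC-JP: a character that is not in
# the string is still found when the comparison is done octet by octet.
phantom = eucjp_straddle.decode("euc_jp")
print("character-level hit:", phantom in "あい")
print("byte-level hit:     ", phantom.encode("euc_jp") in "あい".encode("euc_jp"))
```

In the EUC-JP case the byte-level search reports a match that the character-level search does not, which is the kind of false positive a single-byte matcher would produce there. The sketch does not cover the separate `_` issue, where the wildcard has to consume a whole character rather than a single octet.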