Re: UTF8MatchText
From | Andrew Dunstan |
---|---|
Subject | Re: UTF8MatchText |
Date | |
Msg-id | 464CB3A5.9020600@dunslane.net |
In reply to | Re: UTF8MatchText (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: UTF8MatchText |
List | pgsql-patches |
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>
>> Tom Lane wrote:
>>
>>> Except that the entire point of this patch is to dumb down NextChar to
>>> be the same as NextByte for UTF8 strings.
>>>
>
>> That's not what I see in (what I think is) the latest submission, which
>> includes this snippet:
>>
>
> [ scratches head... ] OK, then I think I totally missed what this patch
> is trying to accomplish; because this code looks just the same as the
> existing multibyte-character operations. Where does the performance
> improvement come from?
>

That's what bothered me. The trouble is that we have so much code that
looks *almost* identical.

From my WIP patch, here's where the difference appears to be. Note that
the UTF8 branch has two NextByte calls at the bottom, unlike the other
branch:

#ifdef UTF8_OPT
        /*
         * UTF8 is optimised to do byte at a time matching in most cases,
         * thus saving expensive calls to NextChar.
         *
         * UTF8 has disjoint representations for first-bytes and
         * not-first-bytes of MB characters, and thus it is
         * impossible to make a false match in which an MB pattern
         * character is matched to the end of one data character
         * plus the start of another.
         * In character sets without that property, we have to use the
         * slow way to ensure we don't make out-of-sync matches.
         */
        else if (*p == '_')
        {
            NextChar(t, tlen);
            NextByte(p, plen);
            continue;
        }
        else if (!BYTEEQ(t, p))
        {
            /*
             * Not the single-character wildcard and no explicit match? Then
             * time to quit...
             */
            return LIKE_FALSE;
        }

        NextByte(t, tlen);
        NextByte(p, plen);
#else
        /*
         * Branch for non-utf8 multi-byte charsets and also for single-byte
         * charsets which don't gain any benefit from the above optimisation.
         */
        else if ((*p != '_') && !CHAREQ(t, p))
        {
            /*
             * Not the single-character wildcard and no explicit match? Then
             * time to quit...
             */
            return LIKE_FALSE;
        }

        NextChar(t, tlen);
        NextChar(p, plen);
#endif /* UTF8_OPT */

cheers

andrew
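To make the idea in the UTF8_OPT comment concrete, here is a minimal standalone sketch. It is not PostgreSQL's like_match.c; the names utf8_char_len, utf8_like and utf8_like_cstr are hypothetical. It assumes well-formed UTF-8 input, case-sensitive matching, and no escape character. Literal pattern bytes are compared one byte at a time (the NextByte/NextByte path). Only '_' and '%' step the text by a whole character (NextChar), because a lead byte can never equal a continuation byte (10xxxxxx). A byte-wise match that begins on a character boundary therefore always ends on one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte length of the UTF-8 character whose lead byte
 * is b. Assumes well-formed UTF-8; an invalid lead byte counts as 1 byte. */
static size_t utf8_char_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Hypothetical simplified LIKE matcher for UTF-8 (no escape character).
 * '_' consumes one whole text character but one pattern byte; every other
 * pattern byte is compared byte-by-byte against the text. */
static bool utf8_like(const unsigned char *t, size_t tlen,
                      const unsigned char *p, size_t plen)
{
    while (tlen > 0 && plen > 0)
    {
        if (*p == '%')
        {
            /* Collapse runs of '%'; a trailing '%' matches anything. */
            while (plen > 0 && *p == '%') { p++; plen--; }
            if (plen == 0) return true;
            /* Try the rest of the pattern at every character boundary. */
            while (tlen > 0)
            {
                if (utf8_like(t, tlen, p, plen)) return true;
                size_t n = utf8_char_len(*t);
                if (n > tlen) n = tlen;
                t += n; tlen -= n;       /* NextChar on the text */
            }
            return false;
        }
        else if (*p == '_')
        {
            size_t n = utf8_char_len(*t);
            if (n > tlen) n = tlen;
            t += n; tlen -= n;           /* NextChar on the text */
            p++; plen--;                 /* NextByte on the pattern */
            continue;
        }
        else if (*t != *p)
            return false;                /* byte mismatch: no match */

        t++; tlen--;                     /* NextByte on the text */
        p++; plen--;                     /* NextByte on the pattern */
    }
    /* Text exhausted: only trailing '%' may remain in the pattern. */
    while (plen > 0 && *p == '%') { p++; plen--; }
    return tlen == 0 && plen == 0;
}

/* Convenience wrapper for NUL-terminated strings. */
static bool utf8_like_cstr(const char *text, const char *pattern)
{
    assert(text != NULL && pattern != NULL);
    return utf8_like((const unsigned char *) text, strlen(text),
                     (const unsigned char *) pattern, strlen(pattern));
}
```

For example, utf8_like_cstr("caf\xC3\xA9", "caf_") is true because '_' swallows both bytes of "é". utf8_like_cstr("caf\xC3\xA9", "caf__") is false because only one character remains. The non-UTF8 branch has to pay for a character-length lookup on every step. This sketch pays for one only on '_' and '%'.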