Re: UTF8MatchText
From | Andrew Dunstan |
---|---|
Subject | Re: UTF8MatchText |
Date | |
Msg-id | 46505704.8040203@dunslane.net |
In reply to | Re: UTF8MatchText (Andrew Dunstan <andrew@dunslane.net>) |
Responses |
Re: UTF8MatchText
|
List | pgsql-patches |
I wrote:
>
>>
>> It is only when you have a pattern like '%_' when this is a problem
>> and we could detect this and do byte by byte when it's not. Now we
>> check (*p == '\\') || (*p == '_') in each iteration when we scan over
>> characters for '%', and we could do it once and have different loops
>> for the two cases.
>>
>> Other than this part that I think can be optimized I don't see
>> anything wrong with the idea behind the patch. To make the '%' case
>> fast might be an important optimization for a lot of use cases. It's
>> not uncommon that '%' matches a bigger part of the string than the
>> rest of the pattern.
>>
>
> Are you sure? The big remaining char-matching bottleneck will surely
> be in the code that scans for a place to start matching a %. But
> that's exactly where we can't use byte matching for cases where the
> charset might include AB and BA as characters - the pattern might
> contain %BA and the string AB. However, this isn't a danger for UTF8,
> which leads me to think that we do indeed need a special case for
> UTF8, but for a different improvement from that proposed in the
> original patch. I'll post an updated patch shortly.
>

Here is a patch that implements this. Please analyse for possible breakage.

cheers

andrew
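
The AB/BA concern above can be illustrated with a small sketch. This is not the patch from this thread and not PostgreSQL code; `find_bytes` and `utf8_is_continuation` are hypothetical helper names, and the two-byte encoding in the usage note is an assumed toy example. The sketch only shows why a byte-wise scan can land in the middle of a character in some multibyte encodings, while in UTF-8 a match for a whole character always starts on a character boundary, because lead bytes and continuation bytes use disjoint value ranges.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: true if b is a UTF-8 continuation byte (10xxxxxx).
 * A byte that is not a continuation byte starts a character.
 */
static bool
utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/*
 * Hypothetical helper: plain byte-wise search, ignoring character
 * boundaries. Returns the byte offset of the first match, or -1.
 */
static long
find_bytes(const unsigned char *hay, size_t hlen,
           const unsigned char *needle, size_t nlen)
{
    if (nlen == 0 || nlen > hlen)
        return -1;

    for (size_t i = 0; i + nlen <= hlen; i++)
    {
        if (memcmp(hay + i, needle, nlen) == 0)
            return (long) i;
    }
    return -1;
}
```

In an assumed fixed two-byte encoding where both `AB` and `BA` are valid characters, the string `ABAB` holds two `AB` characters, yet `find_bytes` reports the bytes of `BA` at offset 1, which is inside the first character and therefore a false match. With UTF-8 input, any match for a pattern that begins with a complete character starts at a byte for which `utf8_is_continuation` is false, so the byte-wise scan cannot split a character.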