Re: UTF8MatchText
From | Andrew Dunstan |
---|---|
Subject | Re: UTF8MatchText |
Date | |
Msg-id | 46504CFD.5040505@dunslane.net |
In reply to | Re: UTF8MatchText (Dennis Bjorklund <db@zigo.dhs.org>) |
Responses | Re: UTF8MatchText |
List | pgsql-patches |
Dennis Bjorklund wrote:
> Tom Lane wrote:
>> You could imagine trying to do
>> % a byte at a time (and indeed that's what I'd been thinking it did)
>> but that gets you out of sync which breaks the _ case.
>
> It is only when you have a pattern like '%_' when this is a problem
> and we could detect this and do byte by byte when it's not. Now we
> check (*p == '\\') || (*p == '_') in each iteration when we scan over
> characters for '%', and we could do it once and have different loops
> for the two cases.
>
> Other than this part that I think can be optimized I don't see
> anything wrong with the idea behind the patch. To make the '%' case
> fast might be an important optimization for a lot of use cases. It's
> not uncommon that '%' matches a bigger part of the string than the
> rest of the pattern.
>

Are you sure? The big remaining char-matching bottleneck will surely be in the code that scans for a place to start matching a %. But that's exactly where we can't use byte matching for cases where the charset might include AB and BA as characters: the pattern might contain %BA and the string AB. However, this isn't a danger for UTF8, which leads me to think that we do indeed need a special case for UTF8, but for a different improvement from the one proposed in the original patch.

I'll post an updated patch shortly.

cheers

andrew
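To illustrate why byte matching in the '%' scan is safe for UTF-8 (unlike an encoding that has both AB and BA as characters), here is a minimal sketch in C. It assumes valid, NUL-terminated UTF-8 input and supports only '%', '_' and a '\\' escape; `utf8_like` and `utf8_char_len` are hypothetical names, and this is not the code from the patch under discussion. When '%' is followed by a literal, the scan advances one byte at a time: the literal's first byte is ASCII or a UTF-8 lead byte, and neither can equal a continuation byte (10xxxxxx), so a match can only start on a character boundary. When '%' is followed by '_', the scan steps whole characters, because a '_' applied mid-character would go out of sync, which is the '%_' case Dennis describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: byte length of the UTF-8 character whose first
 * byte is c. Assumes valid UTF-8; no validation is performed. */
static size_t utf8_char_len(unsigned char c)
{
    if (c < 0x80)
        return 1;               /* ASCII */
    if ((c & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx */
    if ((c & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx */
    return 4;                   /* 11110xxx */
}

/*
 * Hypothetical simplified LIKE for valid, NUL-terminated UTF-8:
 * '%' matches any sequence, '_' matches one character, and '\\'
 * escapes the next pattern character. A sketch, not PostgreSQL code.
 */
static bool utf8_like(const char *t, const char *p)
{
    assert(t != NULL && p != NULL);

    while (*p)
    {
        if (*p == '%')
        {
            while (*p == '%')   /* collapse runs of '%' */
                p++;
            if (*p == '\0')
                return true;    /* trailing '%' matches the rest */

            if (*p == '_')
            {
                /* '%_': step whole characters so '_' stays in sync. */
                for (; *t; t += utf8_char_len((unsigned char) *t))
                    if (utf8_like(t, p))
                        return true;
                return false;
            }

            /*
             * Next item is a literal: its first byte is ASCII or a UTF-8
             * lead byte, which never equals a continuation byte, so a
             * byte-at-a-time scan can only match at a character start.
             */
            unsigned char first =
                (unsigned char) ((*p == '\\' && p[1]) ? p[1] : *p);
            for (; *t; t++)
                if ((unsigned char) *t == first && utf8_like(t, p))
                    return true;
            return false;
        }
        else if (*p == '_')
        {
            if (*t == '\0')
                return false;
            t += utf8_char_len((unsigned char) *t);   /* one whole character */
            p++;
        }
        else
        {
            if (*p == '\\' && p[1])
                p++;            /* escaped character is taken literally */
            if (*t != *p)
                return false;   /* byte compare; also fails at end of t */
            t++;
            p++;
        }
    }
    return *t == '\0';
}
```

For example, `utf8_like("h\xC3\xA9llo", "h%llo")` and `utf8_like("h\xC3\xA9llo", "h_llo")` both return true, while `utf8_like("\xC3\xA9\xC3\xA9", "%___")` returns false because the string holds only two characters. Deciding once, right after the '%', which loop to run follows Dennis's suggestion of separate loops for the two cases instead of re-checking the next pattern character on every iteration.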