Re: UTF8MatchText
From | Andrew Dunstan |
---|---|
Subject | Re: UTF8MatchText |
Date | |
Msg-id | 46519FDF.5070302@dunslane.net |
In response to | Re: UTF8MatchText (db@zigo.dhs.org) |
Responses | Re: UTF8MatchText |
List | pgsql-patches |
db@zigo.dhs.org wrote:
>> Doh, you're right ... but on third thought, what happens with a pattern
>> containing "%_"? If % tries to advance bytewise then we'll be trying to
>> apply NextChar in the middle of a data character, and bad things ensue.
>>
>
> Right, when you have '_' after a '%' you need to make sure the '%'
> advances full characters. In my suggestion the test if '_' (or '\') come
> after the '%' is done once and it select which of the two loops to use,
> the one that do byte stepping or the one with NextChar.
>
> It's difficult to know for sure that we have thought about all the corner
> cases. I hope the gain is worth the effort.. :-)
>

Yes, I came to the same conclusion about how to restructure the code.

The current code contains this:

    while (tlen > 0)
    {
        /*
         * Optimization to prevent most recursion: don't recurse
         * unless first pattern char might match this text char.
         */
        if (CHAREQ(t, p) || (*p == '\\') || (*p == '_'))
        {
            int matched = MatchText(t, tlen, p, plen);

            if (matched != LIKE_FALSE)
                return matched; /* TRUE or ABORT */
        }

        NextChar(t, tlen);
    }

The code appears to date from v 1.23 of like.c way back in 2001. I'm not sure I agree with the comment, though. In the first place, the invariant tests should not be in the loop, I think, and I'll hoist them out as Dennis suggests. But why are we doing that CHAREQ? If it succeeds we'll just do it again when we recurse, I think.

cheers

andrew
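
For illustration only, here is a rough sketch of the restructuring discussed above: the test on the pattern character following '%' is done once, and it selects either a loop that steps whole characters (when '_' or '\' follows) or a loop that steps single bytes. This is not the code in like.c. The names match_text, char_len and like_utf8 are hypothetical, the input is assumed to be valid UTF-8, and a pattern ending in a bare backslash is simply treated as a non-match.

```c
#include <stddef.h>
#include <string.h>

/* Result codes named after the ones in the quoted loop. */
enum { LIKE_ABORT = -1, LIKE_FALSE = 0, LIKE_TRUE = 1 };

/* Byte length of the UTF-8 character at s, clipped to the bytes remaining.
 * Assumes s points at the first byte of a valid character. */
static size_t char_len(const unsigned char *s, size_t remaining)
{
    size_t n;

    if (*s < 0x80) n = 1;
    else if ((*s & 0xE0) == 0xC0) n = 2;
    else if ((*s & 0xF0) == 0xE0) n = 3;
    else n = 4;
    return n < remaining ? n : remaining;
}

/* Hypothetical simplified matcher, not PostgreSQL's MatchText. */
static int match_text(const unsigned char *t, size_t tlen,
                      const unsigned char *p, size_t plen)
{
    while (tlen > 0 && plen > 0)
    {
        if (*p == '%')
        {
            /* Collapse a run of '%'; a trailing '%' matches everything. */
            while (plen > 0 && *p == '%') { p++; plen--; }
            if (plen == 0)
                return LIKE_TRUE;

            /* Invariant test, done once before choosing a loop. */
            if (*p == '_' || *p == '\\')
            {
                /* Must stay on character boundaries. */
                while (tlen > 0)
                {
                    size_t n = char_len(t, tlen);
                    int matched = match_text(t, tlen, p, plen);

                    if (matched != LIKE_FALSE)
                        return matched;     /* TRUE or ABORT */
                    t += n; tlen -= n;
                }
            }
            else
            {
                /* A literal follows: byte stepping is safe, since a UTF-8
                 * lead byte never equals a continuation byte. */
                unsigned char first = *p;

                while (tlen > 0)
                {
                    if (*t == first)
                    {
                        int matched = match_text(t, tlen, p, plen);

                        if (matched != LIKE_FALSE)
                            return matched; /* TRUE or ABORT */
                    }
                    t++; tlen--;
                }
            }
            return LIKE_ABORT;              /* text exhausted */
        }
        else if (*p == '_')
        {
            /* '_' consumes one whole character. */
            size_t n = char_len(t, tlen);

            t += n; tlen -= n;
            p++; plen--;
        }
        else
        {
            if (*p == '\\')
            {
                /* Escape: the next pattern byte is taken literally. */
                p++; plen--;
                if (plen == 0)
                    return LIKE_FALSE;
            }
            if (*p != *t)
                return LIKE_FALSE;
            p++; plen--; t++; tlen--;
        }
    }

    if (tlen > 0)
        return LIKE_FALSE;                  /* pattern ended first */
    while (plen > 0 && *p == '%') { p++; plen--; }
    return plen == 0 ? LIKE_TRUE : LIKE_ABORT;
}

/* Convenience wrapper for NUL-terminated strings: 1 on match, else 0. */
int like_utf8(const char *text, const char *pattern)
{
    return match_text((const unsigned char *) text, strlen(text),
                      (const unsigned char *) pattern,
                      strlen(pattern)) == LIKE_TRUE;
}
```

With this shape, a pattern such as "h%_llo" takes the character-stepping loop, so '_' is never applied in the middle of a multibyte character, while a pattern such as "a%c" takes the cheaper byte-stepping loop. Whether the first-byte comparison in the second loop pays for itself is the question raised above; the sketch keeps it only to mirror the quoted code.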