Re: like/ilike improvements
From | Andrew Dunstan |
---|---|
Subject | Re: like/ilike improvements |
Date | |
Msg-id | 4656563F.50608@dunslane.net |
In response to | Re: like/ilike improvements (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Tom Lane wrote:
>>> You have to be on a first byte before you can meaningfully apply
>>> NextChar, and you have to use NextChar or else you don't count
>>> characters correctly (eg "__" must match 2 chars not 2 bytes).
>
>> Yes, I agree completely. However it looks to me like IsFirstByte will in
>> fact always be true when we get to call NextChar for matching "_" for UTF8.
>
> If that's true, the patch is failing to achieve its goal of treating %
> bytewise ...

Let's back up. % processing works by looking for a place in the text that might match what follows % in the pattern, and then calling itself recursively. For UTF8, if what follows % is _, it does that search by repeatedly calling NextChar - otherwise it calls NextByte. But if we're not processing a wildcard we have to match an actual complete UTF8 char, so the fact that we proceed byte-wise won't get us out of sync whenever we happen to encounter an _.

We can't rely on that process for other multi-byte charsets because the suffix of one char might be the prefix of another, so we could get false matches. That can't happen with UTF8.

cheers

andrew
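
To make the argument above concrete, here is a minimal Python sketch of the matching strategy described in the message. It is not the PostgreSQL C code: the names `next_char_len` and `like_utf8` are hypothetical, escape characters and ILIKE are ignored, and both text and pattern are assumed to be valid UTF-8. The search after % advances one character at a time only when the next pattern element is _, and one byte at a time otherwise.

```python
PERCENT = ord("%")
UNDERSCORE = ord("_")


def next_char_len(text: bytes, i: int) -> int:
    """Byte length of the UTF-8 character whose lead byte is text[i].

    Assumes text is valid UTF-8 and i is on a lead byte.
    """
    b = text[i]
    if b < 0x80:
        return 1
    if b >> 5 == 0b110:
        return 2
    if b >> 4 == 0b1110:
        return 3
    return 4


def like_utf8(text: bytes, pat: bytes) -> bool:
    """Illustrative LIKE matcher over UTF-8 bytes (no escape handling)."""
    t = p = 0
    while p < len(pat):
        c = pat[p]
        if c == PERCENT:
            # Collapse a run of % and look at what follows it.
            while p < len(pat) and pat[p] == PERCENT:
                p += 1
            if p == len(pat):
                return True  # trailing % matches any remainder
            rest = pat[p:]
            by_char = rest[0] == UNDERSCORE
            while t < len(text):
                if like_utf8(text[t:], rest):
                    return True
                # "NextChar" if _ follows %, otherwise "NextByte".
                t += next_char_len(text, t) if by_char else 1
            return False  # rest needs at least one more byte of text
        if t >= len(text):
            return False
        if c == UNDERSCORE:
            # _ must consume one whole character, not one byte.
            t += next_char_len(text, t)
        elif text[t] == c:
            # Literal bytes are compared one at a time.
            t += 1
        else:
            return False
        p += 1
    return t == len(text)
```

When the byte-wise search lands in the middle of a character, the recursive call fails on its first comparison, because a UTF-8 continuation byte (10xxxxxx) can never equal the lead byte that starts a literal in the pattern. By the time the matcher reaches a later _, it has matched only complete literal characters, so it is back on a character boundary. For example, `like_utf8("héllo".encode(), "h_llo".encode())` succeeds, while the same text against `h__llo` does not.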