Re: [v9.2] make_greater_string() does not return a string in some cases
From | Robert Haas |
---|---|
Subject | Re: [v9.2] make_greater_string() does not return a string in some cases |
Date | |
Msg-id | CA+TgmoavQP28OY0QRkXSQiS131u2sEFc7e72yU3x=pjr0BPU=w@mail.gmail.com |
In reply to | Re: [v9.2] make_greater_string() does not return a string in some cases (Tom Lane <tgl@sss.pgh.pa.us>) |
Replies | Re: [v9.2] make_greater_string() does not return a string in some cases |
List | pgsql-hackers |
On Fri, Sep 23, 2011 at 8:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Sep 22, 2011 at 10:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Anyway, I won't stand in the way of the patch as long as it's modified
>>> to limit the number of values considered for any one character position
>>> to something reasonably small.
>
>> I think that limit in both the old and new code is 1, except that the
>> new code does it more efficiently.
>
>> Am I confused?
>
> Yes, or else I am. Consider a 4-byte UTF8 character at the end of the
> string. The existing code increments the last byte up to 255 (rejecting
> everything past 0xBF), then gives up and truncates that character away.
> So the maximum number of tries for that character position is between 0
> and 127 depending on what the original character was (with at most 63 of
> the incremented values getting past the verifymbstr test).
>
> The proposed patch is going to iterate through all Unicode code points
> up to U+7FFFFF before giving up. Since it's possible that we need to
> increment something further left to succeed at all, this doesn't seem
> like a good plan.

I think you're misreading the code. It does this:

    while (len > 0)
    {
        boring stuff;
        if (charincfunc(lastchar, charlen))
        {
            more boring stuff;
            if (we made a greater string)
                return it;
            cleanup;
        }
        truncate away last character;
    }

I don't see how that's ever going to try more than one character in the same position.

What may be confusing you is that the old code has two loops: an outer loop that tests whether we've made a greater string, and an inner loop that tests whether we've made a validly encoded string at all. In the new code, at least in the UTF-8 case, the inner loop is GONE altogether. Instead of iterating until we construct a valid character, we just use our mad UTF-8 skillz to assemble one, and return it.

Or else I need to go drink a few cups of tea and look at this again.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
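To make the single-loop argument concrete, here is a minimal C sketch, assuming valid UTF-8 input and plain byte-wise (C locale) ordering. The names `utf8_increment` and `greater_prefix` are hypothetical, not PostgreSQL functions, and the decode/add-one/re-encode technique is a stand-in that is not necessarily what the patch does. What it shows is the shape of the loop: each character position gets exactly one increment attempt, so a trailing 4-byte character costs one try rather than the up-to-127 byte increments described for the existing code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Largest code point that fits in a UTF-8 sequence of 1..4 bytes. */
static const unsigned int utf8_max_cp[5] = {0, 0x7F, 0x7FF, 0xFFFF, 0x10FFFF};

/*
 * Hypothetical helper (not PostgreSQL's API): replace the len-byte UTF-8
 * character at p with the next code point that encodes to the same length.
 * The successor is computed directly (decode, add one, re-encode), so there
 * is no inner "increment a byte and re-verify" loop.  Returns false, leaving
 * p untouched, if no such code point exists (e.g. after U+07FF or U+10FFFF).
 */
static bool utf8_increment(unsigned char *p, int len)
{
    unsigned int cp;

    switch (len) {
    case 1: cp = p[0]; break;
    case 2: cp = ((p[0] & 0x1Fu) << 6) | (p[1] & 0x3Fu); break;
    case 3: cp = ((p[0] & 0x0Fu) << 12) | ((p[1] & 0x3Fu) << 6) | (p[2] & 0x3Fu); break;
    case 4: cp = ((p[0] & 0x07u) << 18) | ((p[1] & 0x3Fu) << 12) |
                 ((p[2] & 0x3Fu) << 6) | (p[3] & 0x3Fu); break;
    default: return false;
    }

    cp++;
    if (cp >= 0xD800 && cp <= 0xDFFF)   /* skip UTF-16 surrogates, never valid */
        cp = 0xE000;
    if (cp > utf8_max_cp[len])          /* successor would need a longer sequence */
        return false;

    switch (len) {
    case 1:
        p[0] = (unsigned char) cp;
        break;
    case 2:
        p[0] = (unsigned char) (0xC0 | (cp >> 6));
        p[1] = (unsigned char) (0x80 | (cp & 0x3F));
        break;
    case 3:
        p[0] = (unsigned char) (0xE0 | (cp >> 12));
        p[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        p[2] = (unsigned char) (0x80 | (cp & 0x3F));
        break;
    case 4:
        p[0] = (unsigned char) (0xF0 | (cp >> 18));
        p[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        p[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        p[3] = (unsigned char) (0x80 | (cp & 0x3F));
        break;
    }
    return true;
}

/*
 * Hypothetical single-loop driver: try ONE increment of the last character;
 * if that fails (or the result is not greater), truncate that character and
 * move one position left.  buf starts as a copy of orig (len bytes of valid
 * UTF-8).  Returns the length of the greater prefix left in buf, or 0 if
 * none exists.  *tries counts increment attempts: at most one per position.
 */
static size_t greater_prefix(unsigned char *buf, const unsigned char *orig,
                             size_t len, int *tries)
{
    *tries = 0;
    while (len > 0) {
        /* back up over continuation bytes (10xxxxxx) to the lead byte */
        size_t start = len - 1;
        while (start > 0 && (buf[start] & 0xC0) == 0x80)
            start--;
        int charlen = (int) (len - start);
        assert(charlen >= 1 && charlen <= 4);   /* holds for valid UTF-8 */

        (*tries)++;
        /* memcmp stands in for a collation-aware comparison; under byte
           order it always succeeds after an increment, because UTF-8 byte
           order matches code point order */
        if (utf8_increment(buf + start, charlen) && memcmp(buf, orig, len) > 0)
            return len;
        len = start;                            /* truncate away last character */
    }
    return 0;
}
```

For example, with "ab" followed by U+10FFFF, the first attempt cannot increment the 4-byte character (no larger code point exists), so it is truncated and the 'b' becomes 'c' on the second attempt, leaving "ac": two tries for two character positions.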