Re: CopyReadLineText optimization revisited
From | Tom Lane |
---|---|
Subject | Re: CopyReadLineText optimization revisited |
Date | |
Msg-id | 26738.1282850166@sss.pgh.pa.us |
In response to | CopyReadLineText optimization revisited (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Responses | Re: CopyReadLineText optimization revisited |
List | pgsql-hackers |
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

> * we perform possible encoding conversion early, one input block at a
> time, rather than after splitting the input into lines. This allows us
> to assume in the later stages that the data is in server encoding,
> allowing us to search for the '\n' byte without worrying about
> multi-byte characters.

Seems reasonable, although the need to deal with multibyte characters
crossing a block boundary injects some ugliness that wasn't there before.

> * instead of the byte-at-a-time loop in CopyReadLineText(), use memchr()
> to find the next NL/CR character. This is where the speedup comes from.

That seems like the speedup, if any, would be extremely
platform-dependent. What have you tested on?

> There's a small fly in the ointment: the patch won't recognize backslash
> followed by a linefeed as an escaped linefeed. I think we should simply
> drop support for that.

I think this is likely to break apps that have worked for years. I can't
get excited about doing that in return for a "0-10%" speedup that might
only materialize on some platforms. If the numbers were better, it'd be
worth paying that price, but ...

> It's not strictly necessary, but how about dropping support for the old
> COPY protocol, and the EOF marker \. while we're at it? It would allow
> us to drop some code, making the remaining code simpler, and reduce the
> testing effort. Thoughts on that?

Again, I think the threshold requirement for breaking compatibility
needs to be a lot higher than this.

			regards, tom lane
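[Editor's note: a minimal sketch of the memchr() idea under discussion, not the actual patch. The function name `next_eol` is hypothetical; the sketch only illustrates why the search is valid once the data is known to be in the server encoding (no multi-byte character can contain a stray 0x0A/0x0D byte), and why memchr(), which scans for a single byte, must be called once per candidate byte.]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: find the offset of the next NL or CR in buf
 * using memchr() instead of a byte-at-a-time loop.  Returns len if
 * neither byte occurs.  Assumes buf is already in the server encoding,
 * so a raw byte scan cannot land inside a multi-byte character. */
static size_t
next_eol(const char *buf, size_t len)
{
    const char *nl = memchr(buf, '\n', len);
    const char *cr = memchr(buf, '\r', len);

    if (nl == NULL)
        return cr ? (size_t) (cr - buf) : len;
    if (cr == NULL || nl < cr)
        return (size_t) (nl - buf);
    return (size_t) (cr - buf);
}
```

Whether this beats the per-byte loop depends on the platform's memchr() implementation (many libc versions use vectorized scans, others a plain loop), which is the portability question raised above.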