Re: CopyReadLineText optimization
From | Andrew Dunstan |
---|---|
Subject | Re: CopyReadLineText optimization |
Date | |
Msg-id | 47D03BCE.9030909@dunslane.net |
In reply to | Re: CopyReadLineText optimization ("Heikki Linnakangas" <heikki@enterprisedb.com>) |
Responses | Re: CopyReadLineText optimization |
List | pgsql-patches |
Heikki Linnakangas wrote:
> Heikki Linnakangas wrote:
>> Heikki Linnakangas wrote:
>>> Attached is a patch that modifies CopyReadLineText so that it uses
>>> memchr to speed up the scan. The nice thing about memchr is that we
>>> can take advantage of any clever optimizations that might be in libc
>>> or compiler.
>>
>> Here's an updated version of the patch. The principle is the same,
>> but the same optimization is now used for CSV input as well, and
>> there's more comments.
>
> Another update attached: It occurred to me that the memchr approach is
> only safe for server encodings, where the non-first bytes of a
> multi-byte character always have the hi-bit set.
>

We currently make the following assumption in the code:

 * These four characters, and the CSV escape and quote characters, are
 * assumed the same in frontend and backend encodings.
 *

The four characters are the carriage return, line feed, backslash and dot.

I think the requirement might well actually be somewhat stronger than that: i.e. that none of these will appear as a non-first byte in any multi-byte client encoding character. If that's right, then we should be able to write CopyReadLineText without bothering about multi-byte chars. If it's not right then I suspect we have some cases that can fail now anyway. (I believe some client encodings at least use backslash in subsequent chars, and that's a nasty one because the "\." end sequence is hard coded).

I believe all the chars up to 0x2f are safe - that includes both quote chars and dot.

cheers

andrew
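
To make the backslash concern concrete, here is a minimal sketch in C. It is not the PostgreSQL code: the helper names (sjis_is_lead, find_backslash_naive, find_backslash_sjis) are hypothetical, and the lead-byte ranges are a simplified model of Shift_JIS chosen only for illustration. It shows how a plain memchr scan can stop on a 0x5c byte that is really the second byte of a two-byte character, while a scan that steps over whole characters does not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: simplified Shift_JIS lead-byte test.
 * Assumption for this sketch: a byte in 0x81..0x9f or 0xe0..0xfc
 * starts a two-byte character.
 */
static bool
sjis_is_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9f) || (c >= 0xe0 && c <= 0xfc);
}

/*
 * Naive scan: offset of the first 0x5c byte, ignoring character
 * boundaries. Returns -1 if there is none.
 */
static ptrdiff_t
find_backslash_naive(const unsigned char *buf, size_t len)
{
    const unsigned char *p = memchr(buf, '\\', len);

    return p ? (ptrdiff_t) (p - buf) : -1;
}

/*
 * Encoding-aware scan: step over both bytes of every two-byte
 * character, so a 0x5c trail byte is never mistaken for a backslash.
 * Returns -1 if there is no stand-alone backslash.
 */
static ptrdiff_t
find_backslash_sjis(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        if (sjis_is_lead(buf[i]) && i + 1 < len)
        {
            i += 2;             /* skip lead byte and trail byte */
            continue;
        }
        if (buf[i] == '\\')
            return (ptrdiff_t) i;
        i++;
    }
    return -1;
}
```

For the four input bytes 0x83 0x5c '\\' '.' (the Shift_JIS katakana "so" followed by a real "\." sequence), find_backslash_naive returns 1, which points into the middle of the katakana character, whereas find_backslash_sjis returns 2, the real backslash. Under this model the trail bytes start at 0x40, so bytes up to 0x2f never appear inside a two-byte character.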