Re: CopyReadLineText optimization
From | Heikki Linnakangas |
---|---|
Subject | Re: CopyReadLineText optimization |
Date | |
Msg-id | 47D03D5B.4010309@enterprisedb.com |
In reply to | Re: CopyReadLineText optimization (Andrew Dunstan <andrew@dunslane.net>) |
Responses |
Re: CopyReadLineText optimization
Re: CopyReadLineText optimization |
List | pgsql-patches |
Andrew Dunstan wrote:
> Heikki Linnakangas wrote:
>> Another update attached: It occurred to me that the memchr approach is
>> only safe for server encodings, where the non-first bytes of a
>> multi-byte character always have the hi-bit set.
>>
>
> We currently make the following assumption in the code:
>
>      * These four characters, and the CSV escape and quote characters, are
>      * assumed the same in frontend and backend encodings.
>      *
>
> The four characters are the carriage return, line feed, backslash and dot.
>
> I think the requirement might well actually be somewhat stronger than
> that: i.e. that none of these will appear as a non-first byte in any
> multi-byte client encoding character. If that's right, then we should be
> able to write CopyReadLineText without bothering about multi-byte chars.
> If it's not right then I suspect we have some cases that can fail now
> anyway.

No, we don't require that, and we do handle it correctly. We use
pg_encoding_mblen to determine the length of each character in
CopyReadLineText when the encoding is a client-only encoding, and only
look at the first byte of each character. In CopyReadAttributesText,
where we have a similar loop, we've already transformed the input to
server encoding.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
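The difference between a byte-wise memchr scan and the per-character scan described in the reply can be illustrated with a small sketch. This is not the PostgreSQL source: `toy_mblen` is a hypothetical stand-in for pg_encoding_mblen that hard-codes a Shift-JIS-like rule (lead bytes 0x81-0x9F and 0xE0-0xFC start a two-byte character), and `find_byte_naive` and `find_byte_mb` are names invented for illustration. In such an encoding the second byte of a character may be 0x5C, the same value as an ASCII backslash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for pg_encoding_mblen: returns the byte length of
 * the character starting at s, under an assumed Shift-JIS-like rule.
 */
int
toy_mblen(const unsigned char *s)
{
    unsigned char c = *s;

    if ((c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC))
        return 2;               /* lead byte of a two-byte character */
    return 1;                   /* single-byte character */
}

/*
 * Byte-wise search with memchr. Returns the offset of the first byte equal
 * to target, or -1. It cannot tell a real backslash from a trailing byte
 * that happens to have the same value.
 */
long
find_byte_naive(const unsigned char *buf, size_t len, unsigned char target)
{
    const unsigned char *p;

    assert(buf != NULL);
    p = (const unsigned char *) memchr(buf, target, len);
    return p ? (long) (p - buf) : -1;
}

/*
 * Character-wise search: compare only the first byte of each character,
 * then skip over the whole character. Returns the offset or -1.
 */
long
find_byte_mb(const unsigned char *buf, size_t len, unsigned char target)
{
    size_t      i = 0;

    assert(buf != NULL);
    while (i < len)
    {
        if (buf[i] == target)
            return (long) i;
        i += (size_t) toy_mblen(buf + i);
    }
    return -1;
}
```

For the input bytes `0x83 0x5C 'a' '\\' 'b'`, the naive scan stops at offset 1, inside the two-byte character, while the character-wise scan skips that character and reports the real backslash at offset 3. In a server encoding, where non-first bytes always have the high bit set, the two scans would agree, which is why memchr is only safe there.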