Undocumented feature costs a lot of performance in COPY IN
From | Tom Lane |
---|---|
Subject | Undocumented feature costs a lot of performance in COPY IN |
Date | |
Msg-id | 2841.1007495345@sss.pgh.pa.us |
Replies |
Re: Undocumented feature costs a lot of performance in COPY
Re: Undocumented feature costs a lot of performance in |
List | pgsql-hackers |
I have been fooling around profiling various ways of inserting wide (8000-byte, not all that wide) bytea fields, per Brent Verner's note of a few days ago. COPY IN should be, and is, the fastest way to do it. But I was rather startled to discover that 25% of the runtime of COPY IN went to an inefficient way of fetching single bytes from pqcomm.c (pq_getbytes(&ch, 1) instead of ch = pq_getbyte()), and 20% of what's left after fixing that is going into the strchr() call in CopyReadAttribute.

Now the point of that strchr() call is to detect whether the current character is the column delimiter. The COPY reference page clearly says:

    By default, a text copy uses a tab ("\t") character as a
    delimiter between fields. The field delimiter may be changed to
    any other single character with the keyword phrase USING
    DELIMITERS. Characters in data fields which happen to match the
    delimiter character will be backslash quoted. Note that the
    delimiter is always a single character. If multiple characters
    are specified in the delimiter string, only the first character
    is used.

and indeed, only the first character is used by COPY OUT. But COPY IN is presently coded so that if multiple characters are mentioned in USING DELIMITERS, any one of them will be taken as a field delimiter.

I would like to change the code to just "if (c == delim[0])", which should buy back most of that 20% and make the behavior match the documentation.

Question for the list: is this a bad change? Is anyone out there actually using this undocumented behavior?

regards, tom lane
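
For readers who want to see the difference between the two delimiter tests concretely, here is a minimal sketch. It is not the actual CopyReadAttribute code: the helper names is_delim_any and is_delim_first are hypothetical, and the first one only assumes that the current check amounts to a strchr() lookup in the delimiter string, as described above.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: approximates the behaviour described for COPY IN
 * today, where ANY character of the delimiter string ends a field.
 * strchr() also matches the terminating NUL, so c == '\0' is excluded.
 */
static bool
is_delim_any(char c, const char *delim)
{
    return c != '\0' && strchr(delim, c) != NULL;
}

/*
 * Hypothetical helper: the proposed behaviour, matching the documentation
 * and COPY OUT. Only the first character of the delimiter string counts,
 * so the test is a single comparison instead of a function call per byte.
 */
static bool
is_delim_first(char c, const char *delim)
{
    return c == delim[0];
}
```

With a delimiter string of "\t|", is_delim_any() treats both the tab and the '|' as field separators, while is_delim_first() accepts only the tab. Input that relies on the second character acting as a delimiter is exactly what the proposed change would stop accepting.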