Re: Bug in UTF8-Validation Code?
From | Andrew Dunstan |
---|---|
Subject | Re: Bug in UTF8-Validation Code? |
Date | |
Msg-id | 45FC4F85.7090804@dunslane.net |
In reply to | Re: Bug in UTF8-Validation Code? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Bug in UTF8-Validation Code? |
List | pgsql-hackers |
Tom Lane wrote:
> I wrote:
>> Actually, I have to take back that objection: on closer look, COPY
>> validates the data only once and does so before applying its own
>> backslash-escaping rules. So there is a risk in that path too.
>
>> It's still pretty annoying to be validating the data twice in the
>> common case where no backslash reduction occurred, but I'm not sure
>> I see any good way to avoid it.
>
> Further thought here: if we put encoding verification into textin()
> and related functions, could we *remove* it from COPY IN, in the common
> case where client and server encodings are the same? Currently, copy.c
> forces a trip through pg_client_to_server for multibyte encodings
> even when the encodings are the same, so as to perform validation.
> But I'm wondering whether we'd still need that. There's no risk of
> SQL injection in COPY data. Bogus input encoding could possibly
> make for confusion about where the field boundaries are, but bad
> data is bad data in any case.
>
> regards, tom lane

Here are some timing tests in 1m rows of random utf8 encoded 100 char
data. It doesn't look to me like the saving you're suggesting is worth
the trouble.

baseline:

    Time: 28228.325 ms
    Time: 25987.740 ms
    Time: 25950.707 ms
    Time: 25756.371 ms
    Time: 27589.719 ms
    Time: 25774.417 ms

after adding suggested extra test to textin():

    Time: 26722.376 ms
    Time: 28343.226 ms
    Time: 26529.364 ms
    Time: 28020.140 ms
    Time: 24836.853 ms
    Time: 24860.530 ms

Script is:

    \timing
    create table xyz (x text);
    copy xyz from '/tmp/utf8.data';
    truncate xyz;
    copy xyz from '/tmp/utf8.data';
    truncate xyz;
    copy xyz from '/tmp/utf8.data';
    truncate xyz;
    copy xyz from '/tmp/utf8.data';
    truncate xyz;
    copy xyz from '/tmp/utf8.data';
    truncate xyz;
    copy xyz from '/tmp/utf8.data';
    drop table xyz;

Test platform: FC6, Athlon64.

cheers

andrew
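
For anyone who wants to compare the two sets of timings quoted in this message, here is a minimal sketch that summarizes them. It is not part of the original test; the variable and function names (`baseline`, `with_textin_check`, `summarize`) are made up for illustration, and it only computes the mean and sample standard deviation of the six runs in each set.

```python
from statistics import mean, stdev

# Timings in milliseconds, copied from the message above.
baseline = [28228.325, 25987.740, 25950.707, 25756.371, 27589.719, 25774.417]
with_textin_check = [26722.376, 28343.226, 26529.364, 28020.140, 24836.853, 24860.530]


def summarize(label, samples):
    """Print a one-line summary and return (mean, sample standard deviation)."""
    m, s = mean(samples), stdev(samples)
    print(f"{label:>18}: mean {m:.1f} ms, stdev {s:.1f} ms, "
          f"min {min(samples):.1f}, max {max(samples):.1f}")
    return m, s


base_mean, base_sd = summarize("baseline", baseline)
new_mean, new_sd = summarize("with textin check", with_textin_check)

# Difference of the means, to compare against the run-to-run spread.
diff = new_mean - base_mean
print(f"difference of means: {diff:+.1f} ms")
```

With these numbers the two means differ by only a few milliseconds out of roughly 26.5 seconds, while individual runs within each set vary by more than a second, so six runs per set cannot distinguish the two cases.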