Re: Bug in UTF8-Validation Code?
From | Andrew Dunstan |
---|---|
Subject | Re: Bug in UTF8-Validation Code? |
Date | |
Msg-id | 45FCBA2E.7010303@dunslane.net |
In reply to | Re: Bug in UTF8-Validation Code? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Bug in UTF8-Validation Code? |
List | pgsql-hackers |
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>
>> Here are some timing tests in 1m rows of random utf8 encoded 100 char
>> data. It doesn't look to me like the saving you're suggesting is worth
>> the trouble.
>>
>
> Hmm ... not sure I believe your numbers. Using a test file of 1m lines
> of 100 random latin1 characters converted to utf8 (thus, about half and
> half 7-bit ASCII and 2-byte utf8 characters), I get this in SQL_ASCII
> encoding:
>
> regression=# \timing
> Timing is on.
> regression=# create temp table test(f1 text);
> CREATE TABLE
> Time: 5.047 ms
> regression=# copy test from '/home/tgl/zzz1m';
> COPY 1000000
> Time: 4337.089 ms
>
> and this in UTF8 encoding:
>
> utf8=# \timing
> Timing is on.
> utf8=# create temp table test(f1 text);
> CREATE TABLE
> Time: 5.108 ms
> utf8=# copy test from '/home/tgl/zzz1m';
> COPY 1000000
> Time: 7776.583 ms
>
> The numbers aren't super repeatable, but it sure looks to me like the
> encoding check adds at least 50% to the runtime in this example; so
> doing it twice seems unpleasant.
>
> (This is CVS HEAD, compiled without assert checking, on an x86_64
> Fedora Core 6 box.)

Here are some test results that are closer to yours. I used a temp table, had cassert off and fsync off, and tried several encodings. The additional load from the test isn't 50% (I think you have added the cost of going from ASCII to UTF8 to the cost of the test to get that 50%), but it is nevertheless appreciable. I agree that we should look at skipping the test when the client and server encodings are the same, so we can reduce the difference.
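cheers

andrew

Test | Run | SQL_ASCII (ms) | LATIN1 (ms) | UTF8 (ms) |
---|---|---|---|---|
Without test | 1 | 4659.38 | 4766.07 | 9134.53 |
Without test | 2 | 7999.64 | 4003.13 | 6231.41 |
Without test | 3 | 4178.46 | 6178.89 | 7266.39 |
Without test | 4 | 4201.7 | 3930.84 | 10154.38 |
Without test | 5 | 4092.44 | 4444.52 | 9438.24 |
Without test | 6 | 3977.34 | 4197.09 | 8866.56 |
Without test | Average | 4851.49 | 4586.76 | 8515.25 |
With test | 1 | 11993.86 | 12625.8 | 10109.89 |
With test | 2 | 4647.16 | 9192.53 | 11251.27 |
With test | 3 | 4211.02 | 9903.77 | 10097.37 |
With test | 4 | 9203.62 | 7045.06 | 10372.25 |
With test | 5 | 4121.39 | 4138.78 | 10386.92 |
With test | 6 | 3722.73 | 4552.09 | 7432.56 |
With test | Average | 6316.63 | 7909.67 | 9941.71 |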