Re: BUG #17501: COPY is failing with "ERROR: invalid byte sequence for encoding "UTF8": 0xe5"
From | Heikki Linnakangas |
---|---|
Subject | Re: BUG #17501: COPY is failing with "ERROR: invalid byte sequence for encoding "UTF8": 0xe5" |
Date | |
Msg-id | 093b07ad-280c-4741-4519-f8db72420ffb@iki.fi |
In reply to | BUG #17501: COPY is failing with "ERROR: invalid byte sequence for encoding "UTF8": 0xe5" (PG Bug reporting form <noreply@postgresql.org>) |
List | pgsql-bugs |
Sounds a lot like a bug in commit f82de5c46bdf8cd65812a7b04c9509c218e1545d. Thanks for the report, I'll investigate!

- Heikki

On 28/05/2022 23:57, Vitaly V. Voronov wrote:
> Hello,
> Corrected commands:
> # Imported without errors
> for i in $(seq 1 207); do echo "NURO光です。明日の宅内工事お立合いよろしくお願い致します。2回目の屋外工事につきましては具体的な工事日案内の準備が整い次第、こちらからご連絡いたします。※詳細はこちら【工事について】https://www.test.jp/1234/5678.html&id=12211" >> /tmp/test_pass.csv; done;
> # Imported with errors
> for i in $(seq 1 5722); do echo "NURO光です。明日の宅内工事お立合いよろしくお願い致します。2回目の屋外工事につきましては具体的な工事日案内の準備が整い次第、こちらからご連絡いたします。※詳細はこちら【工事について】https://www.test.jp/1234/5678.html&id=12211" >> /tmp/test_fail.csv; done;
>
> 28.05.2022, 23:53, "PG Bug reporting form" <noreply@postgresql.org>:
>
> The following bug has been logged on the website:
>
> Bug reference: 17501
> Logged by: Vitaly Voronov
> Email address: wizard_1024@tut.by
> PostgreSQL version: 14.3
> Operating system: CentOS Linux release 7.9.2009 (Core)
> Description:
>
> Hello,
>
> We've seen the following bug: the COPY command fails on a file with
> "ERROR: invalid byte sequence for encoding "UTF8": 0xe5".
> The same file with fewer lines is imported without any errors.
>
> How to reproduce the bug:
> # create a database with SQL_ASCII encoding, C collation and C ctype
> createdb --encoding=SQL_ASCII --lc-collate=C --lc-ctype=C --template=template0 test
>
> # connect to the database
> psql test
>
> # Create table
> CREATE TABLE test_data (
>     test_data text
> );
>
> # Import without error
> truncate table test_data;
> COPY test_data (test_data) FROM '/tmp/test_pass.csv' WITH DELIMITER AS ','
> CSV QUOTE AS '"';
>
> COPY 207
>
> # Import with error
> truncate table test_data;
> COPY test_data (test_data) FROM '/tmp/test_fail.csv' WITH DELIMITER AS ','
> CSV QUOTE AS '"';
>
> ERROR: invalid byte sequence for encoding "UTF8": 0xe5
> CONTEXT: COPY test_data, line 627
>
> # both files contain the same rows, but test_fail contains more rows
> # it seems that a file larger than 65K cannot be imported
> # if the database is created with UTF8 encoding instead of SQL_ASCII, both tests pass
>
> # How to generate files:
> # Imported without errors
> for i in $(seq 1 207); do echo "NURO光です。明日の宅内工事お立合いよろしくお願い致します。2回目の屋外工事につきましては具体的な工事日案内の準備が整い次第、こちらからご連絡いたします。※詳細はこちら【工事について】https://www.test.jp/1234/5678.html&id=12211" > /tmp/test_pass.csv; done;
>
> # Imported with errors
> for i in $(seq 1 5722); do echo "NURO光です。明日の宅内工事お立合いよろしくお願い致します。2回目の屋外工事につきましては具体的な工事日案内の準備が整い次第、こちらからご連絡いたします。※詳細はこちら【工事について】https://www.test.jp/1234/5678.html&id=12211" > /tmp/test_fail.csv; done;
>
> # Both files can be imported into PostgreSQL 11 without any problem.
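The reporter's remark that files larger than about 65K fail can be checked offline. What follows is a hypothetical Python sketch, not PostgreSQL code and not a diagnosis of this bug. It rebuilds the two test files byte-for-byte as the corrected shell loops would (the row text as given, plus the newline that `echo` appends), then contrasts two ways of verifying UTF-8 in fixed-size chunks. The 64 KiB chunk size is an assumption taken only from the "65K" remark, and the names `make_file`, `naive_chunked_verify` and `carry_over_verify` are made up for illustration.

```python
import codecs

# Row text from the reporter's shell loops; `echo` appends a trailing newline.
ROW = ("NURO光です。明日の宅内工事お立合いよろしくお願い致します。"
       "2回目の屋外工事につきましては具体的な工事日案内の準備が整い次第、"
       "こちらからご連絡いたします。※詳細はこちら【工事について】"
       "https://www.test.jp/1234/5678.html&id=12211\n")

# Assumption: a 64 KiB read buffer, chosen only because of the "65K" observation.
CHUNK = 65536


def make_file(lines: int) -> bytes:
    """Bytes produced by: for i in $(seq 1 N); do echo "..." >> file; done"""
    return ROW.encode("utf-8") * lines


def naive_chunked_verify(data: bytes, chunk: int = CHUNK) -> bool:
    """Hypothetical: validate each chunk on its own.

    A valid multi-byte character that straddles a chunk boundary is
    wrongly rejected, because each half is checked in isolation.
    """
    for start in range(0, len(data), chunk):
        try:
            data[start:start + chunk].decode("utf-8")
        except UnicodeDecodeError:
            return False
    return True


def carry_over_verify(data: bytes, chunk: int = CHUNK) -> bool:
    """Hypothetical: validate chunk by chunk, carrying an incomplete
    trailing byte sequence forward into the next chunk."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        for start in range(0, len(data), chunk):
            decoder.decode(data[start:start + chunk])
        decoder.decode(b"", final=True)  # still fails if input ends mid-character
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    for name, n in (("test_pass.csv", 207), ("test_fail.csv", 5722)):
        data = make_file(n)
        print(f"{name}: {len(data)} bytes, "
              f"naive={naive_chunked_verify(data)}, "
              f"carry-over={carry_over_verify(data)}")
```

With the row text as given, each row is 313 bytes, so the 207-row file is 64,791 bytes (it fits in one assumed 64 KiB chunk) and the 5,722-row file is 1,790,986 bytes. Checking each chunk independently rejects the larger file because at least one chunk boundary falls inside a 3-byte character, while carrying the incomplete bytes over accepts both files. This only shows why chunk boundaries matter when verifying multi-byte input; it does not reproduce where the server actually failed (the report points at line 627).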