Re: [NOVICE] Create PostgreSQL Database with ENCODING 'WIN1252' onCluster Initialized with locale "en_US.UTF-8"?

Поиск
Список
Период
Сортировка
От KARIN SUSANNE HILBERT
Тема Re: [NOVICE] Create PostgreSQL Database with ENCODING 'WIN1252' onCluster Initialized with locale "en_US.UTF-8"?
Дата
Msg-id 1155507477.8735658.1487848350319.JavaMail.zimbra@psu.edu
обсуждение исходный текст
Ответ на Re: [NOVICE] Create PostgreSQL Database with ENCODING 'WIN1252' on Cluster Initialized with locale "en_US.UTF-8"?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-novice
Tom,

The original DB2 database table had data in it with special characters, for example:  "Sørensen" or "Håkan"
When I tried to load the data into the original PostgreSQL database that I had created with the
default UTF-8 locale, I got errors like this:

ERROR:  invalid byte sequence for encoding "UTF8": 0xf8
CONTEXT:  COPY paymentrequestaudit, line 1383

ERROR:  invalid byte sequence for encoding "UTF8": 0xe5 0x6b 0x61
CONTEXT:  COPY paymentrequestaudit, line 1437

I was able to resolve those error by adding the option "encoding 'windows-1251'" to the COPY FROM statement.
But then when I checked the data, I was getting data that looked like this:  "Sшrensen" or "Hеkan"

After I recreated the database with the command you suggested below,
I was able to load the data with the original COPY FROM command with no errors,
and the data now displays correctly.

Thank you for your help.
Regards,
Karin

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "KARIN SUSANNE HILBERT" <ioh1@psu.edu>
Cc: pgsql-novice@postgresql.org
Sent: Wednesday, February 22, 2017 8:52:09 PM
Subject: Re: [NOVICE] Create PostgreSQL Database with ENCODING 'WIN1252' on Cluster Initialized with locale
"en_US.UTF-8"?

KARIN SUSANNE HILBERT <ioh1@psu.edu> writes:
> I have a Windows DB2 database that I need to migrate to PostgreSQL 9.4 on RedHat Linux.
> The database cluster was initialized with locale "en_US.UTF-8".

It's possible you need to rethink that choice of initial locale, but not
necessarily.

> For my migrated database, I need to create a PostgreSQL database with ENCODING 'WIN1252'.

Hm, do you really need the DB encoding to be that?  If so, why?  You could
run the database in UTF8 encoding and set the client_encoding to WIN1252
for clients that only speak that encoding.  This would result in
on-the-fly encoding conversion while talking to such clients, but I don't
believe the cost of that is intolerable.  An advantage is that you could
also cater to clients that want less Windows-centric encodings.

But the really core decision here, which you haven't given enough info to
know which way you need to jump on, is what locale behavior you want (eg,
what sorting order).  I do not think that there are any WIN1252-aware
locales on Red Hat (or probably any other flavor of Linux), which means
that if you want to run the DB in WIN1252 server_encoding, the only locale
choice that will work is "C".  Maybe that's fine for your purposes, in
which case you could do it like this:

CREATE DATABASE ... ENCODING 'WIN1252' LC_COLLATE 'C' LC_CTYPE 'C' TEMPLATE template0;

(You need the "TEMPLATE template0" incantation if you're selecting
encoding or collate or ctype different from the installation defaults.
This is fine for a one-off, but if you foresee doing it often enough to
get annoying, you could re-initdb and change the installation defaults.)

Alternatively, maybe you want the behaviors of en_US locale, such as
dictionary sort order.  In that case you should just do "CREATE DATABASE"
with no special options, and instead look to setting client_encoding
appropriately for the clients that want to speak WIN1252.

For more info try reading

https://www.postgresql.org/docs/current/static/charset.html

            regards, tom lane


В списке pgsql-novice по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [NOVICE] Create PostgreSQL Database with ENCODING 'WIN1252' on Cluster Initialized with locale "en_US.UTF-8"?
Следующее
От: neha khatri
Дата:
Сообщение: [NOVICE] Make INFO messages consistent in various index *validate() functions