Обсуждение: [NOVICE] Create PostgreSQL Database with ENCODING 'WIN1252' onCluster Initialized with locale "en_US.UTF-8"?

Поиск
Список
Период
Сортировка

[NOVICE] Create PostgreSQL Database with ENCODING 'WIN1252' onCluster Initialized with locale "en_US.UTF-8"?

От
KARIN SUSANNE HILBERT
Дата:
I have a Windows DB2 database that I need to migrate to PostgreSQL 9.4 on RedHat Linux.
The database cluster was initialized with locale "en_US.UTF-8".

For my migrated database, I need to create a PostgreSQL database with ENCODING 'WIN1252'.
Is it possible to do this on my current cluster?
My research seems to indicatate that WIN1252 is a subset of UTF8.
But I can't find anything to tell me what to put for the LC_COLLATE or LC_CTYPE options.

I tried the following;
CREATE DATABASE :dbName WITH OWNER :dbOwner ENCODING 'WIN1252' CONNECTION LIMIT -1;
and got this error:
ERROR: encoding "WIN1252" does not match locale "en_US.UTF-8"
DETAIL: The chosen LC_CTYPE setting requires encoding "UTF8".

I then tried:
CREATE DATABASE :dbName WITH OWNER :dbOwner ENCODING 'WIN1252' LC_COLLATE='en_US.1252' CONNECTION LIMIT -1;
but got this error:
ERROR: invalid locale name: "en_US.1252"

And tried:
CREATE DATABASE :dbName WITH OWNER :dbOwner ENCODING 'WIN1252' LC_COLLATE='en_US.WIN1252' CONNECTION LIMIT -1;
and got this:
ERROR: invalid locale name: "en_US.WIN1252"

Any help would be greatly appreciated.
Thanks,
Karin Hilbert
KARIN SUSANNE HILBERT <ioh1@psu.edu> writes:
> I have a Windows DB2 database that I need to migrate to PostgreSQL 9.4 on RedHat Linux.
> The database cluster was initialized with locale "en_US.UTF-8".

It's possible you need to rethink that choice of initial locale, but not
necessarily.

> For my migrated database, I need to create a PostgreSQL database with ENCODING 'WIN1252'.

Hm, do you really need the DB encoding to be that?  If so, why?  You could
run the database in UTF8 encoding and set the client_encoding to WIN1252
for clients that only speak that encoding.  This would result in
on-the-fly encoding conversion while talking to such clients, but I don't
believe the cost of that is intolerable.  An advantage is that you could
also cater to clients that want less Windows-centric encodings.

But the really core decision here, which you haven't given enough info to
know which way you need to jump on, is what locale behavior you want (eg,
what sorting order).  I do not think that there are any WIN1252-aware
locales on Red Hat (or probably any other flavor of Linux), which means
that if you want to run the DB in WIN1252 server_encoding, the only locale
choice that will work is "C".  Maybe that's fine for your purposes, in
which case you could do it like this:

CREATE DATABASE ... ENCODING 'WIN1252' LC_COLLATE 'C' LC_CTYPE 'C' TEMPLATE template0;

(You need the "TEMPLATE template0" incantation if you're selecting
encoding or collate or ctype different from the installation defaults.
This is fine for a one-off, but if you foresee doing it often enough to
get annoying, you could re-initdb and change the installation defaults.)

Alternatively, maybe you want the behaviors of en_US locale, such as
dictionary sort order.  In that case you should just do "CREATE DATABASE"
with no special options, and instead look to setting client_encoding
appropriately for the clients that want to speak WIN1252.

For more info try reading

https://www.postgresql.org/docs/current/static/charset.html

            regards, tom lane


Re: [NOVICE] Create PostgreSQL Database with ENCODING 'WIN1252' onCluster Initialized with locale "en_US.UTF-8"?

От
KARIN SUSANNE HILBERT
Дата:
Tom,

The original DB2 database table had data in it with special characters, for example:  "Sørensen" or "Håkan"
When I tried to load the data into the original PostgreSQL database that I had created with the
default UTF-8 locale, I got errors like this:

ERROR:  invalid byte sequence for encoding "UTF8": 0xf8
CONTEXT:  COPY paymentrequestaudit, line 1383

ERROR:  invalid byte sequence for encoding "UTF8": 0xe5 0x6b 0x61
CONTEXT:  COPY paymentrequestaudit, line 1437

I was able to resolve those error by adding the option "encoding 'windows-1251'" to the COPY FROM statement.
But then when I checked the data, I was getting data that looked like this:  "Sшrensen" or "Hеkan"

After I recreated the database with the command you suggested below,
I was able to load the data with the original COPY FROM command with no errors,
and the data now displays correctly.

Thank you for your help.
Regards,
Karin

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "KARIN SUSANNE HILBERT" <ioh1@psu.edu>
Cc: pgsql-novice@postgresql.org
Sent: Wednesday, February 22, 2017 8:52:09 PM
Subject: Re: [NOVICE] Create PostgreSQL Database with ENCODING 'WIN1252' on Cluster Initialized with locale
"en_US.UTF-8"?

KARIN SUSANNE HILBERT <ioh1@psu.edu> writes:
> I have a Windows DB2 database that I need to migrate to PostgreSQL 9.4 on RedHat Linux.
> The database cluster was initialized with locale "en_US.UTF-8".

It's possible you need to rethink that choice of initial locale, but not
necessarily.

> For my migrated database, I need to create a PostgreSQL database with ENCODING 'WIN1252'.

Hm, do you really need the DB encoding to be that?  If so, why?  You could
run the database in UTF8 encoding and set the client_encoding to WIN1252
for clients that only speak that encoding.  This would result in
on-the-fly encoding conversion while talking to such clients, but I don't
believe the cost of that is intolerable.  An advantage is that you could
also cater to clients that want less Windows-centric encodings.

But the really core decision here, which you haven't given enough info to
know which way you need to jump on, is what locale behavior you want (eg,
what sorting order).  I do not think that there are any WIN1252-aware
locales on Red Hat (or probably any other flavor of Linux), which means
that if you want to run the DB in WIN1252 server_encoding, the only locale
choice that will work is "C".  Maybe that's fine for your purposes, in
which case you could do it like this:

CREATE DATABASE ... ENCODING 'WIN1252' LC_COLLATE 'C' LC_CTYPE 'C' TEMPLATE template0;

(You need the "TEMPLATE template0" incantation if you're selecting
encoding or collate or ctype different from the installation defaults.
This is fine for a one-off, but if you foresee doing it often enough to
get annoying, you could re-initdb and change the installation defaults.)

Alternatively, maybe you want the behaviors of en_US locale, such as
dictionary sort order.  In that case you should just do "CREATE DATABASE"
with no special options, and instead look to setting client_encoding
appropriately for the clients that want to speak WIN1252.

For more info try reading

https://www.postgresql.org/docs/current/static/charset.html

            regards, tom lane