Обсуждение: [NOVICE] Create PostgreSQL Database with ENCODING 'WIN1252' onCluster Initialized with locale "en_US.UTF-8"?
[NOVICE] Create PostgreSQL Database with ENCODING 'WIN1252' onCluster Initialized with locale "en_US.UTF-8"?
От
KARIN SUSANNE HILBERT
Дата:
I have a Windows DB2 database that I need to migrate to PostgreSQL 9.4 on RedHat Linux.
The database cluster was initialized with locale "en_US.UTF-8".
For my migrated database, I need to create a PostgreSQL database with ENCODING 'WIN1252'.
Is it possible to do this on my current cluster?
My research seems to indicatate that WIN1252 is a subset of UTF8.
But I can't find anything to tell me what to put for the LC_COLLATE or LC_CTYPE options.
I tried the following;
CREATE DATABASE :dbName WITH OWNER :dbOwner ENCODING 'WIN1252' CONNECTION LIMIT -1;
and got this error:
ERROR: encoding "WIN1252" does not match locale "en_US.UTF-8"
DETAIL: The chosen LC_CTYPE setting requires encoding "UTF8".
I then tried:
CREATE DATABASE :dbName WITH OWNER :dbOwner ENCODING 'WIN1252' LC_COLLATE='en_US.1252' CONNECTION LIMIT -1;
but got this error:
ERROR: invalid locale name: "en_US.1252"
And tried:
CREATE DATABASE :dbName WITH OWNER :dbOwner ENCODING 'WIN1252' LC_COLLATE='en_US.WIN1252' CONNECTION LIMIT -1;
and got this:
ERROR: invalid locale name: "en_US.WIN1252"
Any help would be greatly appreciated.
Thanks,
Karin Hilbert
The database cluster was initialized with locale "en_US.UTF-8".
For my migrated database, I need to create a PostgreSQL database with ENCODING 'WIN1252'.
Is it possible to do this on my current cluster?
My research seems to indicatate that WIN1252 is a subset of UTF8.
But I can't find anything to tell me what to put for the LC_COLLATE or LC_CTYPE options.
I tried the following;
CREATE DATABASE :dbName WITH OWNER :dbOwner ENCODING 'WIN1252' CONNECTION LIMIT -1;
and got this error:
ERROR: encoding "WIN1252" does not match locale "en_US.UTF-8"
DETAIL: The chosen LC_CTYPE setting requires encoding "UTF8".
I then tried:
CREATE DATABASE :dbName WITH OWNER :dbOwner ENCODING 'WIN1252' LC_COLLATE='en_US.1252' CONNECTION LIMIT -1;
but got this error:
ERROR: invalid locale name: "en_US.1252"
And tried:
CREATE DATABASE :dbName WITH OWNER :dbOwner ENCODING 'WIN1252' LC_COLLATE='en_US.WIN1252' CONNECTION LIMIT -1;
and got this:
ERROR: invalid locale name: "en_US.WIN1252"
Any help would be greatly appreciated.
Thanks,
Karin Hilbert
KARIN SUSANNE HILBERT <ioh1@psu.edu> writes: > I have a Windows DB2 database that I need to migrate to PostgreSQL 9.4 on RedHat Linux. > The database cluster was initialized with locale "en_US.UTF-8". It's possible you need to rethink that choice of initial locale, but not necessarily. > For my migrated database, I need to create a PostgreSQL database with ENCODING 'WIN1252'. Hm, do you really need the DB encoding to be that? If so, why? You could run the database in UTF8 encoding and set the client_encoding to WIN1252 for clients that only speak that encoding. This would result in on-the-fly encoding conversion while talking to such clients, but I don't believe the cost of that is intolerable. An advantage is that you could also cater to clients that want less Windows-centric encodings. But the really core decision here, which you haven't given enough info to know which way you need to jump on, is what locale behavior you want (eg, what sorting order). I do not think that there are any WIN1252-aware locales on Red Hat (or probably any other flavor of Linux), which means that if you want to run the DB in WIN1252 server_encoding, the only locale choice that will work is "C". Maybe that's fine for your purposes, in which case you could do it like this: CREATE DATABASE ... ENCODING 'WIN1252' LC_COLLATE 'C' LC_CTYPE 'C' TEMPLATE template0; (You need the "TEMPLATE template0" incantation if you're selecting encoding or collate or ctype different from the installation defaults. This is fine for a one-off, but if you foresee doing it often enough to get annoying, you could re-initdb and change the installation defaults.) Alternatively, maybe you want the behaviors of en_US locale, such as dictionary sort order. In that case you should just do "CREATE DATABASE" with no special options, and instead look to setting client_encoding appropriately for the clients that want to speak WIN1252. For more info try reading https://www.postgresql.org/docs/current/static/charset.html regards, tom lane
Re: [NOVICE] Create PostgreSQL Database with ENCODING 'WIN1252' onCluster Initialized with locale "en_US.UTF-8"?
От
KARIN SUSANNE HILBERT
Дата:
Tom, The original DB2 database table had data in it with special characters, for example: "Sørensen" or "Håkan" When I tried to load the data into the original PostgreSQL database that I had created with the default UTF-8 locale, I got errors like this: ERROR: invalid byte sequence for encoding "UTF8": 0xf8 CONTEXT: COPY paymentrequestaudit, line 1383 ERROR: invalid byte sequence for encoding "UTF8": 0xe5 0x6b 0x61 CONTEXT: COPY paymentrequestaudit, line 1437 I was able to resolve those error by adding the option "encoding 'windows-1251'" to the COPY FROM statement. But then when I checked the data, I was getting data that looked like this: "Sшrensen" or "Hеkan" After I recreated the database with the command you suggested below, I was able to load the data with the original COPY FROM command with no errors, and the data now displays correctly. Thank you for your help. Regards, Karin ----- Original Message ----- From: "Tom Lane" <tgl@sss.pgh.pa.us> To: "KARIN SUSANNE HILBERT" <ioh1@psu.edu> Cc: pgsql-novice@postgresql.org Sent: Wednesday, February 22, 2017 8:52:09 PM Subject: Re: [NOVICE] Create PostgreSQL Database with ENCODING 'WIN1252' on Cluster Initialized with locale "en_US.UTF-8"? KARIN SUSANNE HILBERT <ioh1@psu.edu> writes: > I have a Windows DB2 database that I need to migrate to PostgreSQL 9.4 on RedHat Linux. > The database cluster was initialized with locale "en_US.UTF-8". It's possible you need to rethink that choice of initial locale, but not necessarily. > For my migrated database, I need to create a PostgreSQL database with ENCODING 'WIN1252'. Hm, do you really need the DB encoding to be that? If so, why? You could run the database in UTF8 encoding and set the client_encoding to WIN1252 for clients that only speak that encoding. This would result in on-the-fly encoding conversion while talking to such clients, but I don't believe the cost of that is intolerable. An advantage is that you could also cater to clients that want less Windows-centric encodings. But the really core decision here, which you haven't given enough info to know which way you need to jump on, is what locale behavior you want (eg, what sorting order). I do not think that there are any WIN1252-aware locales on Red Hat (or probably any other flavor of Linux), which means that if you want to run the DB in WIN1252 server_encoding, the only locale choice that will work is "C". Maybe that's fine for your purposes, in which case you could do it like this: CREATE DATABASE ... ENCODING 'WIN1252' LC_COLLATE 'C' LC_CTYPE 'C' TEMPLATE template0; (You need the "TEMPLATE template0" incantation if you're selecting encoding or collate or ctype different from the installation defaults. This is fine for a one-off, but if you foresee doing it often enough to get annoying, you could re-initdb and change the installation defaults.) Alternatively, maybe you want the behaviors of en_US locale, such as dictionary sort order. In that case you should just do "CREATE DATABASE" with no special options, and instead look to setting client_encoding appropriately for the clients that want to speak WIN1252. For more info try reading https://www.postgresql.org/docs/current/static/charset.html regards, tom lane