Thread: Issues with german 'Umlaute'

Issues with german 'Umlaute'

From
Nicolaus Erichsen
Date:
Hello everybody,

I recently found a problem with sorting German 'Umlaute'. I hope the encoding
of this mail works ;-)

Postgres puts Umlaute (i.e., ÄäÖöÜü) at the very end of the alphabet, and
this is not the way it should be. I didn't check the special character
'ß', but it's probably similar.

The canonical sort order for Umlaute is to treat them as two characters, like
this:
ä -> ae
ö -> oe
ü -> ue
ß -> ss
(and the same for upper case 'ÄÖÜ'; 'ß' does not have an upper case)
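
A minimal sketch of the two-character expansion described above, written in Python rather than SQL. The names `EXPANSIONS` and `expansion_key` are made up for illustration and are not part of PostgreSQL; the word list is sample data only.

```python
# Hypothetical helper: expand each umlaut to its two-letter form before comparing.
EXPANSIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}

def expansion_key(word):
    """Return a case-insensitive sort key with umlauts expanded (ä -> ae, ...)."""
    expanded = "".join(EXPANSIONS.get(ch, ch) for ch in word)
    return expanded.casefold()

words = ["Zug", "Äpfel", "Apfel", "Öl", "Ofen", "Straße", "Strand"]

# Plain code-point order pushes the umlauts to the very end.
code_point_order = sorted(words)

# Expanded keys place them next to their base letters instead.
expanded_order = sorted(words, key=expansion_key)

print(code_point_order)
print(expanded_order)
```

With this key, 'Äpfel' compares as 'aepfel' and therefore lands before 'Apfel', which is a direct consequence of the ae expansion.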

Well, I guess this might be difficult to implement and might have quite an
impact on performance. The solution I know from other databases consists of
inserting ä after a, ö after o, ü after u and ß after s. AFAIK this is
generally accepted.
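
The simpler ordering mentioned above (each umlaut as a separate letter directly after its base letter) could be sketched like this in Python. `BASE` and `after_base_key` are hypothetical names chosen for this example, and the sketch ignores case differences.

```python
# Hypothetical helper: sort ä right after a, ö after o, ü after u, ß after s.
BASE = {"ä": "a", "ö": "o", "ü": "u", "ß": "s"}

def after_base_key(word):
    """Map each character to (base letter, rank); rank 1 sorts after the plain letter."""
    key = []
    for ch in word.lower():
        if ch in BASE:
            key.append((BASE[ch], 1))
        else:
            key.append((ch, 0))
    return key

words = ["Bad", "Äther", "Azur", "Zahl", "Bär", "Baum"]
after_base_order = sorted(words, key=after_base_key)
print(after_base_order)
```

Here 'Äther' sorts after every word starting with a plain 'A' but before any word starting with 'B', which is what "ä after a" means when ä is treated as a letter of its own.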

upper() does not handle Umlaute correctly either. It leaves äöü unchanged
instead of converting them to upper case.
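
As an analogy only (this is Python, not the PostgreSQL upper() function), the same symptom appears whenever case conversion knows nothing beyond ASCII. In the sketch below, `bytes.upper()` touches only ASCII letters, so Latin-1 encoded umlauts pass through unchanged, while the Unicode-aware `str.upper()` converts them.

```python
text = "äöü"

# Latin-1 bytes for ä, ö, ü are 0xE4, 0xF6, 0xFC.
latin1_bytes = text.encode("latin-1")

# bytes.upper() only converts ASCII letters, so the umlauts stay as they are.
ascii_only_upper = latin1_bytes.upper()

# str.upper() applies Unicode case mapping and converts them.
unicode_upper = text.upper()

print(ascii_only_upper == latin1_bytes)
print(unicode_upper)
```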

All this happens with a database created with encoding='latin1'. If there
are better results with a different encoding (I haven't tried it yet), I'd
suggest adding some information about this to the documentation.

Thanks for your work,

N.Erichsen

--
HSH Soft-und Hardware Vertriebs GmbH
Rudolf-Diesel-Straße 2 - 16321 Lindenberg
Tel. (030) 94004 - 509  Fax (030) 94004 - 400

Re: Issues with german 'Umlaute'

From
Tom Lane
Date:
Nicolaus Erichsen <nico.erichsen@hsh-berlin.com> writes:
> I recently found a problem with sorting german 'Umlaute' .

Sounds like you did not set the right locale when creating the database.
You need to be careful to run initdb with LANG (or LC_ALL or at least
LC_COLLATE) set to what you want, probably "de_DE".

> All this happens with a database  created with encoding ='latin1'.

Encoding is not the issue, locale is.

            regards, tom lane
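
To illustrate the point that collation comes from the locale rather than the encoding, here is a rough Python sketch that uses the C library's collation through the standard `locale` module. It is not how PostgreSQL is configured (that happens at initdb time, as described above). The helper name `sort_with_collation` and the locale name "de_DE.UTF-8" are assumptions; whether that locale exists depends on the operating system, so the helper returns None when it is unavailable, and the resulting order is whatever the system's locale data defines.

```python
import locale

def sort_with_collation(words, locale_name):
    """Sort words by the collation rules of locale_name.

    Returns None if the locale is not installed on this system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm turns each string into a key that compares per the locale.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)

words = ["Zebra", "Ärger", "Apfel"]

# Without locale-aware keys, comparison is by code point: Ä (U+00C4) follows Z.
code_point_order = sorted(words)

# Assumed locale name; may be spelled differently or be missing on your system.
german_order = sort_with_collation(words, "de_DE.UTF-8")

print(code_point_order)
print(german_order)
```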

Re: Issues with german 'Umlaute'

From
"Iavor Raytchev"
Date:
Tom Lane wrote:

>
> Nicolaus Erichsen <nico.erichsen@hsh-berlin.com> writes:
> > I recently found a problem with sorting german 'Umlaute' .
>
> Sounds like you did not set the right locale when creating
> the database.
> You need to be careful to run initdb with LANG (or LC_ALL or at least
> LC_COLLATE) set to what you want, probably "de_DE".
>
> > All this happens with a database  created with encoding ='latin1'.
>
> Encoding is not the issue, locale is.

Then what about having German, English, Italian and French words in the
same database? Should we create four databases and put each language in
a separate one?

Iavor

--
Iavor Raytchev
very small technologies (a company of CEE Solutions)

www.verysmall.org