Thread: Choosing character set for database

Choosing character set for database

From
Grzegorz Szpetkowski
Date:
Is there any clear performance difference between using a multi-byte
character set (such as UTF-8) and a single-byte one (e.g. SQL_ASCII,
LATIN2)? Why is there no UTF-32 (generally more space per character,
but faster to process than a multibyte encoding)? The only statement I
found is in the Oracle 11g documentation:

"For best performance, choose a character set that avoids character
set conversion and uses the most efficient encoding for the languages
desired. Single-byte character sets result in better performance than
multibyte character sets, and they also are the most efficient in
terms of space requirements. However, single-byte character sets limit
how many languages you can support."

Regards,
Grzegorz Sz.

Re: Choosing character set for database

From
Susanne Ebrecht
Date:
On 20.04.2011 22:29, Grzegorz Szpetkowski wrote:
> Is there any clear performance difference between using a multi-byte
> character set (such as UTF-8) and a single-byte one (e.g. SQL_ASCII,
> LATIN2)? Why is there no UTF-32 (generally more space per character,
> but faster to process than a multibyte encoding)? The only statement I
> found is in the Oracle 11g documentation:

Hello Grzegorz,

PostgreSQL did not implement its own character sets.
We just use what libc provides, which means whatever you find on your OS.

As far as I know, no operating system uses UTF-32.

Did you ever notice a performance difference on your OS when you used ISO
instead of UTF8?

Regards,

Susanne

--
Susanne Ebrecht - 2ndQuadrant
PostgreSQL Development, 24x7 Support, Training and Services
www.2ndQuadrant.com


Re: Choosing character set for database

From
Lew
Date:
Susanne Ebrecht wrote:
> Grzegorz Szpetkowski wrote:
>> Is there any clear performance difference between using a multi-byte
>> character set (such as UTF-8) and a single-byte one (e.g. SQL_ASCII,
>> LATIN2)? Why is there no UTF-32 (generally more space per character,
>> but faster to process than a multibyte encoding)? ...

> PostgreSQL did not implement its own character sets.
> We just use what libc provides, which means whatever you find on your OS.
>
> As far as I know, no operating system uses UTF-32.
>
> Did you ever notice a performance difference on your OS when you used ISO
> instead of UTF8?

For locales where ISO-8859-x makes sense, UTF-8 is itself a mostly single-byte
encoding: ASCII characters take one byte each, and only the accented letters
take two.
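
To make the size argument concrete, here is a minimal Python sketch that counts the bytes one string occupies in LATIN2 (ISO-8859-2), UTF-8 and UTF-32. The sample sentence and the function name `encoded_sizes` are assumptions chosen for illustration, and the sentence is deliberately heavy on accented letters, so it is close to a worst case for UTF-8 among Latin-2 texts. It measures Python's codecs, not PostgreSQL's storage.

```python
# Illustrative sample only: a Polish pangram fragment, not taken from the thread.
SAMPLE = "Zażółć gęślą jaźń"


def encoded_sizes(text):
    """Return the number of bytes `text` occupies in each encoding."""
    return {
        "latin2": len(text.encode("iso8859_2")),
        "utf8": len(text.encode("utf-8")),
        # The "-le" variant avoids the 4-byte byte-order mark.
        "utf32": len(text.encode("utf-32-le")),
    }


if __name__ == "__main__":
    for name, size in encoded_sizes(SAMPLE).items():
        print(f"{name}: {size} bytes")
```

Even with more than half of its characters accented, the sample needs well under twice the LATIN2 size in UTF-8, while UTF-32 always needs four bytes per character.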

Encoding occurs on the I/O path, so one would expect I/O effects to swamp
differences due to encoding.

Do you have any evidence whatsoever that encoding matters to performance, as
Susanne's question directs, or are you just letting your imagination run away
with you?

To answer your question literally, of course, yes, there will be a performance
difference between different character encodings.  Whether that difference is
positive or negative, noticeable or not, is another matter that no one else
can answer for you.  It depends on factors local to your particular
environment, including architecture, load, bandwidth, etc.

My guess, and guesses without evidence are as reliable as the newspaper
astrology column for this purpose, is that the differences depend on what your
OS natively supports - that is, if your OS is set up for UTF-8, then a LATIN-1
encoding might be slower, and vice versa - and that you will barely if at all
be able to detect them.
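
One way to replace the guess with evidence is a rough measurement. The sketch below times encode/decode round trips in Python with `timeit`; the helper name `time_round_trip`, the sample text and the repeat counts are all assumptions for illustration. It exercises Python's codecs only, so it says nothing definitive about PostgreSQL itself, where a real test would load representative data and time real queries.

```python
import timeit


def time_round_trip(text, encoding, repeat=1000):
    """Return the seconds taken by `repeat` encode/decode round trips."""
    def round_trip():
        return text.encode(encoding).decode(encoding)

    return timeit.timeit(round_trip, number=repeat)


if __name__ == "__main__":
    # Hypothetical workload: a short Polish phrase repeated many times.
    sample = "Zażółć gęślą jaźń " * 1000
    for enc in ("iso8859_2", "utf-8", "utf-32-le"):
        print(f"{enc}: {time_round_trip(sample, enc):.4f} s")
```

The absolute numbers will vary with machine and load, which is the point made above: only a measurement in your own environment can settle the question.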

Knuth warned us, "Premature optimization is the root of all evil."  Why don't
you just focus on a clean, maintainable data structure and good application
design?

--
Lew
Honi soit qui mal y pense.
http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg