Re: [BUGS] BUG #14885: mistake in sorting win1251 chars

From	Francisco Olarte
Subject	Re: [BUGS] BUG #14885: mistake in sorting win1251 chars
Date	3 November 2017, 10:22:25
Msg-id	CA+bJJbxbUwHy4x4MuMk=7a6VzUuv5n18sVkBcex6=i91Fo0aGg@mail.gmail.com
In reply to	Re: [BUGS] BUG #14885: mistake in sorting win1251 chars (Kalin Daskalov <k.daskalov.911@gmail.com>)
Responses	Re: [BUGS] BUG #14885: mistake in sorting win1251 chars
List	pgsql-bugs

Kalin:

On Thu, Nov 2, 2017 at 6:27 PM, Kalin Daskalov <k.daskalov.911@gmail.com> wrote:
> I understand you well and this exactly is the situation.
...
> I have to admit that this is not PostgreSQL problem.

Ok then.

> In fact my previous compares are based on ASCII comparison - based on the
> order of the chars.

I doubt it was ASCII. ASCII is a 7-bit code. You were probably using
an 8-bit code partially based on ASCII (like ISO-8859-1, typically
used in Spain, or its superset win-1252). What you were doing was
probably a lexicographic compare using the unsigned 8-bit values. This
is good enough to keep a table for a bsearch or build a btree, but it is
not what modern collations do (among other things, they collate upper
and lower case together, like paper dictionaries normally do).
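
To make the difference concrete, here is a minimal Python sketch. It assumes
the data is encoded as windows-1251 (Python codec name "cp1251"); the word
list and the helper names byte_order_key and toy_dictionary_key are made up
for illustration. The "toy dictionary" key is only a rough stand-in that
groups upper and lower case together; it is not a real locale collation.

```python
# Hypothetical sample data: a few Latin and Cyrillic words.
words = ["banana", "Apple", "apple", "Banana", "яна", "Яна", "ана"]

def byte_order_key(s):
    # Python compares bytes objects by unsigned byte value, so this is the
    # lexicographic 8-bit compare described above.
    return s.encode("cp1251")

def toy_dictionary_key(s):
    # Rough stand-in for a dictionary-like order: compare case-insensitively
    # first, then use the original spelling as a tie-breaker.
    return (s.casefold(), s)

by_bytes = sorted(words, key=byte_order_key)
by_toy = sorted(words, key=toy_dictionary_key)

print(by_bytes)
print(by_toy)
```

With this data the byte-order sort puts every capitalised word ahead of the
lower-case ones within each script, while the toy key keeps "Apple" next to
"apple" and "Яна" next to "яна".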

> Now I test with ANSI comparison realized with MS Windows system functions
> and the result the same as in PostgreSQL.

Also remember that what you refer to as Windows is probably Win NT, which
has been internally Unicode since the beginning. Besides, it's been 15
years since I used it, but even then the Windows API had lots of ways
to do things.

> But this is not appropriate. In fact if Cyrillic alphabet these are
> different letters and in Bulgarian language no one does expect this
> behavior. It's almost like to decide that Latin letters "i" and "y" should
> have such behavior.

I'm not a Bulgarian speaker, but you should raise it with them. And the
i/y letter behaviour depends on the language: "i" is a vowel, but in
Spanish "y" can be one or not, depending on the word. It sorts between x
and z, and it has always been that way. Not knowing Bulgarian, I do
not know if the two letters you used are different, like n and ñ in
Spanish, or not, like a and à. If you believe you are right, you could
try to document it further and try to get the collation changed, but I
would consult some references first.

Also, what is your locale? Remember that collation order depends on it.
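
As a rough illustration of that dependence, the sketch below sorts the same
list under different LC_COLLATE settings using Python's standard locale
module. The helper name sort_with_locale is made up here, and locale names
other than "C" (for example "en_US.UTF-8") are assumptions: they may not be
installed on a given system, in which case the helper returns None.

```python
import locale

def sort_with_locale(words, locale_name):
    """Sort words with the collation of locale_name; None if it is unavailable."""
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None  # locale not installed on this system
    try:
        return sorted(words, key=locale.strxfrm)
    finally:
        # setlocale is process-wide, so restore the previous collation.
        locale.setlocale(locale.LC_COLLATE, previous)

sample = ["b", "A", "a", "B"]
c_order = sort_with_locale(sample, "C")               # plain byte/code order
other_order = sort_with_locale(sample, "en_US.UTF-8")  # may be None

print(c_order)
print(other_order)
```

The "C" locale compares by character code, so upper case sorts before lower
case there; a language-specific locale, where available, will usually order
the same list differently.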

Francisco Olarte
