Re: insensitive collations

From	Andreas Karlsson
Subject	Re: insensitive collations
Date	14 January 2019, 14:37:20
Msg-id	da340c4b-24d0-8ff1-54d2-5b023eb81e3c@proxel.se
In reply to	Re: insensitive collations (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses	Re: insensitive collations Re: insensitive collations
List	pgsql-hackers

On 1/10/19 8:44 AM, Peter Eisentraut wrote:
> On 09/01/2019 19:49, Andreas Karlsson wrote:
>> Maybe this is orthogonal and best handled elsewhere, but have you given
>> any thought to unicode normalization forms[1] when working with string
>> equality?
> 
> Nondeterministic collations do address this by allowing canonically
> equivalent code point sequences to compare as equal.  You still need a
> collation implementation that actually does compare them as equal; ICU
> does this, glibc does not AFAICT.

Ah, right! You could use -ks-identic[1] for this.
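
To make the canonical-equivalence point concrete, here is a minimal sketch in Python rather than in PostgreSQL or ICU. It only illustrates that two canonically equivalent code point sequences differ bytewise but match once both are normalized; it does not show how ICU implements the comparison. The helper name canonically_equal is made up for this example.

```python
import unicodedata

def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT, two code points

# A plain byte comparison sees two different strings ...
bytewise_equal = precomposed.encode("utf-8") == decomposed.encode("utf-8")

# ... while a comparison that respects canonical equivalence does not.
equivalent = canonically_equal(precomposed, decomposed)
```

Here bytewise_equal is False and equivalent is True, which is the behaviour one would want from a collation that treats canonically equivalent sequences as equal.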

>> Would there be any point in adding unicode normalization support into
>> the collation system or is this best handled for example with a function
>> run on INSERT or with something else entirely?
> 
> I think there might be value in a feature that normalizes strings as
> they enter the database, as a component of the encoding conversion
> infrastructure.  But that would be a separate feature.

Agreed. And if we ever implement this, we could in theory optimize
equality comparisons under -ks-identic to a plain strcmp() rather than
having to collate anything.
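
A rough sketch of that idea, again in Python and not PostgreSQL code: if every string is normalized when it enters the store, equality can fall back to a plain byte comparison, which is what strcmp() would give us. The class NormalizingStore and the choice of NFC are assumptions made for the illustration only.

```python
import unicodedata

class NormalizingStore:
    """Toy stand-in for a table that normalizes text to NFC on entry."""

    def __init__(self) -> None:
        self._rows: list[bytes] = []

    @staticmethod
    def _on_entry(text: str) -> bytes:
        # Normalize once, at the boundary, then keep only the encoded bytes.
        return unicodedata.normalize("NFC", text).encode("utf-8")

    def insert(self, text: str) -> None:
        self._rows.append(self._on_entry(text))

    def contains(self, text: str) -> bool:
        # The probe value also passes through the entry point, so the
        # per-row check is a plain byte comparison with no collation call.
        key = self._on_entry(text)
        return any(row == key for row in self._rows)

store = NormalizingStore()
store.insert("caf\u00e9")              # precomposed form
found = store.contains("cafe\u0301")   # decomposed form of the same text
```

The cost of normalization is paid once per value at entry instead of on every comparison.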

I think it could also be useful to just add functions that normalize
strings; this was part of a proposal to the SQL standard that was not
accepted.[2]

Notes

1. http://www.unicode.org/reports/tr35/tr35-collation.html#Setting_Options
2. https://dev.mysql.com/worklog/task/?id=2048

Andreas
