Re: insensitive collations
From | Andreas Karlsson |
---|---|
Subject | Re: insensitive collations |
Date | |
Msg-id | da340c4b-24d0-8ff1-54d2-5b023eb81e3c@proxel.se |
In reply to | Re: insensitive collations (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>) |
Responses | Re: insensitive collations; Re: insensitive collations |
List | pgsql-hackers |

On 1/10/19 8:44 AM, Peter Eisentraut wrote:
> On 09/01/2019 19:49, Andreas Karlsson wrote:
>> Maybe this is orthogonal and best handled elsewhere but have you when
>> working with string equality given unicode normalization forms[1] any
>> thought?
>
> Nondeterministic collations do address this by allowing canonically
> equivalent code point sequences to compare as equal. You still need a
> collation implementation that actually does compare them as equal; ICU
> does this, glibc does not AFAICT.

Ah, right! You could use -ks-identic[1] for this.

>> Would there be any point in adding unicode normalization support into
>> the collation system or is this best handle for example with a function
>> run on INSERT or with something else entirely?
>
> I think there might be value in a feature that normalizes strings as
> they enter the database, as a component of the encoding conversion
> infrastructure. But that would be a separate feature.

Agreed. And if we ever implement this we could theoretically optimize the equality of -ks-identic to do a strcmp() rather than having to collate anything.

I think it could also be useful to just add functions which can normalize strings, which was in a proposal to the SQL standard which was not accepted.[2]

Notes

1. http://www.unicode.org/reports/tr35/tr35-collation.html#Setting_Options
2. https://dev.mysql.com/worklog/task/?id=2048

Andreas
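
As an illustration of the normalization point above, here is a minimal Python sketch, not PostgreSQL code. It uses the standard library's `unicodedata.normalize`; the helper name `normalized_equal` is made up for this example. It shows two canonically equivalent spellings of "é" that differ byte for byte, which is why a plain strcmp()-style comparison only works once both strings have been brought to the same normalization form.

```python
import unicodedata


def normalized_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after normalizing both
    to the same Unicode normalization form (NFC by default)."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Two canonically equivalent code point sequences for "é".
composed = "\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

# A code-point (or byte-wise) comparison treats them as different ...
raw_equal = composed == decomposed
bytes_equal = composed.encode("utf-8") == decomposed.encode("utf-8")

# ... but once both are normalized, a plain comparison is sufficient.
equal_after_nfc = normalized_equal(composed, decomposed)

print(raw_equal, bytes_equal, equal_after_nfc)
```

If strings were normalized on the way into the database, as discussed in the thread, the byte-wise comparison would already give the answer for canonically equivalent input, which is the optimization suggested for -ks-identic.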