Re: Collation rules and multi-lingual databases
From | Greg Stark |
---|---|
Subject | Re: Collation rules and multi-lingual databases |
Date | |
Msg-id | 87ekzditpk.fsf@stark.dyndns.tv |
In reply to | Re: Collation rules and multi-lingual databases (Greg Stark <gsstark@mit.edu>) |
Responses | Re: Collation rules and multi-lingual databases |
List | pgsql-general |
Greg Stark <gsstark@MIT.EDU> writes:

> Dennis Gearon <gearond@fireserve.net> writes:
>
> > I think it would be nice, and I may write it eventually, to have a function
> > called:
> >
> > COLLATION_VALUE( 'string', 'encoding' )
>
> Indeed that would be really nice. I wish I had that and a pony.
>
> Unfortunately my understanding is that the collation rules are simply too
> complex to allow such a function in general. It's too bad because it would
> indeed eliminate a lot of the problems in a single swoop.

Uh, so apparently I'm on crack and this is *precisely* how the l10n collation rules work. Sorry for jumping in with an uninformed opinion.

> Effectively, the way these functions work is by applying a mapping to
> transform the characters in a string to a byte sequence that represents
> the string's position in the collating sequence of the current locale.
> Comparing two such byte sequences in a simple fashion is equivalent to
> comparing the strings with the locale's collating sequence.
>
> The functions `strcoll' and `wcscoll' perform this translation
> implicitly, in order to do one comparison. By contrast, `strxfrm' and
> `wcsxfrm' perform the mapping explicitly. If you are making multiple
> comparisons using the same string or set of strings, it is likely to be
> more efficient to use `strxfrm' or `wcsxfrm' to transform all the
> strings just once, and subsequently compare the transformed strings
> with `strcmp' or `wcscmp'.

Given this it should be easy to write a collation_value(string,locale) C function that switches the collation order, calls strxfrm and then restores the collation order. I fear memory leaks or performance losses on frequent locale switches like this but it should be easy enough to try out. I don't see any problems with postgres as long as it's possible to ensure the locale is always switched back properly. It might not be thread-safe though.
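For illustration, here is a rough sketch of what such a function might look like in plain C. It is not a PostgreSQL extension and has not been tried against a server; the name collation_value and its signature are only assumed from the description above, and the locale names that are accepted depend on what is installed on the system. It saves the current LC_COLLATE setting, switches to the requested locale, calls strxfrm, and switches back. Because setlocale changes process-wide state, the sketch is not thread-safe.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (name and signature are assumptions, not an existing
 * API): returns a malloc'd sort key for `str` under the collation rules of
 * `locale`, or NULL on failure. The caller must free() the result.
 * Not thread-safe: setlocale() modifies process-wide state.
 */
char *collation_value(const char *str, const char *locale)
{
    assert(str != NULL && locale != NULL);

    /* Query the current collation locale without changing it. */
    const char *current = setlocale(LC_COLLATE, NULL);
    if (current == NULL)
        return NULL;

    /* Copy the name: later setlocale() calls may overwrite the buffer. */
    char *saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return NULL;
    strcpy(saved, current);

    char *key = NULL;
    if (setlocale(LC_COLLATE, locale) != NULL) {
        /* With a size of 0, strxfrm only reports the key length needed. */
        size_t needed = strxfrm(NULL, str, 0);
        key = malloc(needed + 1);
        if (key != NULL)
            strxfrm(key, str, needed + 1);
    }

    /* Always restore the original collation locale before returning. */
    setlocale(LC_COLLATE, saved);
    free(saved);
    return key;
}
```

Two keys produced for the same locale can then be compared with strcmp, which should order them the same way strcoll would order the original strings under that locale. Keys produced under different locales are not comparable with each other.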
At worst I could always call strxfrm in the application for each locale I care about when inserting the data. That would bloat my tables for nothing though.

So it's looking like I might get my pony after all.

--
greg