Re: An idea on faster CHAR field indexing
От | Giles Lean |
---|---|
Тема | Re: An idea on faster CHAR field indexing |
Дата | |
Msg-id | 11897.961636374@nemeton.com.au обсуждение исходный текст |
Ответ на | Re: An idea on faster CHAR field indexing ("Randall Parker" <randall@nls.net>) |
Ответы |
Re: An idea on faster CHAR field indexing
|
Список | pgsql-hackers |
> I'm curious as to why the need for multiple passes. Is that true > even in Latin 1 code pages? Yes. Some locales want strings to be ordered first by ignoring any accents on chracters, then using a tie-break on equal strings by doing a comparison that includes the accents. To take another of your points out of order: this is an obstacle that Unicode doesn't resolve. Unicode gives you a character set capable of representing characters from many different locales, but collation order will remain locale specific. > If not, this optimization could at least > be used for code pages that don't require multiple passes. ... but due to the increased memory/disk space, this is likely not an optimisation. Measurements needed, I'd suggest. My only experience of this was tuning a sort utility, where the extra time to convert the strings with strxfrm() and the large additional memory requirement killed any advantage strcmp() had over strcoll(). Whether this would be the case for database indexes in general or ideed ever I don't know. > As for memory usage: I don't see the issue here. The translation to > some collation sequence has to be done anyhow. No; you can do the comparisons in multiple passes instead without extra storage allocation. Using multiple passes will be efficient if the comparisons mostly don't need the second pass, which I suspect is typical. > Writing one's own routine to do look-ups into a collation sequence > table is a fairly trivial exercise. True. But if you can't do character-by-character comparisons then such a simplistic implementation will fail. I hadn't mentioned this time around (but see the archives for the recent discussion of LIKE) that there are locales with 2:1 and 1:2 mappings of characters too. Regards, Giles
В списке pgsql-hackers по дате отправления: