Re: Why do indexes and sorts use the database collation?
От | Jeff Davis |
---|---|
Тема | Re: Why do indexes and sorts use the database collation? |
Дата | |
Msg-id | 13fb2032cbfda29bb4c6c55ccc41a3d01a8de717.camel@j-davis.com обсуждение исходный текст |
Ответ на | Re: Why do indexes and sorts use the database collation? (Andres Freund <andres@anarazel.de>) |
Список | pgsql-hackers |
On Mon, 2023-11-13 at 14:12 -0800, Andres Freund wrote: > Why on earth are we solving this by having multiple pg_collation > entries for > exactly the same collation, instead of normalizing the collation-name > during > lookup by adding the relevant encoding name if not explicitly > specified? It > makes a lot of sense to not force the user to specify the encoding > when it > can't differ. I'm not aware of it being a common practical problem, so perhaps lack of motivation. But you're right that it doesn't look very efficient. We can even go deeper into ICU if we wanted to: lots of locales are actually aliases to a much smaller number of actual collators. And a lot are just aliases to the root locale. It's not trivial to reliably tell if two collators are identical, but in principle it should be possible: each collation is just a set of tailorings on top of the root locale, so I suppose if those are equal it's the same collator, right? > It's imo similarly absurd that an index with "default" collation > cannot be > used when specifying the equivalent collation explicitly in the query > and vice > versa. The catalog representation is not ideal to treat the database collation consistently with other collations. It would be nice to fix that. > > > > > Jeff was saying that textual primary keys typically don't need > sorting and > because of that we could default to "C", for performance. Part of my > response > was that I think the user's intent could be expressed by specifying > the column > collation as "C" - to which Jeff replied that that would change the > semantics. Which, to me, seems to completely run counter to his > argument that > we could just use "C" for such indexes. I am saying we shouldn't prematurely optimize for the case of ORDER BY on a text PK case by making a an index with a non-"C" collation, given the costs and risks of non-"C" indexes. Particularly because, even if there is an ORDER BY, there are several common reasons such an index would not help anyway. > > > > Regards, Jeff Davis
В списке pgsql-hackers по дате отправления: