Re: [SQL] SELECT DISTINCT and ORDER BY
От | Tom Lane |
---|---|
Тема | Re: [SQL] SELECT DISTINCT and ORDER BY |
Дата | |
Msg-id | 19160.937405597@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | SELECT DISTINCT and ORDER BY (Joerg Eriskat <Eriskat@icf.med.uni-muenchen.de>) |
Список | pgsql-sql |
Joerg Eriskat <Eriskat@icf.med.uni-muenchen.de> writes: > I'm not sure it is a bug, so I'm asking the SQL gurus. I think > there is a inconsistency with SELECT DISTINCT queries when used in > combination with ORDER BY and column alias names (sorry for the rather > long example): Looks like a bug to me. This seems to be fixed in the current sources, probably as a result of the changes I made to the representation of sort ordering. What is happening in 6.5.* is that the system is failing to notice that the order-by values are the same as the ones being called out as DISTINCT, so it generates additional columns and then sorts on those instead of the DISTINCT ones. 6.6-to-be isn't fooled by the aliases. The underlying problem is still there, though: ordering by a column not mentioned in the DISTINCT list doesn't do the right thing. Arguably there is no right thing, and the code should raise an error, but as of now it definitely isn't doing anything reasonable. It would take two sort steps to handle this: sort by the DISTINCT columns, filter out adjacent duplicates, then sort by the ORDER BY columns. But we only do one sort, and we use the ORDER BY columns (plus any DISTINCT columns not already in ORDER BY). However, there's more to this than meets the eye. If you examine any non-DISTINCT column after the filter step, then you are going to get a randomly chosen one of the values associated with that set of DISTINCT values. So a second sort step on such a column after the DISTINCT sort/filter would not give well-defined results. That's why I think maybe we should raise an error. (I don't like "DISTINCT ON column" either, since it is *necessarily* ill-defined.) regards, tom lane
В списке pgsql-sql по дате отправления: