Обсуждение: Third thoughts about the DISTINCT MAX() problem
I just realized that the patch I applied here http://archives.postgresql.org/pgsql-committers/2008-03/msg00531.php for Taiki Yamaguchi's bug report here http://archives.postgresql.org/pgsql-bugs/2008-03/msg00275.php really doesn't work. It assumes that an ungrouped aggregate query can't return more than one row, which is true in straight SQL ... but it's not true if you consider SRFs in the target list. CVS HEAD gives the wrong answer for this example in the regression database: regression=# select max(unique1), generate_series(1,3) as g from tenk1 order by g desc;max | g ------+---9999 | 19999 | 29999 | 3 (3 rows) because it wrongly supposes it can discard the ORDER BY. So, back to the drawing board. I had thought of two approaches to fixing the problem instead of just dodging it. Plan A was to apply planagg.c's Aggref->Param substitution inside EquivalenceClasses, as in the draft patch here: http://archives.postgresql.org/pgsql-patches/2008-03/msg00388.php which I didn't entirely like for reasons mentioned in that post. Plan B was to try to revert to the way sort clause matching was done pre-8.3, that is have make_sort_from_pathkeys check first for a matching ressortgroupref tag before it goes looking for equal() expressions. I had actually tried to do that first but got hung up on the problem of identifying the correct sort operator --- just taking the exposed type of the targetlist entry doesn't always work, because of binary-compatible cases (eg, tlist entry may say it yields varchar but we need to find the text opclass). Perhaps thinking a bit harder would yield a solution though. Does anyone have comments for or against either of these approaches, or perhaps a Plan C to consider? regards, tom lane
I wrote: > Plan B was to try to revert to the way sort clause matching was > done pre-8.3, that is have make_sort_from_pathkeys check first > for a matching ressortgroupref tag before it goes looking for equal() > expressions. I had actually tried to do that first but got hung > up on the problem of identifying the correct sort operator --- > just taking the exposed type of the targetlist entry doesn't always > work, because of binary-compatible cases (eg, tlist entry may say > it yields varchar but we need to find the text opclass). Perhaps > thinking a bit harder would yield a solution though. Ah, got it. I had previously attached a sortref to EquivalenceClasses (ec_sortref) to deal properly with multiple textually-identical but volatile expressions. But that's really the wrong thing: sortrefs should be attached to individual EquivalenceMembers, instead. If we do that then the existing logic in make_sort_from_pathkeys can be made to use the sortref as a preferred (and faster) way of identifying which tlist member matches a pathkey. This implies a change in the EquivalenceClass data structures, but those are never dumped to disk so it's not a problem to back-patch. Note: we still need to be able to match on equal() since we may have to deal with sort keys that are not any of the explicit ORDER BY expressions and hence don't have a sortref. But that's not a problem for the MIN/MAX optimization since the only things left to do after planagg.c are apply explicit ORDER BY or DISTINCT operations. regards, tom lane
On Sat, Mar 29, 2008 at 3:21 AM, Tom Lane <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>> wrote:<br /><divclass="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt0pt 0.8ex; padding-left: 1ex;"> I just realized that the patch I applied here<br /><a href="http://archives.postgresql.org/pgsql-committers/2008-03/msg00531.php" target="_blank">http://archives.postgresql.org/pgsql-committers/2008-03/msg00531.php</a><br/> for Taiki Yamaguchi's bug reporthere<br /><a href="http://archives.postgresql.org/pgsql-bugs/2008-03/msg00275.php" target="_blank">http://archives.postgresql.org/pgsql-bugs/2008-03/msg00275.php</a><br/> really doesn't work. It assumesthat an ungrouped aggregate<br /> query can't return more than one row, which is true in straight<br /> SQL ... butit's not true if you consider SRFs in the target list.<br /> CVS HEAD gives the wrong answer for this example in the regression<br/> database:<br /><br /> regression=# select max(unique1), generate_series(1,3) as g from tenk1 order by g desc;<br/> max | g<br /> ------+---<br /> 9999 | 1<br /> 9999 | 2<br /> 9999 | 3<br /> (3 rows)<br /><br /> becauseit wrongly supposes it can discard the ORDER BY.<br /><br /> So, back to the drawing board. I had thought of twoapproaches to<br /> fixing the problem instead of just dodging it. Plan A was to<br /> apply planagg.c's Aggref->Paramsubstitution inside EquivalenceClasses,<br /> as in the draft patch here:<br /><a href="http://archives.postgresql.org/pgsql-patches/2008-03/msg00388.php" target="_blank">http://archives.postgresql.org/pgsql-patches/2008-03/msg00388.php</a><br/> which I didn't entirely like forreasons mentioned in that post.<br /> Plan B was to try to revert to the way sort clause matching was<br /> done pre-8.3,that is have make_sort_from_pathkeys check first<br /> for a matching ressortgroupref tag before it goes lookingfor equal()<br /> expressions. I had actually tried to do that first but got hung<br /> up on the problem of identifyingthe correct sort operator ---<br /> just taking the exposed type of the targetlist entry doesn't always<br />work, because of binary-compatible cases (eg, tlist entry may say<br /> it yields varchar but we need to find the textopclass). Perhaps<br /> thinking a bit harder would yield a solution though.<br /><br /> Does anyone have comments foror against either of these approaches,<br /> or perhaps a Plan C to consider?<br /><br /> <br /></blockquote></div><br/>In the past I had seen suggestions (perhaps from you) that we should disallow SRFs in target list...Although not for 8.3, but would this be a good time for 8.4 to deprecate the usage of SRFs in targetlist?<br /><br/>Best regards,<br />-- <br />gurjeet[.singh]@EnterpriseDB.com<br />singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo}.com<br /><br />EnterpriseDB <a href="http://www.enterprisedb.com">http://www.enterprisedb.com</a><br /><br />Mailsent from my BlackLaptop device
On Sat, 2008-03-29 at 07:07 +0530, Gurjeet Singh wrote: > On Sat, Mar 29, 2008 at 3:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > regression=# select max(unique1), generate_series(1,3) as g > from tenk1 order by g desc; > max | g > ------+--- > 9999 | 1 > 9999 | 2 > 9999 | 3 > (3 rows) > > > In the past I had seen suggestions (perhaps from you) that we should > disallow SRFs in target list... Although not for 8.3, but would this > be a good time for 8.4 to deprecate the usage of SRFs in targetlist? I think it makes sense to throw an error if we have both ungrouped aggregates and SRFs in the targetlist. The two concepts seem at odds with each other anyhow. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
I wrote: > Plan B was to try to revert to the way sort clause matching was > done pre-8.3, that is have make_sort_from_pathkeys check first > for a matching ressortgroupref tag before it goes looking for equal() > expressions. So I tried that, and after a whole bunch of regression test failures I realize it won't work. ressortgroupref tags are only guaranteed unique within a given targetlist --- the values applicable in the final result tlist might mean something else within a plan node further down in the tree. Since EquivalenceClasses are global to the whole plan, trying to match them to tlist entries by ressortgroupref isn't safe. This actually calls into question the existing fix for making sure we pick the right tlist entry when dealing with volatile ORDER BY expressions. I think it's all right because sorting by such expressions could only occur at top tree level --- we'd never consider such an expression as a mergejoin key, for instance. But it seems fragile and something we'll need to keep an eye on. Anyway, it seems that the only remaining acceptable fix is the one involving modifying the EquivalenceClasses when planagg.c makes its substitutions. Unless someone has another idea. regards, tom lane