Re: planner chooses incremental but not the best one
От | Tomas Vondra |
---|---|
Тема | Re: planner chooses incremental but not the best one |
Дата | |
Msg-id | b3e92c2a-7fb0-fae2-a4aa-5541ae19fd38@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: planner chooses incremental but not the best one (Richard Guo <guofenglinux@gmail.com>) |
Ответы |
Re: planner chooses incremental but not the best one
(ywgrit <yw987194828@gmail.com>)
Re: planner chooses incremental but not the best one (Andrei Lepikhov <a.lepikhov@postgrespro.ru>) |
Список | pgsql-hackers |
On 12/18/23 11:40, Richard Guo wrote: > > On Mon, Dec 18, 2023 at 7:31 AM Tomas Vondra > <tomas.vondra@enterprisedb.com <mailto:tomas.vondra@enterprisedb.com>> > wrote: > > Oh! Now I see what you meant by using the new formula in 84f9a35e3 > depending on how we sum tuples. I agree that seems like the right thing. > > I'm not sure it'll actually help with the issue, though - if I apply the > patch, the plan does not actually change (and the cost changes just a > little bit). > > I looked at this only very briefly, but I believe it's due to the > assumption of independence I mentioned earlier - we end up using the new > formula introduced in 84f9a35e3, but it assumes it assumes the > selectivity and number of groups are independent. But that'd not the > case here, because the groups are very clearly correlated (with the > condition on "b"). > > > You're right. The patch allows us to adjust the estimate of distinct > values for appendrels using the new formula introduced in 84f9a35e3. > But if the restrictions are correlated with the grouping expressions, > the new formula does not behave well. That's why the patch does not > help in case [1], where 'b' and 'c' are correlated. > > OTOH, if the restrictions are not correlated with the grouping > expressions, the new formula would perform quite well. And in this case > the patch would help a lot, as shown in [2] where estimate_num_groups() > gives a much more accurate estimate with the help of this patch. > > So this patch could be useful in certain situations. I'm wondering if > we should at least have this patch (if it is right). > I do agree the patch seems to do the right thing, and it's worth pushing on it's own. > > If that's the case, I'm not sure how to fix this :-( > > > The commit message of 84f9a35e3 says > > This could possibly be improved upon in the future by identifying > correlated restrictions and using a hybrid of the old and new > formulae. > > Maybe this is something we can consider trying. But anyhow this is not > an easy task I suppose. Yeah, if it was easy, it'd have been done in 84f9a35e3 already ;-) The challenge is where to get usable information about correlation between columns. I only have a couple very rought ideas of what might try. For example, if we have multi-column ndistinct statistics, we might look at ndistinct(b,c) and ndistinct(b,c,d) and deduce something from ndistinct(b,c,d) / ndistinct(b,c) If we know how many distinct values we have for the predicate column, we could then estimate the number of groups. I mean, we know that for the restriction "WHERE b = 3" we only have 1 distinct value, so we could estimate the number of groups as 1 * ndistinct(b,c) I'm well aware this is only a very trivial example, and for more complex examples it's likely way more complicated. But hopefully it illustrates the general idea. The other idea would be to maybe look at multi-column MCV, and try using it to deduce cross-column correlation too (it could be more accurate for arbitrary predicates). And finally, we might try being a bit more pessimistic and look at what the "worst case" behavior could be. So for example instead of trying to estimate the real number of groups, we'd ask "What's the minimum number of groups we're likely to get?". And use that to cost the incremental sort. But I don't think we do this elsewhere, and I'm not sure we want to start with this risk-based approach here. It might be OK, because we usually expect the incremental sort to be much cheaper, ... If this something would be interested in exploring? I don't have capacity to work on this myself, but I'd be available for reviews, feedback and so on. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
В списке pgsql-hackers по дате отправления: