Re: PoC/WIP: Extended statistics on expressions
From | Dean Rasheed |
---|---|
Subject | Re: PoC/WIP: Extended statistics on expressions |
Date | |
Msg-id | CAEZATCWF66BWJq-OwLuP5LGK6W9LDYxCQwLxqB36qmq3b1Ch8Q@mail.gmail.com |
In reply to | Re: PoC/WIP: Extended statistics on expressions (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: PoC/WIP: Extended statistics on expressions
|
List | pgsql-hackers |
On Wed, 17 Mar 2021 at 17:26, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> My concern is that the current behavior (where we prefer expression
> stats over multi-column stats to some extent) works fine as long as the
> parts are independent, but once there's dependency it's probably more
> likely to produce underestimates. I think underestimates for grouping
> estimates were a risk in the past, so let's not make that worse.
>

I'm not sure the current behaviour really is preferring expression stats over multi-column stats. In this example, where we're grouping by (a+b), (c+d) and have stats on [(a+b),c] and (c+d), neither of those multi-column stats actually match more than one column/expression.

If anything, I'd go the other way and say that it was wrong to use the [(a+b),c] stats in the first case, where they were the only stats available, since those stats aren't really applicable to (c+d), which probably ought to be treated as independent. IOW, it might have been better to estimate the first case as

    ndistinct((a+b)) * ndistinct(c) * ndistinct(d)

and the second case as

    ndistinct((a+b)) * ndistinct((c+d))

Regards,
Dean
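To make the two suggested estimates concrete, here is a minimal arithmetic sketch. It is not the planner's code: the helper name `estimate_groups` and all the ndistinct values are hypothetical, chosen only to show how treating the factors as independent amounts to multiplying their distinct counts.

```python
from math import prod


def estimate_groups(ndistinct_factors):
    """Combine per-factor ndistinct estimates as if the factors were independent."""
    return prod(ndistinct_factors)


# Hypothetical ndistinct values, not taken from the thread.
nd_a_plus_b = 100   # ndistinct((a+b)), from the [(a+b),c] stats
nd_c = 50           # ndistinct(c)
nd_d = 20           # ndistinct(d)
nd_c_plus_d = 60    # ndistinct((c+d)), from the (c+d) expression stats

# First case: only the [(a+b),c] stats exist, so (c+d) is estimated
# from its underlying columns c and d.
first_case = estimate_groups([nd_a_plus_b, nd_c, nd_d])

# Second case: stats on (c+d) also exist and are used directly.
second_case = estimate_groups([nd_a_plus_b, nd_c_plus_d])

print(first_case, second_case)
```

With these made-up numbers the first case gives the larger estimate, because (c+d) is expanded into two independent columns rather than estimated as a single expression.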