Re: PoC/WIP: Extended statistics on expressions
From | Tomas Vondra |
---|---|
Subject | Re: PoC/WIP: Extended statistics on expressions |
Date | |
Msg-id | f4dac079-6bc4-ccc0-0fd8-3e2e2da28d92@enterprisedb.com |
In reply to | Re: PoC/WIP: Extended statistics on expressions (Dean Rasheed <dean.a.rasheed@gmail.com>) |
Responses | Re: PoC/WIP: Extended statistics on expressions |
List | pgsql-hackers |
On 3/17/21 7:54 PM, Dean Rasheed wrote:
> On Wed, 17 Mar 2021 at 17:26, Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> My concern is that the current behavior (where we prefer expression
>> stats over multi-column stats to some extent) works fine as long as the
>> parts are independent, but once there's dependency it's probably more
>> likely to produce underestimates. I think underestimates for grouping
>> estimates were a risk in the past, so let's not make that worse.
>>
>
> I'm not sure the current behaviour really is preferring expression
> stats over multi-column stats. In this example, where we're grouping
> by (a+b), (c+d) and have stats on [(a+b),c] and (c+d), neither of
> those multi-column stats actually match more than one
> column/expression. If anything, I'd go the other way and say that it
> was wrong to use the [(a+b),c] stats in the first case, where they
> were the only stats available, since those stats aren't really
> applicable to (c+d), which probably ought to be treated as
> independent. IOW, it might have been better to estimate the first case
> as
>
> ndistinct((a+b)) * ndistinct(c) * ndistinct(d)
>
> and the second case as
>
> ndistinct((a+b)) * ndistinct((c+d))
>

OK. I might be confused, but isn't that what the algorithm currently
does? Or am I just confused about what the first/second case refers to?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
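The two estimates Dean proposes both multiply per-expression ndistinct values, treating the grouping items as independent. A minimal Python sketch of that arithmetic follows. The ndistinct values, the `estimate_groups` helper and the cap at the input row count are hypothetical illustrations chosen for this example, not PostgreSQL planner code.

```python
# Hypothetical per-expression ndistinct values, for illustration only.
# Assumes c and d are integers in 0..9, so (c+d) has 19 distinct values.
NDISTINCT = {
    "(a+b)": 100,
    "c": 10,
    "d": 10,
    "(c+d)": 19,
}


def estimate_groups(items, nrows):
    """Multiply the ndistinct values of the grouping items, assuming they
    are independent, and cap the result at the number of input rows
    (the cap is an assumption of this sketch)."""
    est = 1
    for item in items:
        est *= NDISTINCT[item]
    return min(est, nrows)


NROWS = 1_000_000

# First case: no usable stats on (c+d), so fall back to columns c and d.
first = estimate_groups(["(a+b)", "c", "d"], NROWS)
# Second case: expression stats on (c+d) are available.
second = estimate_groups(["(a+b)", "(c+d)"], NROWS)

print(first)   # 10000
print(second)  # 1900
```

Under these made-up numbers the first estimate is higher because treating c and d separately ignores that many (c, d) pairs map to the same value of c+d.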