Re: Use zero for nullness estimates of system attributes
From: Jim Finnerty
Subject: Re: Use zero for nullness estimates of system attributes
Date:
Msg-id: 1548516385152-0.post@n3.nabble.com
In reply to: Re: Use zero for nullness estimates of system attributes (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Use zero for nullness estimates of system attributes
List: pgsql-hackers
It's related, but what I was referring to applies even to the uncorrelated case. Suppose you have something like:

    select x, sum(z) from t where x > 5 and y in ('a', 'b', 'c') group by x;

Let's say that 'a', 'b', and 'c' are not frequent values of y, so the estimated selectivity is based on the n_distinct of y and the 3 values. Now imagine that x > 5 is applied first. That reduces the number of qualifying rows by the selectivity of (x > 5), but it may also reduce the number of distinct values of y. If it reduces the n_distinct of y, then the IN predicate selectivity should be adjusted as well.

The negative n_distinct representation (for n_distinct a large fraction of numrows) already accounts for an automatic scaling of n_distinct when the table size grows or shrinks. Adjusting the n_distinct of y after applying predicate (x > 5) is conceptually the same thing, except that the scaling is not linear: it would be based on a (fairly cheap) probabilistic formula of how many distinct y's can be expected to survive after applying predicate x, rather than assuming that all of them do.

If you have correlation information for columns (x, y) you'd want to use that information instead, but in the absence of that information you could make this adjustment. It might make a good intern project if somebody wants to investigate it.

-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
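A minimal sketch of the kind of probabilistic adjustment described above, not PostgreSQL code: it uses a Cardenas-style expected-distinct formula and assumes y's values are uniformly distributed and independent of the predicate on x. The function name and the example numbers are illustrative only.

    # Sketch: expected distinct y values after a predicate keeps a fraction
    # `selectivity` of `numrows` rows, assuming uniform, independent y values.
    def surviving_ndistinct(ndistinct: float, numrows: float, selectivity: float) -> float:
        """Cardenas-style estimate: each of the n surviving rows misses a given
        y value with probability (1 - 1/D), so the expected number of distinct
        values seen at least once is D * (1 - (1 - 1/D) ** n)."""
        if ndistinct <= 0 or numrows <= 0 or selectivity <= 0:
            return 0.0
        n = numrows * selectivity          # rows expected to survive the predicate
        return ndistinct * (1.0 - (1.0 - 1.0 / ndistinct) ** n)

    if __name__ == "__main__":
        D, N = 10_000, 1_000_000
        for sel in (0.5, 0.01, 0.001):
            print(f"sel={sel:<6} linear={D * sel:>8.1f} "
                  f"probabilistic={surviving_ndistinct(D, N, sel):>8.1f}")
        # sel=0.5   -> ~10000 distinct values survive (vs. 5000 from linear scaling)
        # sel=0.001 -> ~952 survive (vs. 10 from linear scaling)

The comparison shows why the scaling is not linear: a highly selective predicate on x removes many rows but proportionally far fewer distinct y values, so simply scaling n_distinct by the row selectivity would badly underestimate it.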