Re: PATCH: adaptive ndistinct estimator v4
От | Tomas Vondra |
---|---|
Тема | Re: PATCH: adaptive ndistinct estimator v4 |
Дата | |
Msg-id | 551AEF45.7010700@2ndquadrant.com обсуждение исходный текст |
Ответ на | PATCH: adaptive ndistinct estimator v3 (WAS: Re: [PERFORM] Yet another abort-early plan disaster on 9.3) (Tomas Vondra <tv@fuzzy.cz>) |
Ответы |
Re: PATCH: adaptive ndistinct estimator v4
Re: PATCH: adaptive ndistinct estimator v4 Re: PATCH: adaptive ndistinct estimator v4 |
Список | pgsql-hackers |
Hi all, attached is v4 of the patch implementing adaptive ndistinct estimator. I've been looking into the strange estimates, mentioned on 2014/12/07: > values current adaptive > ------------------------------ > 106 99 107 > 106 8 6449190 > 1006 38 6449190 > 10006 327 42441 I suspected this might be some sort of rounding error in the numerical optimization (looking for 'm' solving the equation from paper), but turns out that's not the case. The adaptive estimator is a bit unstable for skewed distributions, that are not sufficiently smooth. Whenever f[1] or f[2] was 0 (i.e. there were no values occuring exactly once or twice in the sample), the result was rather off. The simple workaround for this was adding a fallback to GEE when f[1] or f[2] is 0. GEE is another estimator described in the paper, behaving much better in those cases. With the current version, I do get this (with statistics_target=10): values current adaptive ------------------------------ 106 99 108 106 8 178 1006 38 2083 10006 327 11120 The results do change a bit based on the sample, but these values are a good example of the values I'm getting. The other examples (with skewed but smooth distributions) work as good as before. -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
В списке pgsql-hackers по дате отправления: