Re: multivariate statistics (v25)
From | Tomas Vondra |
---|---|
Subject | Re: multivariate statistics (v25) |
Date | |
Msg-id | a80cbb70-ea48-0367-9a40-a5cb6484046e@2ndquadrant.com |
In response to | Re: [HACKERS] multivariate statistics (v25) (Alvaro Herrera <alvherre@2ndquadrant.com>) |
List | pgsql-hackers |
On 04/05/2017 08:41 AM, Sven R. Kunze wrote:
> Thanks Tomas and David for hacking on this patch.
>
> On 04.04.2017 20:19, Tomas Vondra wrote:
>> I'm not sure we still need the min_group_size, when evaluating
>> dependencies. It was meant to deal with 'noisy' data, but I think
>> after switching to the 'degree' it might actually be a bad idea.
>>
>> Consider this:
>>
>>     create table t (a int, b int);
>>     insert into t select 1, 1 from generate_series(1, 10000) s(i);
>>     insert into t select i, i from generate_series(2, 20000) s(i);
>>     create statistics s with (dependencies) on (a,b) from t;
>>     analyze t;
>>
>>     select stadependencies from pg_statistic_ext ;
>>                   stadependencies
>>     --------------------------------------------
>>      [{1 => 2 : 0.333344}, {2 => 1 : 0.333344}]
>>     (1 row)
>>
>> So the degree of the dependency is just ~0.333 although it's obviously
>> a perfect dependency, i.e. a knowledge of 'a' determines 'b'. The
>> reason is that we discard 2/3 of rows, because those groups are only a
>> single row each, except for the one large group (1/3 of rows).
>
> Just for me to follow the comments better. Is "dependency" roughly the
> same as when statisticians speak about "conditional probability"?
>

No, it's more 'functional dependency' from relational normal forms.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
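
The arithmetic behind the ~0.333 degree in the quoted example can be reproduced with a short Python sketch. This is only an illustration of the reasoning in the message, not the patch's actual C code: the function name `dependency_degree` is hypothetical, the "degree" is assumed to be the fraction of rows that fall into groups of `a` which pass the `min_group_size` threshold and map to a single `b`, and the whole table is assumed to be examined rather than a sample.

```python
from collections import defaultdict


def dependency_degree(rows, min_group_size=2):
    """Hypothetical helper: fraction of rows supporting the dependency a => b.

    A group (all rows sharing one value of a) counts only if it has at
    least min_group_size rows and all of its rows share a single b value.
    """
    groups = defaultdict(list)
    for a, b in rows:
        groups[a].append(b)

    supporting = 0
    for b_values in groups.values():
        if len(b_values) < min_group_size:
            continue  # group too small: discarded as potential noise
        if len(set(b_values)) == 1:
            supporting += len(b_values)  # every row in the group agrees on b
    return supporting / len(rows)


# Same data as the SQL in the message: 10000 rows of (1, 1),
# plus one row (i, i) for each i in 2..20000.
rows = [(1, 1)] * 10000 + [(i, i) for i in range(2, 20001)]

degree_with_threshold = dependency_degree(rows, min_group_size=2)
degree_without_threshold = dependency_degree(rows, min_group_size=1)

print(round(degree_with_threshold, 6))  # 10000 / 29999
print(degree_without_threshold)
```

With the threshold at 2, only the 10000 rows of the single large group count, giving 10000 / 29999, which is roughly 1/3. With the threshold effectively disabled (`min_group_size=1`), every group is consistent and the sketch reports a perfect dependency, which is the point the message makes about dropping `min_group_size`.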