Implement missing join selectivity estimation for range types
От | Mahmoud Sakr |
---|---|
Тема | Implement missing join selectivity estimation for range types |
Дата | |
Msg-id | CAB4o4asMq3k6HN9WfDsssQ5DDVfAziB4TpiFJ8RBJgZTVuwC7g@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: Implement missing join selectivity estimation for range types
|
Список | pgsql-hackers |
Hi, Given a query: SELECT * FROM t1, t2 WHERE t1.r << t2.r where t1.r, t2.r are of range type, currently PostgreSQL will estimate a constant selectivity for the << predicate, which is equal to 0.005, not utilizing the statistics that the optimizer collects for range attributes. We have worked out a theory for inequality join selectivity estimation (http://arxiv.org/abs/2206.07396), and implemented it for range types it in this patch. The algorithm in this patch re-uses the currently collected statistics for range types, which is the bounds histogram. It works fairly accurate for the operations <<, >>, &&, &<, &>, <=, >= with estimation error of about 0.5%. The patch also implements selectivity estimation for the operations @>, <@ (contains and is contained in), but their accuracy is not stable, since the bounds histograms assume independence between the range bounds. A point to discuss is whether or not to keep these last two operations. The patch also includes the selectivity estimation for multirange types, treating a multirange as a single range which is its bounding box. The same algorithm in this patch is applicable to inequality joins of scalar types. We, however, don't implement it for scalars, since more work is needed to make use of the other statistics available for scalars, such as the MCV. This is left as a future work. -- Mahmoud SAKR - Univeristé Libre de Bruxelles This work is done by Diogo Repas, Zhicheng Luo, Maxime Schoemans, and myself
Вложения
В списке pgsql-hackers по дате отправления: