Re: Selectivity estimation for inet operators
От | Emre Hasegeli |
---|---|
Тема | Re: Selectivity estimation for inet operators |
Дата | |
Msg-id | 20140831175918.GB8990@hasegeli-2.local обсуждение исходный текст |
Ответ на | Re: Selectivity estimation for inet operators (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-hackers |
> Heikki Linnakangas <hlinnakangas@vmware.com> writes: > > * inet_mcv_join_selec() is O(n^2) where n is the number of entries in > > the MCV lists. With the max statistics target of 10000, a worst case > > query on my laptop took about 15 seconds to plan. Maybe that's > > acceptable, but you went through some trouble to make planning of MCV vs > > histogram faster, by the log2 method to compare only some values, so I > > wonder why you didn't do the same for the MCV vs MCV case? > > Actually, what I think needs to be asked is the opposite question: why is > the other code ignoring some of the statistical data? If the user asked > us to collect a lot of stats detail it seems reasonable that he's > expecting us to use it to get more accurate estimates. It's for sure > not obvious why these estimators should take shortcuts that are not being > taken in the much-longer-established code for scalar comparison estimates. It will still use more statistical data, when statistics_target is higher. It was not sure that the user wants to spent O(n^2) amount of time based on statistics_target. Attached version is without this optimization. Estimates are better without it, but planning takes more time. > I'm not exactly convinced that the math adds up in this logic, either. > The way in which it combines results from looking at the MCV lists and > at the histograms seems pretty arbitrary. I taught the product of the join will be (left_mcv + left_histogram) * (right_mcv + right_histogram) * selectivity and tried to calculate it as in the following: (left_mcv * right_mcv * selectivity) + (right_mcv * left_histogram * selectivity) + (left_mcv * right_histogram * selectivity) + (left_histogram * right_histogram * selectivity) where left_histogram is 1.0 - left_nullfrac - left_mcv I fixed calculation for the MCV vs histogram part. The estimates of inner join are very close to the actual rows with statistics_target = 1000. I think the calculation should be right.
Вложения
В списке pgsql-hackers по дате отправления: