Re: Multidimensional Histograms

Поиск

Список

Период

Сортировка

От	Tomas Vondra
Тема	Re: Multidimensional Histograms
Дата	7 января 2024 г. 23:29:01
Msg-id	61d1e6b0-914e-4aa3-8a32-ff2cf9e91100@enterprisedb.com обсуждение исходный текст
Ответ на	Re: Multidimensional Histograms (Alexander Cheshev <alex.cheshev@gmail.com>)
Ответы	Re: Multidimensional Histograms
Список	pgsql-hackers

Дерево обсуждения


On 1/7/24 23:53, Alexander Cheshev wrote:
> Hi Tomas,
>
>> The thing I was proposing is that it should be possible to build
>> histograms with bins adapted to density in the given region. With
>> smaller buckets in areas with high density. So that queries intersect
>> with fewer buckets in low-density parts of the histogram.
>
> This is how Equi-Depth Histogram works. Postgres has maller buckets in
> areas with high density:
>
> values[(i * (nvals - 1)) / (num_hist - 1)]
>
True, but the boundaries are somewhat random. Also, I was referring to
the multi-dimensional case, it wasn't clear to me if the proposal is to
do the same thing.

>> I don't recall the details of the MHIST-2 scheme, but it's true
>> calculating "perfect" V-optimal histogram would be quadratic O(N^2*B).
>
> In M-dimensional space "perfect" V-Optimal Histogram is an NP-hard
> problem. In other words it is not possible to build it in polynomial
> time. How did you come up with the estimate?!
>
See section 3.2 in this paper (the "view PDF" does not work for me, but
the "source PDF" does download a postscript):

https://citeseerx.ist.psu.edu/doc_view/pid/35e29cbc2bfe6662653bdae1fb89c091e2ece560

>> But that's exactly why greedy/approximate algorithms exist. Yes, it's
>> not going to produce the optimal V-optimal histogram, but so what?
>
> Greedy/approximate algorithm has time complexity O(M*N*B), where M
> equals the number of dimensions. MHIST-2 is a greedy/approximate
> algorithm.
>
>> And how does this compare to the approximate/greedy algorithms, both in
>> terms of construction time and accuracy?
>
> Time complexity of Equi-Depth Histogram has no dependence on B.
>
Really? I'd expect that to build B buckets, the algorithm repeat some
O(M*N) action B-times, roughly. I mean, it needs to pick a dimension by
which to split, then do some calculation on the N tuples, etc.

Maybe I'm imagining that wrong, though. It's been ages since I looked ad
big-O complexity and/or the histograms. I'd have to play with those
algorithms for a bit again.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Multidimensional Histograms