Re: Merging statistics from children instead of re-sampling everything

Поиск

Список

Период

Сортировка

От	Tomas Vondra
Тема	Re: Merging statistics from children instead of re-sampling everything
Дата	10 февраля 2022 г. 22:37:26
Msg-id	4e86ae74-4e2c-b40f-4405-035d2f818e5d@enterprisedb.com обсуждение исходный текст
Ответ на	Re: Merging statistics from children instead of re-sampling everything (Andrey Lepikhov <a.lepikhov@postgrespro.ru>)
Ответы	Re: Merging statistics from children instead of re-sampling everything
Список	pgsql-hackers

Дерево обсуждения

On 2/10/22 12:50, Andrey Lepikhov wrote:
> On 21/1/2022 01:25, Tomas Vondra wrote:
>> But I don't have a very good idea what to do about statistics that we
>> can't really merge. For some types of statistics it's rather tricky to
>> reasonably merge the results - ndistinct is a simple example, although
>> we could work around that by building and merging hyperloglog counters.
>
> I think, as a first step on this way we can reduce a number of pulled
> tuples. We don't really needed to pull all tuples from a remote server.
> To construct a reservoir, we can pull only a tuple sample. Reservoir
> method needs only a few arguments to return a sample like you read
> tuples locally. Also, to get such parts of samples asynchronously, we
> can get size of each partition on a preliminary step of analysis.
> In my opinion, even this solution can reduce heaviness of a problem
> drastically.
> 

Oh, wow! I haven't realized we're fetching all the rows from foreign
(postgres_fdw) partitions. For local partitions we already do that,
because that uses the usual acquire function, with a reservoir
proportional to partition size. I have assumed we use tablesample to
fetch just a small fraction of rows from FDW partitions, and I agree
doing that would be a pretty huge benefit.

I actually tried hacking that together - there's a couple problems with
that (e.g. determining what fraction to sample using bernoulli/system),
but in principle it seems quite doable. Some minor changes to the FDW
API may be necessary, not sure.

Not sure about the async execution - that seems way more complicated,
and the sampling reduces the total cost, async just parallelizes it.

That being said, this thread was not really about foreign partitions,
but about re-analyzing inheritance trees in general. And sampling
foreign partitions doesn't really solve that - we'll still do the
sampling over and over.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Merging statistics from children instead of re-sampling everything