Re: planner chooses incremental but not the best one

Поиск

Список

Период

Сортировка

От	Andrei Lepikhov
Тема	Re: planner chooses incremental but not the best one
Дата	15 февраля 2024 г. 12:45:36
Msg-id	96501eed-b7a0-4d15-841c-662d81ea5f95@postgrespro.ru обсуждение исходный текст
Ответ на	Re: planner chooses incremental but not the best one (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы	Re: planner chooses incremental but not the best one
Список	pgsql-hackers

Дерево обсуждения

On 15/2/2024 18:10, Tomas Vondra wrote:
> 
> 
> On 2/15/24 07:50, Andrei Lepikhov wrote:
>> On 18/12/2023 19:53, Tomas Vondra wrote:
>>> On 12/18/23 11:40, Richard Guo wrote:
>>> The challenge is where to get usable information about correlation
>>> between columns. I only have a couple very rought ideas of what might
>>> try. For example, if we have multi-column ndistinct statistics, we might
>>> look at ndistinct(b,c) and ndistinct(b,c,d) and deduce something from
>>>
>>>       ndistinct(b,c,d) / ndistinct(b,c)
>>>
>>> If we know how many distinct values we have for the predicate column, we
>>> could then estimate the number of groups. I mean, we know that for the
>>> restriction "WHERE b = 3" we only have 1 distinct value, so we could
>>> estimate the number of groups as
>>>
>>>       1 * ndistinct(b,c)
>> Did you mean here ndistinct(c,d) and the formula:
>> ndistinct(b,c,d) / ndistinct(c,d) ?
> 
> Yes, I think that's probably a more correct ... Essentially, the idea is
> to estimate the change in number of distinct groups after adding a
> column (or restricting it in some way).
Thanks, I got it. I just think how to implement such techniques with 
extensions just to test the idea in action. In the case of GROUP-BY we 
can use path hook, of course. But what if to invent a hook on clauselist 
estimation?
>> Do you implicitly bear in mind here the necessity of tracking clauses
>> that were applied to the data up to the moment of grouping?
>>
> 
> I don't recall what exactly I considered two months ago when writing the
> message, but I don't see why we would need to track that beyond what we
> already have. Shouldn't it be enough for the grouping to simply inspect
> the conditions on the lower levels?
Yes, exactly. I've thought about looking into baserestrictinfos and, if 
group-by references a subquery targetlist, into subqueries too.

-- 
regards,
Andrei Lepikhov
Postgres Professional

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: planner chooses incremental but not the best one