Обсуждение: Optimization outcome depends on the index order

Поиск
Список
Период
Сортировка

Optimization outcome depends on the index order

От
Andrei Lepikhov
Дата:
On 21/12/2023 12:10, Alexander Korotkov wrote:
 > I took a closer look at the patch in [9].  I should drop my argument
 > about breaking the model, because add_path() already considers other
 > aspects than just costs.  But I have two more note about that patch:
 >
 > 1) It seems that you're determining the fact that the index path
 > should return strictly one row by checking path->rows <= 1.0 and
 > indexinfo->unique.  Is it really guaranteed that in this case quals
 > are matching unique constraint?  path->rows <= 1.0 could be just an
 > estimation error.  Or one row could be correctly estimated, but it's
 > going to be selected by some quals matching unique constraint and
 > other quals in recheck.  So, it seems there is a risk to select
 > suboptimal index due to this condition.

Operating inside the optimizer, we consider all estimations to be the 
sooth. This patch modifies only one place: having two equal assumptions, 
we just choose one that generally looks more stable.
Filtered tuples should be calculated and included in the cost of the 
path. The decision on the equality of paths has been made in view of the 
estimation of these filtered tuples.

 > 2) Even for non-unique indexes this patch is putting new logic on top
 > of the subsequent code.  How we can prove it's going to be a win?
 > That could lead, for instance, to dropping parallel-safe paths in
 > cases we didn't do so before.
Because we must trust all predictions made by the planner, we just 
choose the most trustworthy path. According to the planner logic, it is 
a path with a smaller selectivity. We can make mistakes anyway just 
because of the nature of estimation.

 > Anyway, please start a separate thread if you're willing to put more
 > work into this.

Done

 > 9. https://www.postgresql.org/message-id/154f786a-06a0-4fb1-
 > b8a4-16c66149731b%40postgrespro.ru

-- 
regards,
Andrei Lepikhov
Postgres Professional
Вложения

Re: Optimization outcome depends on the index order

От
Alexander Korotkov
Дата:
On Fri, Dec 22, 2023 at 8:53 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
> On 21/12/2023 12:10, Alexander Korotkov wrote:
>  > I took a closer look at the patch in [9].  I should drop my argument
>  > about breaking the model, because add_path() already considers other
>  > aspects than just costs.  But I have two more note about that patch:
>  >
>  > 1) It seems that you're determining the fact that the index path
>  > should return strictly one row by checking path->rows <= 1.0 and
>  > indexinfo->unique.  Is it really guaranteed that in this case quals
>  > are matching unique constraint?  path->rows <= 1.0 could be just an
>  > estimation error.  Or one row could be correctly estimated, but it's
>  > going to be selected by some quals matching unique constraint and
>  > other quals in recheck.  So, it seems there is a risk to select
>  > suboptimal index due to this condition.
>
> Operating inside the optimizer, we consider all estimations to be the
> sooth. This patch modifies only one place: having two equal assumptions,
> we just choose one that generally looks more stable.
> Filtered tuples should be calculated and included in the cost of the
> path. The decision on the equality of paths has been made in view of the
> estimation of these filtered tuples.

Even if estimates are accurate the conditions in the patch doesn't guarantee there is actually a unique condition.

# create table t as select i/1000 a, i % 1000 b, i % 1000 c from generate_series(1,1000000) i;
# create unique index t_unique_idx on t(a,b);
# create index t_another_idx on t(a,c);
# \d t
                 Table "public.t"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 a      | integer |           |          |
 b      | integer |           |          |
 c      | integer |           |          |
Indexes:
    "t_another_idx" btree (a, c)
    "t_unique_idx" UNIQUE, btree (a, b)
# set enable_bitmapscan = off; explain select * from t where a = 1 and c = 1;
SET
Time: 0.459 ms
                                QUERY PLAN
--------------------------------------------------------------------------
 Index Scan using t_unique_idx on t  (cost=0.42..1635.16 rows=1 width=12)
   Index Cond: (a = 1)
   Filter: (c = 1)
(3 rows)

 
>  > 2) Even for non-unique indexes this patch is putting new logic on top
>  > of the subsequent code.  How we can prove it's going to be a win?
>  > That could lead, for instance, to dropping parallel-safe paths in
>  > cases we didn't do so before.
> Because we must trust all predictions made by the planner, we just
> choose the most trustworthy path. According to the planner logic, it is
> a path with a smaller selectivity. We can make mistakes anyway just
> because of the nature of estimation.

Even if we need to take selectivity into account here, it's still not clear why this should be on top of other logic later in add_path().

>  > Anyway, please start a separate thread if you're willing to put more
>  > work into this.
>
> Done

Thanks.

------
Regards,
Alexander Korotkov

Re: Optimization outcome depends on the index order

От
Andrei Lepikhov
Дата:
On 22/12/2023 11:48, Alexander Korotkov wrote:
>  > Because we must trust all predictions made by the planner, we just
>  > choose the most trustworthy path. According to the planner logic, it is
>  > a path with a smaller selectivity. We can make mistakes anyway just
>  > because of the nature of estimation.
> 
> Even if we need to take selectivity into account here, it's still not 
> clear why this should be on top of other logic later in add_path().
I got your point now, thanks for pointing it out. In the next version of 
the patch selectivity is used as a criteria only in the case of COSTS_EQUAL.

-- 
regards,
Andrei Lepikhov
Postgres Professional

Вложения

Re: Optimization outcome depends on the index order

От
Alexander Korotkov
Дата:
On Fri, Dec 22, 2023 at 7:24 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> On 22/12/2023 11:48, Alexander Korotkov wrote:
> >  > Because we must trust all predictions made by the planner, we just
> >  > choose the most trustworthy path. According to the planner logic, it is
> >  > a path with a smaller selectivity. We can make mistakes anyway just
> >  > because of the nature of estimation.
> >
> > Even if we need to take selectivity into account here, it's still not
> > clear why this should be on top of other logic later in add_path().
> I got your point now, thanks for pointing it out. In the next version of
> the patch selectivity is used as a criteria only in the case of COSTS_EQUAL.

It looks better now.  But it's hard for me to judge these heuristics
in add_path().  Tom, what do you think about this?

------
Regards,
Alexander Korotkov



Re: Optimization outcome depends on the index order

От
Andrei Lepikhov
Дата:
On 25/12/2023 18:36, Alexander Korotkov wrote:
> On Fri, Dec 22, 2023 at 7:24 PM Andrei Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
>> On 22/12/2023 11:48, Alexander Korotkov wrote:
>>>   > Because we must trust all predictions made by the planner, we just
>>>   > choose the most trustworthy path. According to the planner logic, it is
>>>   > a path with a smaller selectivity. We can make mistakes anyway just
>>>   > because of the nature of estimation.
>>>
>>> Even if we need to take selectivity into account here, it's still not
>>> clear why this should be on top of other logic later in add_path().
>> I got your point now, thanks for pointing it out. In the next version of
>> the patch selectivity is used as a criteria only in the case of COSTS_EQUAL.
> 
> It looks better now.  But it's hard for me to judge these heuristics
> in add_path().  Tom, what do you think about this?
Just food for thought:
Another approach I have considered was to initialize the relation index 
list according to some consistent rule: place unique indexes at the head 
of the list, arrange indexes according to the number of columns involved 
and sort in some lexical order.
But IMO, the implemented approach looks better.

-- 
regards,
Andrei Lepikhov
Postgres Professional