Re: POC, WIP: OR-clause support for indexes

Поиск
Список
Период
Сортировка
От Alena Rybakina
Тема Re: POC, WIP: OR-clause support for indexes
Дата
Msg-id 6b1a967f-5493-470e-96d8-92bbbcad630a@postgrespro.ru
обсуждение исходный текст
Ответ на Re: POC, WIP: OR-clause support for indexes  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On 26.10.2023 22:58, Robert Haas wrote:
> On Thu, Oct 26, 2023 at 3:47 PM Alena Rybakina
> <a.rybakina@postgrespro.ru> wrote:
>> With small amounts of "OR" elements, the cost of orexpr is lower than with "ANY", on the contrary, higher.
> Alexander's example seems to show that it's not that simple. If I'm
> reading his example correctly, with things like aid = 1, the
> transformation usually wins even if the number of things in the OR
> expression is large, but with things like aid + 1 * bid = 1, the
> transformation seems to lose at least with larger numbers of items. So
> it's not JUST the number of OR elements but also what they contain,
> unless I'm misunderstanding his point.
Yes, I agree, with Alexander's example, this option will not help and 
here I need to look inside Expr itself. But I noticed that such a 
complex non-constant expression is always an OpExpr type, otherwise if 
the non-constant part contains only one variable, then it is a Var type. 
We can add a constraint that we will transform expressions with the 
simple variables like x=1 or x=2 or x=3, etc., but expressions like 
x*1+y=1 or x*2+y=2... we ignore.

But then, we do not consider expressions when the nonconstant part is 
always the same for expressions. For example, we could transform x*1+y=1 
or x*1+y=2... to x*1+y = ANY([1,2,...]). But I think it's not so 
critical, because such cases are rare.
>
>> Index Scan using pg_class_oid_index on pg_class  (cost=0.27..2859.42 rows=414 width=68) (actual time=1.504..34.183
rows=260loops=1)
 
>>     Index Cond: (oid = ANY (ARRAY['1'::oid, '2'::oid, '3'::oid, '4'::oid, '5'::oid, '6'::oid, '7'::oid,
>>
>> Bitmap Heap Scan on pg_class  (cost=43835.00..54202.14 rows=414 width=68) (actual time=39.958..41.293 rows=260
loops=1)
>>     Recheck Cond: ((oid = '1'::oid) OR (oid = '2'::oid) OR (oid = '3'::oid) OR (oid = '4'::oid) OR (oid =
>>
>> I think we could see which value is lower, and if lower with expressions converted to ANY, then work with it
further,otherwise work with the original "OR" expressions. But we still need to make this conversion to find out its
cost.
> To me, this sort of output suggests that perhaps the transformation is
> being done in the wrong place. I expect that we have to decide whether
> to convert from OR to = ANY(...) at a very early stage of the planner,
> before we have any idea what the selected path will ultimately be. But
> this output suggests that we want the answer to depend on which kind
> of path is going to be faster, which would seem to argue for doing
> this sort of transformation as part of path generation for only those
> paths that will benefit from it, rather than during earlier phases of
> expression processing/simplification.
>
> I'm not sure I have the full picture here, though, so I might have
> this all wrong.
>
This would be the most ideal option, and to be honest, I like the 
conversion at an early stage also because there are no problems with 
selectivity or link updates if we changed the structure of RestrictInfo 
of relation.

But in terms of calculating which option is better to use transformed or 
original, I think this solution might be complicated, since we need not 
only to highlight the cases in which the transformation wins in 
principle, but also with which types of data it will work best and there 
is a risk of missing some cases and we may need the own evaluation 
model. Now it's hard for me to come up with something simple.

The cost option seems simpler and clearer to me, but yes, it is 
difficult to decide when it is better to do the conversion for the most 
correct estimate.


-- 
Regards,
Alena Rybakina




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Nathan Bossart
Дата:
Сообщение: Re: Atomic ops for unlogged LSN
Следующее
От: Tom Lane
Дата:
Сообщение: Recovering from detoast-related catcache invalidations