Re: Use of additional index columns in rows filtering

Поиск

Список

Период

Сортировка

От	Tomas Vondra
Тема	Re: Use of additional index columns in rows filtering
Дата	6 августа 2023 г. 17:23:19
Msg-id	f214a384-5a50-b0cd-1e42-5f917f2c572c@enterprisedb.com обсуждение исходный текст
Ответ на	Re: Use of additional index columns in rows filtering (Peter Geoghegan <pg@bowt.ie>)
Ответы	Re: Use of additional index columns in rows filtering
Список	pgsql-hackers

Дерево обсуждения

On 8/5/23 02:53, Peter Geoghegan wrote:
> ...
> 
>> Right. This however begs a question - why would we actually need to
>> check the visibility map before evaluating the index filter, when the
>> index tuple alone is clearly good enough for the bitmapOr plan?
>>
>> Because if we didn't need to do that VM check, this issue with extra
>> heap accesses would disappear.
> 
> The index AM is entitled to make certain assumptions of opclass
> members -- assumptions that cannot be made during expression
> evaluation. The classic example is division-by-zero during evaluation
> of a qual, for a tuple that wasn't visible anyway. Our assumption is
> that stuff like that just cannot happen with index quals -- users
> shouldn't ever encounter sanity check errors caused by
> invisible-to-their-MVCC-snapshot tuples.
> 

Thanks for reminding me, I keep forgetting about this.

> I think that that's the main difficulty, as far as avoiding heap
> access for index filters is concerned. Of course, you could just limit
> yourself to those cases where the index AM assumptions were safe. But
> at that point, why not simply make sure to generate true index quals,
> and be done with it?
> 

Yeah, if it's possible to generate true index quals, it'd be stupid not
to do that. But clearly there are cases where that's not possible (we
may not have the code doing that, or maybe it's just not possible in
principle).

Would it be possible to inspect the expression and determine if it's
"safe" to be treated almost like an index qual? Say, given a complex
expression, we might check if it consists only of expressions that could
be treated as an index qual. So a bit like leak-proof, which may
actually be relevant here too I guess (e.g. int4div is not leak-proof,
for example, so e.g. the division-by-zero would not allow index-qual
treatment).

I now recall there probably was a past discussion about how leak-proof
relates to this, but IIRC the conclusion was it's not quite the same
thing. But maybe I just remember things wrong.

Anyway, I think there are always going to be clauses that would be safe
to evaluate on the index, but the index AM does not know to handle them
for some reason. For example it might require extending the AM to handle
generic expressions, which doesn't seem quite desirable.

So I think I see three points where we could evaluate expressions:

1) in the AM, as access preditates / index quals (doing this more often
   is kinda what the SAOP patches aim to do)

2) in the index scan, before checking VM / visibility (if the qual is
   safe to be evaluated early)

3) in the index scan, after checking VM / visibility (if the expression
   does unsafe things)



>> I copied this from the IOS somewhat blindly, but now I'm starting to
>> think it was misguided. I thought it's a protection against processing
>> "invalid" tuples - not tuples broken after a crash (as that would be
>> fixed by crash recovery), but e.g. tuples with schema changes from an
>> aborted transaction.
> 
> I agree that schema changes for indexes shouldn't be an issue, though.
> 
>> I'm not quite sure what are the differences between "index qual" vs.
>> "access predicate index qual" vs. "index filter predicate index quals",
>> or what "dispacing" would mean exactly.
> 
> For the purposes of this point about "a hierarchy of quals", it
> doesn't really matter -- that was the point I was trying to make.
> 
> In other words, "index quals" are strictly better than equivalent
> "index filters", which are themselves strictly better than equivalent
> "table filters". While it is also true that you can meaningfully
> classify "index quals" into their own hierarchy (namely access
> predicates vs index filter predicates), that doesn't necessarily need
> to be discussed when discussing the hierarchy from a planner point of
> view, since it is (at least for now) internal to the nbtree index AM.
> 
> On second thought, I tend to doubt that your patch needs to worry
> about each type of index qual directly. It probably needs to worry
> about index quals in general.
> 

I agree. That seems like a discussion relevant to the general topic of
"upgrading" clauses. If anything, the patch may need to worry about
moving table filters to index filters, that's the only thing it does.

> It is always better to make what could be an "index filter" into an
> index qual. Of course, the current practical problem for you is
> figuring out how to deal with that in cases like the BitmapOr case.
> Since it's not as if the planner is wrong, exactly...it really is the
> cheapest plan, so the planner is at least right on its own terms. I am
> interested in finding a solution to that problem.
> 

Well, if the planner is not wrong, what solution are we looking for? ;-)

FWIW if the problem is the patch may make the new plan look cheaper than
some "actually better" plan (e.g. the BitmapOr one). In that case, we
could just keep the old costing (kinda assuming the worst case, but as
you said, the benefits are limited, while the risks are arbitrary).
That's the only idea I have.

>> FWIW this also reminds me that this whole discussion mostly focused on
>> SAOP clauses (and how they may be translated into access predicates
>> etc.). My patch is however way more general - it applies to all clauses,
>> not just SAOP ones, including clauses with no chance of evaluating at
>> the AM level.
> 
> I agree that your patch is way more general, in one way. But the SAOP
> patch is also much more general than you might think.
> 
> For one thing the whole BitmapOr plan issue actually required that you
> compared your patch to a combination of my SAOP patch and Alena
> Rybakina's OR-to-SAOP transformation patch -- you needed both patches.
> Her patch effectively made my own patch much more general. But there
> are all kinds of other transformations that might further extend the
> applicability of nbtree executor changes from my patch -- the MDAM
> techniques are certainly not limited to ORs/SAOPs. For example, it's
> easy to imagine inventing a new type of SAOP-style clause that
> represented "BETWEEN x AND y" expressions, that would make range
> predicates into "first class indexable operators" --
> ScalarRangeOprExpr, or something. With multi-column indexes, these
> ScalarRangeOprExpr clauses could be composed beside ScalarArrayOpExpr
> clauses, as well as simpler clauses -- all while preserving index
> order on output. So there are quite a few plan shapes that aren't
> possible at all right now, that become possible, many of which don't
> even have SAOPs/ORs.
> 
> Of course it won't ever be possible to create a transformation that
> doesn't ultimately flatten everything into MDAM style "single value"
> DNF predicates, which have to use simple B-Tree opclass operators --
> obviously there are fundamental limits to it. So even in a perfect
> world, with every possible MDAM-ish transformation implemented, we'll
> still have a significant need for your patch.
> 

Yup, that's exactly my point.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Use of additional index columns in rows filtering