Re: Use of additional index columns in rows filtering

Поиск
Список
Период
Сортировка
От	Peter Geoghegan
Тема	Re: Use of additional index columns in rows filtering
Дата	6 августа 2023 г. 20:13:08
Msg-id	CAH2-Wz=4dqeYK3kpfyEJbrXtWD2SDf3f+HSn6_Wx97ymRhG1MQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Use of additional index columns in rows filtering (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы	Re: Use of additional index columns in rows filtering
Список	pgsql-hackers
Дерево обсуждения
On Sun, Aug 6, 2023 at 10:23 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> > The index AM is entitled to make certain assumptions of opclass
> > members -- assumptions that cannot be made during expression
> > evaluation.

> Thanks for reminding me, I keep forgetting about this.

I was almost certain that you already knew that, actually. It's easy
to forget such details in a discussion like this one, where the focus
zooms out and then zooms back in, again and again.

> Would it be possible to inspect the expression and determine if it's
> "safe" to be treated almost like an index qual? Say, given a complex
> expression, we might check if it consists only of expressions that could
> be treated as an index qual. So a bit like leak-proof, which may
> actually be relevant here too I guess (e.g. int4div is not leak-proof,
> for example, so e.g. the division-by-zero would not allow index-qual
> treatment).

Clearly you're talking about a distinct set of guarantees to the ones
that B-Tree opclasses make about not throwing errors when scanning
maybe-not-visible index tuples. The B-Tree opclass guarantees might
not even be written down anywhere -- they seem like common sense,
almost.

What you've described definitely seems like it could be very useful,
but I don't think that it solves the fundamental problem with cases
like the BitmapOr plan. Even if you do things in a way that precludes
the possibility of extra heap page accesses (when the VM bit isn't
set), you still have the problem of "access predicates vs index filter
predicates". Which is a big problem, in and of itself.

> Anyway, I think there are always going to be clauses that would be safe
> to evaluate on the index, but the index AM does not know to handle them
> for some reason. For example it might require extending the AM to handle
> generic expressions, which doesn't seem quite desirable.

Actually, I mostly don't think we'd need to teach nbtree or other
index AMs anything about simplifying expressions. Structurally, we
should try to make things like ScalarArrayOpExpr into "just another
indexable operator", which has little to no difference with any other
indexable operator at runtime.

There probably are several good reasons why "normalizing to CNF" in
the planner is a good idea (to some extent we do this already). Alena
Rybakina's OR-to-SAOP transformation patch was written well before
anybody knew about the MDAM/SAOP patch I'm working on. The original
motivation was to lower expression evaluation overhead.

You could probably find a third of even a fourth reason to do that
specific transformation, if you thought about it for a while. Top-down
planner designs such as Cascades really have to spend a lot of time on
this kind of normalization process. For very general reasons -- many
of which are bound to apply in our own bottom-up planner design.

> So I think I see three points where we could evaluate expressions:
>
> 1) in the AM, as access preditates / index quals (doing this more often
>    is kinda what the SAOP patches aim to do)
>
> 2) in the index scan, before checking VM / visibility (if the qual is
>    safe to be evaluated early)
>
> 3) in the index scan, after checking VM / visibility (if the expression
>    does unsafe things)

Agreed.

Presumably there would also be a class of expressions that the patch
should make into index filters rather than table filters, despite
being unable to determine whether they're safe to evaluate early. Even
if there is only a small chance of it helping at runtime, there is no
reason (or infinitesimally small reason) to not to just do prefer
index filters where possible -- so it's desirable to always prefer
index filters, regardless of opclass/type restrictions on early
evaluation. Right?

Assuming the answer is yes, then I think that you still need all of
the index-only scan stuff that can "fallback to heap access", just to
be able to cover this other class of expression. I don't think that
this class that I've described will be rarely encountered, or
anything.

> I agree. That seems like a discussion relevant to the general topic of
> "upgrading" clauses. If anything, the patch may need to worry about
> moving table filters to index filters, that's the only thing it does.

Obviously that will have indirect consequences due to the changes in
the costing.

> > It is always better to make what could be an "index filter" into an
> > index qual. Of course, the current practical problem for you is
> > figuring out how to deal with that in cases like the BitmapOr case.
> > Since it's not as if the planner is wrong, exactly...it really is the
> > cheapest plan, so the planner is at least right on its own terms. I am
> > interested in finding a solution to that problem.
> >
>
> Well, if the planner is not wrong, what solution are we looking for? ;-)

I imagine that you really don't want to have to rely on some
wishy-washy philosophical argument about the planner's expectation
being the only reasonable basis for choosing a plan. Just as I don't
want to have to rely on some similarly hand-wavy argument about risk.
The ideal outcome is one that doesn't require any of that, from either
of us.

I believe that the patch that I'm working on can allow us to totally
avoid it. I hesitate to say this, since it might sound like I'm
imposing conditions in a self-interested way. AFAICT it really does
provide us with a practical way of just not having to go down the road
that nobody wants to go down. So I am just about ready to say that I
believe that that will end up being the solution we use. It just seems
to make sense.

By normalizing to CNF, the planner is given the ability to work with
higher-level index paths, that abstract-away inessential "physical
plan" differences. Suppose, for example, we're building index paths
for a scan that comes from an SAOP that was generated through OR
transformation. But, it's an index AM that lacks native support for
SAOPs -- not nbtree. That index path will still end up using a
BitmapOr, in the end, since it'll ultimately have to compensate for
the lack of index AM infrastructure. So the final "physical plan" will
be exactly the same as today -- the OR transformation will actually
have changed nothing about the physical plan in these sorts of cases.

The CNF transformation process just puts what was already true ("these
two plans are logically equivalent") on a more formal footing -- and
so implicitly avoids "risky plans" like the one we've discussed. We'll
only be relying on the nbtree work to get those small efficiencies
that come from avoiding duplicative primitive index scans. Since that
was never actually a goal of your patch to begin with, limiting that
benefit to nbtree scans (where we can do it without novel risks) seems
more than acceptable.

Since you're not relying on the nbtree work at all here, really (just
on the transformation process itself), the strategic risk that this
adds to your project isn't too great. It's not like this ties the
success of your patch to the success of my own patch. At most it ties
the success of your patch to something like Alena Rybakina's
OR-to-SAOP transformation patch, which seems manageable to me. (To be
clear, I'm not relying on that work in the same way myself -- for my
patch the transformation process is just a nice-to-have.)

> FWIW if the problem is the patch may make the new plan look cheaper than
> some "actually better" plan (e.g. the BitmapOr one). In that case, we
> could just keep the old costing (kinda assuming the worst case, but as
> you said, the benefits are limited, while the risks are arbitrary).
> That's the only idea I have.

That's almost ideal for my patch. I should be able to pursue my patch
without concern about your patch interfering too much -- presumably my
patch will be able to generate the cheapest plan in cases like the
BitmapOr case. It'll at least be slightly cheaper in physical reality,
and now you'll artificially penalize the paths that your patch
generates -- so cases like the BitmapOr case should do the right thing
by seeing the SAOP path as cheaper during planning.

But is that approach really ideal for your patch? I doubt it. Why
wouldn't we want to give the index paths with index filters credit for
being cheaper in almost all cases? That's why this doesn't feel like a
costing issue to me (it's more of a "just don't do that" issue, I
think). Your patch seems too important to nerf like this, even if it's
convenient in some ways.

--
Peter Geoghegan
В списке pgsql-hackers по дате отправления:
Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Use of additional index columns in rows filtering