Re: JIT compilation per plan node

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: JIT compilation per plan node
Дата
Msg-id CA+TgmoaXvnxwT4D8WvH-km7WRMo=_vCXrf1m9vb3TT5N0RiWtA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: JIT compilation per plan node  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы Re: JIT compilation per plan node  (David Rowley <dgrowleyml@gmail.com>)
Re: JIT compilation per plan node  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Список pgsql-hackers
On Tue, Feb 20, 2024 at 5:31 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> I certainly agree that the current JIT costing is quite crude, and we've
> all seen cases where the decision turns out to not be great. And I think
> the plan to make the decisions at the node level makes sense, so +1 to
> that in general.

Seems reasonable to me also.

> And I think you're right that looking just at the node total cost may
> not be sufficient - that we may need a better cost model, considering
> how many times an expression is executed and so on. But I think we
> should try to do this in smaller steps, meaningful on their own,
> otherwise we won't move at all. The two threads linked by Melih are ~4y
> old and *nothing* changed since then, AFAIK.
>
> I think it's reasonable to start by moving the decision to the node
> level - it's where the JIT happens, anyway. It may not be perfect, but
> it seems like a clear improvement. And if we then choose to improve the
> "JIT cost model" to address some of the issues you pointed out, surely
> that would need to happen at the node level too ...

I'm not sure I understand whether you (Tomas) think that this patch is
a good idea or a bad idea as it stands. I read the first of these two
paragraphs to suggest that the patch hasn't really evolved much in the
last few years, perhaps suggesting that if it wasn't good enough to
commit back then, it still isn't now. But the second of these two
paragraphs seems more supportive.

From my own point of view, I definitely agree with David's statement
that what we really want to know is how many times each expression
will be evaluated. If we had that information, or just an estimate, I
think we could make much better decisions in this area. But we don't
have that infrastructure now, and it doesn't seem easy to create, so
it seems to me that what we have to decide now is whether applying a
cost threshold on a per-plan-node basis will produce better or worse
results than what making one decision for the whole plan. David's
provided an example of where it does indeed work better back in
https://www.postgresql.org/message-id/CAApHDvpQJqLrNOSi8P1JLM8YE2C%2BksKFpSdZg%3Dq6sTbtQ-v%3Daw%40mail.gmail.com
- but could there be enough cases where the opposite happens to make
us think that the patch is overall a bad idea?

I personally find that a bit unlikely, although not impossible. I see
a couple of ways that using the per-node cost can distort things -- it
seems like it will tend to heavily feature JIT for "interior" plan
nodes because the cost of a plan node includes it's children -- and as
was mentioned previously, it doesn't really care whether the node cost
is high because of expression evaluation or something else. But
neither of those things seem like they'd be bad enough to make this a
bad way forward over all. For the patch to lose, it seems like we'd
need a case where the overall plan cost would have been high enough to
trigger JIT pre-patch, but most of the benefit would have come from
relatively low-cost nodes that don't get JITted post-patch. The
easiest way for that to happen is if the planner's estimates are off,
but that's not really an argument against this patch as much as it is
an argument that query planning is hard in general.

A slightly subtler way the patch could lose is if the new threshold is
harder to adjust than the old one. For example, imagine that you have
a query that does a Cartesian join. That makes the cost of the input
nodes rather small compared to the cost of the join node, and it also
means that JITting the inner join child in particular is probably
rather important. But if you set join_above_cost low enough to JIT
that node post-patch, then maybe you'll also JIT a bunch of things
that aren't on the inner side of a nested loop and which might
therefore not really need JIT. Unless I'm missing something, this is a
fairly realistic example of where this patch's approach to costing
could turn out to be painful ... but it's not like the current system
is pain-free either.

I don't really know what's best here, but I'm mildly inclined to
believe that the patch might be a change for the better. I have not
reviewed the implementation and have no comment on whether it's good
or bad from that point of view.

--
Robert Haas
EDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Possibility to disable `ALTER SYSTEM`
Следующее
От: Pavel Stehule
Дата:
Сообщение: Re: broken JIT support on Fedora 40