
A performance regression issue with Memoize

From: Richard Guo
I ran into a query that shows a performance regression related to the
Memoize node.

create table t (a int, b int, c int);
insert into t select i%3, i, i from generate_series(1,500)i;
analyze t;

explain (analyze, costs off, timing off)
select * from t t1 join lateral
  (select t2.a, t2.b, t1.a x from t t2, t t3 offset 0) s
  on s.a = t1.a;

with enable_memoize set to on:

 Planning Time: 2.470 ms
 Execution Time: 98869.240 ms

with enable_memoize set to off:

 Planning Time: 1.791 ms
 Execution Time: 55754.080 ms

This shows a 77.3% performance regression with Memoize enabled.

The stats of the Memoize node show some clues:

   ->  Memoize (actual rows=83334.00 loops=500)
         Cache Key: t1.a
         Cache Mode: binary
         Hits: 0  Misses: 500  Evictions: 498  Overflows: 0  Memory Usage: 8193kB

There are 0 cache hits, and too many cache evictions.

So I suspect that during the phase of filling the Memoize cache, the
memory usage exceeds the specified limit, causing cache entries to be
repeatedly evicted.

While cost_memoize_rescan() does account for the eviction ratio when
estimating the cost of Memoize, the estimate does not seem to be
accurate enough in this case to prevent the planner from choosing a
Memoize node.

Any thoughts on how we might improve this?

Thanks
Richard



Re: A performance regression issue with Memoize

From: David Rowley
On Mon, 28 Jul 2025 at 20:13, Richard Guo <guofenglinux@gmail.com> wrote:
> create table t (a int, b int, c int);
> insert into t select i%3, i, i from generate_series(1,500)i;
> analyze t;
>
> explain (analyze, costs off, timing off)
> select * from t t1 join lateral
>   (select t2.a, t2.b, t1.a x from t t2, t t3 offset 0) s
>   on s.a = t1.a;
>
> with enable_memoize set to on:
>
>  Planning Time: 2.470 ms
>  Execution Time: 98869.240 ms
>
> with enable_memoize set to off:
>
>  Planning Time: 1.791 ms
>  Execution Time: 55754.080 ms

There are 2 reasons why this goes bad: 1) The row estimate from the
subquery is bad, and the Memoize costs are fooled into thinking more
items will fit in the cache than actually can. 2) The order in which
the Memoize lookups are performed by this query matches exactly the
last-to-be-looked-up-first-to-be-evicted method of freeing up memory
that Memoize uses, and since not all items fit in the cache, there are
never any hits.

For #1, if you run with costs on, you'll see the row estimates are
pretty far out:

->  Subquery Scan on s  (cost=0.00..6267.25 rows=1250 width=12) (actual time=0.115..52.816 rows=83334.00 loops=500)

This causes the Memoize costing code to think only 1250 rows need to
be stored per entry and that results in the costing code thinking that
all 3 unique keys can be stored concurrently. If you fix the row
estimates the Memoize costing will estimate that only 1 item will fit
in the cache, which is correct. Unfortunately, with the 3 distinct
items to lookup, Memoize still thinks that there will be a cache hit
with about every 1 in 3 lookups. That unfortunately doesn't pan out
due to the table being populated with "i%3", meaning the lookups will
have the lookup key arrive in sequences of 0, 1, 2, 0, 1, 2, etc. The
MRU evictions follow the same pattern, i.e. populating with 1 will
evict 0 and populating with 2 will evict 1, etc, resulting in zero
hits.

The current Memoize costing code assumes the lookup keys will be
evenly distributed (let's call this the average case), and when that
happens, you can see that Memoize is a win for this case:

explain analyze
select * from (select * from t order by random()) t1 join lateral
(select t2.a, t2.b, t1.a x from t t2, t t3 offset 0) s
on s.a = t1.a;

Running that I get:

Hits: 334  Misses: 166  Evictions: 164  Overflows: 0  Memory Usage: 8193kB
Execution Time: 9446.671 ms

And with memoize disabled:

Execution Time: 16547.240 ms

Having the keys in order with ORDER BY a (let's call this the best
case), you can see it does even better:

Hits: 497  Misses: 3  Evictions: 1  Overflows: 0  Memory Usage: 8193kB
Execution Time: 4041.162 ms

and about the same 16.5 seconds as above when memoize is disabled.
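The worst/average/best-case behaviour described above can be reproduced with a tiny simulation. This is only a sketch of the eviction pattern being discussed (a single-slot cache that evicts on every miss), not PostgreSQL's actual Memoize implementation:

```python
import random

def hits_for(keys, capacity=1):
    # Single-slot cache: on a miss with a full cache, evict the resident
    # entry to make room. With cyclic keys, the entry we just evicted is
    # exactly the one the next lookup needs.
    cache, hits = set(), 0
    for k in keys:
        if k in cache:
            hits += 1
        else:
            if len(cache) >= capacity:
                cache.pop()
            cache.add(k)
    return hits

keys = [i % 3 for i in range(1, 501)]   # i%3 over generate_series(1,500)

print(hits_for(keys))                   # worst case: 0 hits
print(hits_for(sorted(keys)))           # best case: 497 hits, matching above

random.seed(0)
shuffled = random.sample(keys, len(keys))
print(hits_for(shuffled))               # average case: roughly one hit in three
```

The sorted run reproduces the "Hits: 497  Misses: 3" figure exactly: each of the three distinct keys misses once and then hits for the rest of its run.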

> Any thoughts on how we might improve this?

If we were to tune the costs for the worst-case, then we'd basically
always predict a zero percent hit ratio in all cases where there are
more distinct keys than will fit in the cache. If we did that, that
would penalise cases that get huge benefits today.

I have recently been thinking of statistics that represent the
min/max/mean distribution of values within a table and indexes
according to the order they're read in a forward scan. Perhaps
something like that could help the Memoize costing so we have a
better idea. It might help in some cases, but certainly wouldn't be
perfect for all cases.

Another idea would be to add run-time detection in Memoize to remember
some of the most recently evicted keys, or perhaps their hash values, so
that if we find we always lookup a recently evicted entry, then we
could decide to remember that one rather than continue to evict it.
That would obviously take > 0 memory, which eats into the memory for
caching tuples. I don't have a detailed idea of how this would work.
It sounds like it would be a most-commonly-used rather than a
most-recently-used cache.
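As a very rough illustration of that idea (entirely hypothetical; nothing like this exists in the Memoize code, and all names here are invented), a cache could count misses on keys it keeps re-fetching and pin a key once it has come back often enough:

```python
from collections import Counter

class PinningCache:
    """Hypothetical sketch: count how often a key misses, and pin it
    (exempt from eviction) past a threshold, shifting behaviour from
    most-recently-used toward most-commonly-used."""

    def __init__(self, capacity, pin_threshold=3):
        self.capacity = capacity
        self.pin_threshold = pin_threshold
        self.entries = {}              # key -> cached value
        self.pinned = set()            # keys exempt from eviction
        self.miss_counts = Counter()   # misses seen per key
        self.hits = self.misses = 0

    def lookup(self, key, compute):
        if key in self.entries:
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        self.miss_counts[key] += 1
        if self.miss_counts[key] >= self.pin_threshold:
            self.pinned.add(key)
        if len(self.entries) >= self.capacity:
            victim = next((k for k in self.entries if k not in self.pinned), None)
            if victim is None:
                return compute(key)    # everything pinned: don't cache this one
            del self.entries[victim]
        self.entries[key] = compute(key)
        return self.entries[key]

# With the cyclic 0,1,2,... key pattern and room for only one entry, a
# plain evict-on-miss cache gets zero hits; here one key eventually gets
# pinned and every third lookup becomes a hit.
cache = PinningCache(capacity=1)
for k in (i % 3 for i in range(500)):
    cache.lookup(k, lambda key: key * 10)
print(cache.hits)   # > 0, unlike the zero hits seen in the regression
```

This only demonstrates the most-commonly-used intuition; it ignores the memory accounting point made above, where the tracking structures themselves eat into the cache budget.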

On the whole, I don't really see this as a flaw in the Memoize code.
We've plenty of other cases in the planner that produce inferior plans
due to lack of enough detail or accuracy of table statistics, so I'm
not planning on rushing to look into a fix. I will keep it in mind,
however.

David



Re: A performance regression issue with Memoize

From: Richard Guo
On Tue, Jul 29, 2025 at 6:30 AM David Rowley <dgrowleyml@gmail.com> wrote:
> On the whole, I don't really see this as a flaw in the Memoize code.
> We've plenty of other cases in the planner that produce inferior plans
> due to lack of enough detail or accuracy of table statistics, so I'm
> not planning on rushing to look into a fix. I will keep it in mind,
> however.

Yeah, I agree that this issue isn't limited to the Memoize node; it's
a more general problem.  The optimizer can sometimes choose plans that
are suboptimal by orders of magnitude due to inaccurate statistics.

One way to improve this is by improving the accuracy of statistics,
but that can be very difficult or even impossible in some cases,
especially when dealing with attribute correlations or highly skewed
data distributions.

Another possible direction is to support runtime plan correction or
feedback loops.  We've always followed a "plan-first, execute-next"
approach so far.  But perhaps we could extend that by monitoring plan
execution and triggering re-optimization when the executor detects
that actual result sizes or other runtime statistics differ
significantly from the estimates.  In recent years, there have been
more and more papers and research on adaptive query processing.  It
might be worth considering how PostgreSQL could support such
techniques in the future.

Thanks
Richard



Re: A performance regression issue with Memoize

From: David Rowley
On Tue, 29 Jul 2025 at 16:01, Richard Guo <guofenglinux@gmail.com> wrote:
> Another possible direction is to support runtime plan correction or
> feedback loops.  We've always followed a "plan-first, execute-next"
> approach so far.  But perhaps we could extend that by monitoring plan
> execution and triggering re-optimization when the executor detects
> that actual result sizes or other runtime statistics differ
> significantly from the estimates.  In recent years, there have been
> more and more papers and research on adaptive query processing.  It
> might be worth considering how PostgreSQL could support such
> techniques in the future.

I've recently noticed that some databases are doing things like this.
[1] is an example. For the record, I've absolutely no insight into
what's going on there aside from what's mentioned in the public
documentation. In any case, I don't think that idea is new to us, as
there's been discussion before about having some sort of hybrid hash
join / nested loop in regards to trying to fix the issue with
extra-large batches during hash joins.

If we were to adopt something similar, I believe we'd need to have
some sort of estimate on the certainty of the statistics we're working
with, otherwise, how would you know when and when not to use the
adaptive method?  There's also the PathKey issue when switching
algorithms. For example, nested loops preserve the outer path's order,
but multi-batch hash joins do not. That may not be an issue when
switching to a method that's stricter in terms of row output order,
but it could be when going the other way. That means you don't get the
full flexibility to adapt the plan as you'd get from having the
planner choose the new plan in the first place.

For the record, I 100% agree that there will always be cases where
statistics are just unable to represent what is discovered at
run-time, so having some sort of ability to adapt at run-time seems
like a natural progression on the evolutionary chain. I just don't
know if it's the best next step to make. I suspect we might be
skipping a few steps from what we have now if we went there in the
near future. We don't have extended statistics for joins yet, for
example.

David

[1] https://learn.microsoft.com/en-us/sql/relational-databases/performance/joins?view=sql-server-ver17#adaptive



Re: A performance regression issue with Memoize

From: Tom Lane
David Rowley <dgrowleyml@gmail.com> writes:
> For the record, I 100% agree that there will always be cases where
> statistics are just unable to represent what is discovered at
> run-time, so having some sort of ability to adapt at run-time seems
> like a natural progression on the evolutionary chain. I just don't
> know if it's the best next step to make. I suspect we might be
> skipping a few steps from what we have now if we went there in the
> near future. We don't have extended statistics for joins yet, for
> example.

Yeah.  There is plenty remaining to be done towards collecting and
applying traditional sorts of statistics.  I worry about ideas
such as run-time plan changes because we will have exactly zero
ability to predict what'll happen if the executor starts doing
that.  Maybe it'll be great, but what do you do if it isn't?

            regards, tom lane



Re: A performance regression issue with Memoize

From: Andrei Lepikhov
On 29/7/2025 06:30, David Rowley wrote:
> On Tue, 29 Jul 2025 at 16:01, Richard Guo <guofenglinux@gmail.com> wrote:
>> Another possible direction is to support runtime plan correction or
>> feedback loops.  We've always followed a "plan-first, execute-next"
>> approach so far.  But perhaps we could extend that by monitoring plan
>> execution and triggering re-optimization when the executor detects
>> that actual result sizes or other runtime statistics differ
>> significantly from the estimates.  In recent years, there have been
>> more and more papers and research on adaptive query processing.  It
>> might be worth considering how PostgreSQL could support such
>> techniques in the future.
> 
> I've recently noticed that some databases are doing things like this.
> [1] is an example. For the record, I've absolutely no insight into
> what's going on there aside from what's mentioned in the public
> documentation. In any case, I don't think that idea is new on us as
> there's been discussion before about having some sort of hybrid hash
> join / nested loop before in regards to trying to fix the issue with
> extra large batches during hash joins.
We are constantly struggling with multi-level join trees, where 
estimates usually degrade quickly to 'rows=1' and switch the upper 
levels to Nested Loops, causing trouble during execution. Hence, we 
have been experimenting with the HybridJoin idea for a couple of 
years.
Designing a new execution node seems painful and hard to support, so 
we have chosen an alternative way: preparing two separate subquery 
trees for the suspicious join (some empirics are inevitably involved) 
and passing both to the executor. The implementation is simple - the 
alternatives are stored in CustomScan subpaths/subplans. It looks like 
an Append, but our Custom node has the flexibility to choose either of 
the subtrees and switch to the alternative if no tuples have been 
passed to the upper level yet. You may find a demonstration of the 
approach here [1].
Right now it works with NL -> HJ switching, but it would not be hard 
to add MergeJoin as well.
I think a similar approach could be discussed for use in the core.

> 
> If we were to adopt something similar, I believe we'd need to have
> some sort of estimate on the certainty of the statistics we're working
> with, otherwise, how would you know when and when not to use the
> adaptive method?  There's also the PathKey issue when switching
> algorithms. For example, nested loops preserve the outer path's order,
> but multi-batch hash joins do not. That may not be an issue when
> switching a method that's more strict in terms of row output order,
> but it could be when going the other way. That means you don't get the
> full flexibility to adapt the plan as you'd get from having the
> planner choose the new plan in the first place.
Yep, the PathKey issue really exists; we just build a Sort -> HJ 
alternative path if the costs say that a sorted NL dominates.
> 
> For the record, I 100% agree that there will always be cases where
> statistics are just unable to represent what is discovered at
> run-time, so having some sort of ability to adapt at run-time seems
> like a natural progression on the evolutionary chain. I just don't
> know if it's the best next step to make. I suspect we might be
> skipping a few steps from what we have now if we went there in the
> near future. We don't have extended statistics for joins yet, for
> example.
We ended up with a solution where we calculate a 'turning number of 
tuples' for the outer side at the optimisation stage. If the outer 
provides even one more tuple than that, we immediately switch to the 
more conservative version of the plan at runtime.

[1] https://www.pgevents.ca/events/pgconfdev2025/schedule/session/346-switching-between-query-plans-in-real-time-switch-join/

-- 
regards, Andrei Lepikhov



Re: A performance regression issue with Memoize

From: Robert Haas
On Tue, Jul 29, 2025 at 12:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
> > For the record, I 100% agree that there will always be cases where
> > statistics are just unable to represent what is discovered at
> > run-time, so having some sort of ability to adapt at run-time seems
> > like a natural progression on the evolutionary chain. I just don't
> > know if it's the best next step to make. I suspect we might be
> > skipping a few steps from what we have now if we went there in the
> > near future. We don't have extended statistics for joins yet, for
> > example.
>
> Yeah.  There is plenty remaining to be done towards collecting and
> applying traditional sorts of statistics.  I worry about ideas
> such as run-time plan changes because we will have exactly zero
> ability to predict what'll happen if the executor starts doing
> that.  Maybe it'll be great, but what do you do if it isn't?

Well, you already know that what you're doing isn't great. If the
currently-selected alternative is terrible, the other alternative
doesn't have to be that great to be a win.

I've thought about this mostly in the context of the decision between
a Nested Loop and a Hash Join. Subject to some conditions, these are
interchangeable: at any point you could decide that on the next
iteration you're going to put all the inner rows into a hash table and
just probe that. The "only" downside is that it could turn out that,
unluckily, the next iteration was also the last one that was ever
going to happen, and then the overhead to build the hash table was
wasted. If the Nested Loop is parameterized, the Hash Join needs a
complete scan of the inner side of the join, which requires a
different plan variant and is potentially quite expensive.

But switching from a plain Nested Loop to Nested Loop + Memoize
wouldn't have that problem. You never have to make a complete scan of
the inner side. You can just decide to start caching some results for
individual parameter values whenever you want, and if it turns out
that they're never useful, you haven't lost nearly as much. So a
strategy like "start memoizing when we exceed the expected loop count
by 20x" might be viable. I'm not really sure, I haven't done the
experiments, but it seems to me that the downsides of this kind of
strategy switch might be pretty minimal even when things work out
anti-optimally.
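A sketch of what that trigger might look like (the class, the 20x factor, and the API are all hypothetical illustrations of the "start memoizing late" idea, not a proposal for the actual executor):

```python
class AdaptiveLoopCache:
    """Hypothetical sketch: rescan the inner side plainly until the
    observed rescan count exceeds the planner's estimate by some
    factor, then start memoizing results per parameter value."""

    def __init__(self, inner_scan, expected_loops, factor=20):
        self.inner_scan = inner_scan
        self.trigger = expected_loops * factor
        self.loops = 0
        self.cache = None            # None means "not memoizing yet"

    def fetch(self, param):
        self.loops += 1
        if self.cache is None:
            if self.loops <= self.trigger:
                return self.inner_scan(param)
            self.cache = {}          # estimate exceeded: start caching

        if param not in self.cache:
            self.cache[param] = self.inner_scan(param)
        return self.cache[param]

# The planner expected 10 loops; 500 actually happen. The first 200
# rescans run plainly; after the trigger fires, each distinct parameter
# is scanned once more and then served from the cache.
scans = 0
def inner(param):
    global scans
    scans += 1
    return [param]

node = AdaptiveLoopCache(inner, expected_loops=10)
for k in (i % 3 for i in range(500)):
    node.fetch(k)
print(scans)   # 203 inner scans instead of 500
```

As the paragraph above notes, the downside when this guesses wrong is small: if the extra loops never materialize, the only waste is the memory and copying cost of the entries cached after the trigger.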

--
Robert Haas
EDB: http://www.enterprisedb.com