Thread: pgsql: Add Result Cache executor node


pgsql: Add Result Cache executor node

From:
David Rowley
Date:
Add Result Cache executor node

Here we add a new executor node type named "Result Cache".  The planner
can include this node type in the plan to have the executor cache the
results from the inner side of parameterized nested loop joins.  This
allows caching of tuples for sets of parameters so that in the event that
the node sees the same parameter values again, it can just return the
cached tuples instead of rescanning the inner side of the join all over
again.  Internally, result cache uses a hash table in order to quickly
find tuples that have been previously cached.
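In rough terms, the caching behaviour amounts to the following (an illustrative Python sketch with invented names, not the C implementation in nodeResultCache.c):

```python
# Illustrative sketch of a parameterized nested loop join whose inner
# side is cached per parameter value.  All names here are invented for
# the example; the real node is implemented in C (nodeResultCache.c).

def nested_loop_with_cache(outer_rows, scan_inner):
    """scan_inner(param) rescans the inner side for one parameter value."""
    cache = {}                      # param value -> cached inner tuples
    hits = misses = 0               # mirrors the node's Hits/Misses stats
    for outer in outer_rows:
        param = outer["key"]
        if param in cache:
            hits += 1               # same parameter seen before: no rescan
            inner_tuples = cache[param]
        else:
            misses += 1             # first time: rescan inner side, cache it
            inner_tuples = list(scan_inner(param))
            cache[param] = inner_tuples
        for inner in inner_tuples:
            yield (outer, inner)
```

A hash table (here simply a dict) keyed on the parameter values is what gives the fast lookup of previously cached tuples described above.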

For certain data sets, this can significantly improve the performance of
joins.  The best cases for using this new node type are for join problems
where a large portion of the tuples from the inner side of the join have
no join partner on the outer side of the join.  In such cases, hash join
would have to hash values that are never looked up, thus bloating the hash
table and possibly causing it to multi-batch.  Merge joins would have to
skip over all of the unmatched rows.  If we use a nested loop join with a
result cache, then we only cache tuples that have at least one join
partner on the outer side of the join.  The benefits of using a
parameterized nested loop with a result cache increase when there are
fewer distinct values being looked up and the number of lookups of each
value is large.  Also, hash probes to look up the cache can be much faster
than the hash probe in a hash join as it's common that the result cache's
hash table is much smaller than the hash join's due to result cache only
caching useful tuples rather than all tuples from the inner side of the
join.  This variation in hash probe performance is more significant when
the hash join's hash table no longer fits into the CPU's L3 cache, but the
result cache's hash table does.  The apparent "random" access of hash
buckets with each hash probe can cause a poor L3 cache hit ratio for large
hash tables.  Smaller hash tables generally perform better.

The hash table used for the cache limits itself to not exceeding work_mem
* hash_mem_multiplier in size.  We maintain a dlist of keys for this cache
and when we're adding new tuples and realize we've exceeded the memory
budget, we evict cache entries starting with the least recently used ones
until we have enough memory to add the new tuples to the cache.
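A minimal Python sketch of that LRU eviction scheme, with an OrderedDict standing in for the dlist of keys (the byte sizes are caller-supplied stand-ins for the real tuple memory accounting, and the budget stands in for work_mem * hash_mem_multiplier):

```python
from collections import OrderedDict

# Sketch of LRU eviction under a memory budget, analogous to the dlist
# of cache keys described above.  Entry sizes are supplied by the caller.

class BudgetedCache:
    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.used = 0
        self.entries = OrderedDict()    # key -> (tuples, size); order = LRU

    def lookup(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)   # mark as most recently used
            return self.entries[key][0]
        return None                         # cache miss

    def add(self, key, tuples, size):
        # Evict least recently used entries until the new entry fits.
        while self.entries and self.used + size > self.budget:
            _, (_, evicted_size) = self.entries.popitem(last=False)
            self.used -= evicted_size
        if size <= self.budget:             # entry may be too big to cache
            self.entries[key] = (tuples, size)
            self.used += size
```

popitem(last=False) removes the least recently used entry first, matching the eviction order described above.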

For parameterized nested loop joins, we now consider using one of these
result cache nodes in between the nested loop node and its inner node.  We
determine when this might be useful based on cost, which is primarily
driven by the expected cache hit ratio.  Estimating the
cache hit ratio relies on having good distinct estimates on the nested
loop's parameters.

For now, the planner will only consider using a result cache for
parameterized nested loop joins.  This works for both normal joins and
also for LATERAL type joins to subqueries.  It is possible to use this new
node for other uses in the future.  For example, to cache results from
correlated subqueries.  However, that's not done here due to some
difficulties obtaining a distinct estimation on the outer plan to
calculate the estimated cache hit ratio.  Currently we plan the inner plan
before planning the outer plan so there is no good way to know if a result
cache would be useful or not since we can't estimate the number of times
the subplan will be called until the outer plan is generated.

The functionality added here newly introduces a dependency on the
return value of estimate_num_groups() during the join search.
Previously, during the join search, we only ever needed to perform
selectivity estimations.  With this commit, we need to use
estimate_num_groups() in order to estimate what the hit ratio on the
result cache will be.   In simple terms, if we expect 10 distinct values
and we expect 1000 outer rows, then we'll estimate the hit ratio to be
99%.  Since cache hits are very cheap compared to scanning the underlying
nodes on the inner side of the nested loop join, this will
significantly reduce the planner's cost for the join.  However, it's
fairly easy to see here that things will go bad when estimate_num_groups()
incorrectly returns a value that's significantly lower than the actual
number of distinct values.  If this happens then that may cause us to make
use of a nested loop join with a result cache instead of some other join
type, such as a merge or hash join.  Our distinct estimations have been
known to be a source of trouble in the past, so the extra reliance on them
here could cause the planner to choose slower plans than it did prior
to this feature.  Distinct estimations are also fairly hard to
estimate accurately when several tables have been joined already or when a
WHERE clause filters out a set of values that are correlated to the
expressions we're estimating the number of distinct values for.
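The back-of-the-envelope hit-ratio estimate from the example above can be written out as follows (a deliberately simplified model for illustration; the real costing in costsize.c is more involved):

```python
def estimated_hit_ratio(ndistinct, outer_rows):
    """Simplified model: the first lookup of each distinct value is a
    miss, every repeated lookup is a hit.  Assumes uniform reuse and
    that all distinct values fit in the cache."""
    misses = min(ndistinct, outer_rows)
    return (outer_rows - misses) / outer_rows

# 10 distinct values, 1000 outer rows -> estimated 99% hit ratio
```

As the commit message warns, if estimate_num_groups() feeds in an ndistinct that is far too low, this ratio is overestimated, and a nested loop plus result cache may be chosen where a hash or merge join would have been faster.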

For now, the costing we perform during query planning for result caches
does put quite a bit of faith in the distinct estimations being accurate.
When these are accurate then we should generally see faster execution
times for plans containing a result cache.  However, in the real world, we
may find that we need to either change the costings to put less trust in
the distinct estimations being accurate or perhaps even disable this
feature by default.  There's always a risk, when we teach the query
planner a new trick, that it will use that trick at the wrong time and
cause a regression.  Users may opt to get the old
behavior by turning the feature off using the enable_resultcache GUC.
Currently, this is enabled by default.  It remains to be seen if we'll
maintain that setting for the release.

Additionally, the name "Result Cache" is the best name I could think of
for this new node at the time I started writing the patch.  Nobody seems
to strongly dislike the name. A few people did suggest other names but no
other name seemed to dominate in the brief discussion that there was about
names. Let's allow the beta period to see if the current name pleases
enough people.  If there's some consensus on a better name, then we can
change it before the release.  Please see the 2nd discussion link below
for the discussion on the "Result Cache" name.

Author: David Rowley
Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu
Tested-By: Konstantin Knizhnik
Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/b6002a796dc0bfe721db5eaa54ba9d24fd9fd416

Modified Files
--------------
contrib/postgres_fdw/expected/postgres_fdw.out |   25 +-
contrib/postgres_fdw/sql/postgres_fdw.sql      |    2 +
doc/src/sgml/config.sgml                       |   24 +-
src/backend/commands/explain.c                 |  140 +++
src/backend/executor/Makefile                  |    1 +
src/backend/executor/execAmi.c                 |    5 +
src/backend/executor/execExpr.c                |  134 +++
src/backend/executor/execParallel.c            |   18 +
src/backend/executor/execProcnode.c            |   10 +
src/backend/executor/nodeResultCache.c         | 1137 ++++++++++++++++++++++++
src/backend/nodes/copyfuncs.c                  |   31 +
src/backend/nodes/outfuncs.c                   |   37 +
src/backend/nodes/readfuncs.c                  |   22 +
src/backend/optimizer/path/allpaths.c          |    4 +
src/backend/optimizer/path/costsize.c          |  148 +++
src/backend/optimizer/path/joinpath.c          |  214 +++++
src/backend/optimizer/plan/createplan.c        |   87 ++
src/backend/optimizer/plan/initsplan.c         |   41 +
src/backend/optimizer/plan/setrefs.c           |    9 +
src/backend/optimizer/plan/subselect.c         |    5 +
src/backend/optimizer/util/pathnode.c          |   71 ++
src/backend/optimizer/util/restrictinfo.c      |    3 +
src/backend/utils/misc/guc.c                   |   10 +
src/backend/utils/misc/postgresql.conf.sample  |    1 +
src/include/executor/executor.h                |    7 +
src/include/executor/nodeResultCache.h         |   31 +
src/include/lib/ilist.h                        |   19 +
src/include/nodes/execnodes.h                  |   66 ++
src/include/nodes/nodes.h                      |    3 +
src/include/nodes/pathnodes.h                  |   22 +
src/include/nodes/plannodes.h                  |   21 +
src/include/optimizer/cost.h                   |    1 +
src/include/optimizer/pathnode.h               |    7 +
src/test/regress/expected/aggregates.out       |    2 +
src/test/regress/expected/join.out             |  131 +--
src/test/regress/expected/partition_prune.out  |  243 ++---
src/test/regress/expected/resultcache.out      |  159 ++++
src/test/regress/expected/subselect.out        |   20 +-
src/test/regress/expected/sysviews.out         |    3 +-
src/test/regress/parallel_schedule             |    2 +-
src/test/regress/serial_schedule               |    1 +
src/test/regress/sql/aggregates.sql            |    2 +
src/test/regress/sql/join.sql                  |    2 +
src/test/regress/sql/partition_prune.sql       |    3 +
src/test/regress/sql/resultcache.sql           |   85 ++
45 files changed, 2823 insertions(+), 186 deletions(-)


Re: pgsql: Add Result Cache executor node

From:
Tom Lane
Date:
David Rowley <drowley@postgresql.org> writes:
> Add Result Cache executor node

You didn't seriously believe that EXPLAIN ANALYZE numbers
would be stable in the buildfarm, did you?

            regards, tom lane



Re: pgsql: Add Result Cache executor node

From:
David Rowley
Date:
On Thu, 1 Apr 2021 at 12:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <drowley@postgresql.org> writes:
> > Add Result Cache executor node
>
> You didn't seriously believe that EXPLAIN ANALYZE numbers
> would be stable in the buildfarm, did you?

If you look a bit harder you can probably see which ones I thought
would be stable and which ones I thought wouldn't, based on the
explain_resultcache() function and what I pass as the hide_hitmiss
parameter.  It appears that I might have been wrong, but at least
the code should probably answer your question.

Thanks for highlighting.

David



Re: pgsql: Add Result Cache executor node

From:
David Rowley
Date:
On Thu, 1 Apr 2021 at 12:32, David Rowley <drowley@postgresql.org> wrote:
> Add Result Cache executor node

I'm not really sure yet why many buildfarm members don't like this.

You can see below that the Index scan was executed 20 times (loops=20)

          ->  Result Cache (actual rows=1 loops=1000)
                Cache Key: t2.twenty
+               Hits: 0  Misses: 0  Evictions: Zero  Overflows: 0  Memory Usage: NkB
                Hits: 980  Misses: 20  Evictions: Zero  Overflows: 0  Memory Usage: NkB
                ->  Index Only Scan using tenk1_unique1 on tenk1 t1 (actual rows=1 loops=20)

So the cache surely must have been hit all but 20 times. The Index
Only Scan would only be executed during a cache miss.

It looks like it's just the stats that are being lost somewhere. I'm
really not sure why yet, so, to stop the buildfarm from turning
completely red, I'll just revert this for now.

Unfortunately, that won't help give me any ideas about why the stats
are getting lost.  They're just variables in the ResultCacheState. I
don't see how the code that increments these during a hit or miss
could be skipped, and I don't see how the stats could get zeroed
between execution and EXPLAIN either.

I'll revert and take a bit of time to look at that a bit more closely.

David



Re: pgsql: Add Result Cache executor node

From:
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> On Thu, 1 Apr 2021 at 12:32, David Rowley <drowley@postgresql.org> wrote:
>> Add Result Cache executor node

> I'm not really sure yet why many buildfarm members don't like this.

Something that struck me is that the animals that stayed green were
almost exclusively Debian and macOS, while the unhappy ones were
uniformly not.  That pattern makes no sense --- why would Debian
act differently from other Linuxen? --- but it sure seems real.

[ looks a bit more... ]  Actually, most of those Debian machines
are Andres' test menagerie.  So maybe there's some commonality
of configuration of his animals that is connected to this.

Anyway, it looks like I can probably reproduce it on florican's
host, if you still need help understanding it tomorrow.  But
I remain really dubious that this can work at all.  Question:
have you tried that test under CLOBBER_CACHE_ALWAYS?

            regards, tom lane



Re: pgsql: Add Result Cache executor node

From:
Tomas Vondra
Date:

On 4/1/21 2:52 AM, Tom Lane wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
>> On Thu, 1 Apr 2021 at 12:32, David Rowley <drowley@postgresql.org> wrote:
>>> Add Result Cache executor node
> 
>> I'm not really sure yet why many buildfarm members don't like this.
> 
> Something that struck me is that the animals that stayed green were
> almost exclusively Debian and macOS, while the unhappy ones were
> uniformly not.  That pattern makes no sense --- why would Debian
> act differently from other Linuxen? --- but it sure seems real.
> 
> [ looks a bit more... ]  Actually, most of those Debian machines
> are Andres' test menagerie.  So maybe there's some commonality
> of configuration of his animals that is connected to this.
> 
> Anyway, it looks like I can probably reproduce it on florican's
> host, if you still need help understanding it tomorrow.  But
> I remain really dubious that this can work at all.  Question:
> have you tried that test under CLOBBER_CACHE_ALWAYS?
> 

I'd bet the failures are due to force_parallel_mode=regress. Notice that
it's actually *adding* one extra row with the stats, which could be
explained by stats from leader + worker. Not sure why some of the
numbers were not replaced by N, though.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgsql: Add Result Cache executor node

From:
Tom Lane
Date:
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> On 4/1/21 2:52 AM, Tom Lane wrote:
>> Anyway, it looks like I can probably reproduce it on florican's
>> host, if you still need help understanding it tomorrow.  But
>> I remain really dubious that this can work at all.  Question:
>> have you tried that test under CLOBBER_CACHE_ALWAYS?

> I'd bet the failures are due to force_parallel_mode=regress. Notice that
> it's actually *adding* one extra row with the stats, which could be
> explained by stats from leader + worker. Not sure why some of the
> numbers were not replaced by N, though.

That might well explain some of it, but not all those unhappy machines
are using force_parallel_mode --- florican isn't, for one.  Now that
I look at it, I'd bet florican's diffs [1] are a 32-bit vs 64-bit issue
and not directly related to what David wants to test at all.

            regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2021-04-01%2000%3A28%3A12