Re: Delay Memoize hashtable build until executor run
От | David Rowley |
---|---|
Тема | Re: Delay Memoize hashtable build until executor run |
Дата | |
Msg-id | CAApHDvp+5gdgc4Znjc3AT2J_Q6py1MjPQ=kPjMH1KaL5PVnZUQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Delay Memoize hashtable build until executor run (David Rowley <dgrowleyml@gmail.com>) |
Ответы |
Re: Delay Memoize hashtable build until executor run
|
Список | pgsql-hackers |
On Fri, 26 Jan 2024 at 19:54, David Rowley <dgrowleyml@gmail.com> wrote: > Currently, nodeMemoize.c builds the hashtable for the cache during > executor startup. This is not what is done in hash joins. I think we > should make the two behave the same way. I ran a few benchmarks on this, mostly for archive purposes. -- Test 1: Demonstrate there is a problem drop table if exists t,r; create table t (a int); create table r (a int primary key); insert into t select x%5000000 from generate_Series(1,20000000)x; insert into r select x from generate_Series(0,4999999)x; vacuum analyze t,r; set work_mem='1GB'; set enable_hashjoin=0; set enable_mergejoin=0; set max_parallel_workers_per_gather=0; \timing on explain (summary on) select count(*) from t inner join r on t.a=r.a; set enable_memoize=1; -- I'm including Planning Time just to show that the extra time is not spent in planning Planning Time: 0.094 ms -> Time: 53.061 ms Planning Time: 0.093 ms -> Time: 53.064 ms Planning Time: 0.095 ms -> Time: 69.682 ms set enable_memoize=0; Planning Time: 0.113 ms -> Time: 0.438 ms Planning Time: 0.111 ms -> Time: 0.436 ms Planning Time: 0.113 ms -> Time: 0.445 ms Conclusion: There's a problem -- Patched with memoize on Planning Time: 0.116 ms -> Time: 0.472 ms Planning Time: 0.118 ms -> Time: 0.444 ms Planning Time: 0.117 ms -> Time: 0.443 ms Conclusion: The patch fixes the problem -- Test 2: Make sure we're not slowing things down by checking the table exists each tuple drop table if exists t,r; create table t (a int); create table r (a int primary key); insert into t select 1 from generate_series(1,1000000); insert into r select x from generate_series(1,1000000)x; vacuum analyze t,r; set enable_hashjoin=0; set enable_mergejoin=0; set enable_memoize=1; set max_parallel_workers_per_gather=0; -- only 1 cache miss so that we hammer the cache hit code as hard as we can -- with the smallest hash table possible so lookups are very fast. explain (analyze, timing off) select count(*) from t inner join r on t.a=r.a; -- Master Execution Time: 206.403 ms Execution Time: 211.472 ms Execution Time: 204.688 ms -- Patched Execution Time: 205.967 ms Execution Time: 206.406 ms Execution Time: 205.061 ms Conclusion: No slowdown. I'll push this change to master only as there don't seem to have been any complaints. We can reconsider that if someone complains. David
В списке pgsql-hackers по дате отправления: