Re: Hybrid Hash/Nested Loop joins and caching results from subplans
От | David Rowley |
---|---|
Тема | Re: Hybrid Hash/Nested Loop joins and caching results from subplans |
Дата | |
Msg-id | CAApHDvqQqpk=1W-G_ds7A9CsXX3BggWj_7okinzkLVhDubQzjA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Hybrid Hash/Nested Loop joins and caching results from subplans (David Rowley <dgrowleyml@gmail.com>) |
Ответы |
Re: Hybrid Hash/Nested Loop joins and caching results from subplans
|
Список | pgsql-hackers |
On Tue, 30 Jun 2020 at 11:57, David Rowley <dgrowleyml@gmail.com> wrote: > For now, I'm planning on changing things around a little in the Result > Cache node to allow faster deletions from the cache. As of now, we > must perform 2 hash lookups to perform a single delete. This is > because we must perform the lookup to fetch the entry from the MRU > list key, then an additional lookup in the hash delete code. I plan > on changing the hash delete code to expose another function that > allows us to delete an item directly if we've already looked it up. > This should make a small reduction in the overheads of the node. > Perhaps if the overhead is very small (say < 1%) when the cache is of > no use then it might not be such a bad thing to just have a Result > Cache for correlated subplans regardless of estimates. With the TPCH > Q20 test, it appeared as if the overhead was 0.27% for that particular > subplan. A more simple subplan would execute more quickly resulting > the Result Cache overhead being a more significant portion of the > overall subquery execution. I'd need to perform a worst-case overhead > test to get an indication of what the percentage is. I made the changes that I mention to speedup the cache deletes. The patch is now in 3 parts. The first two parts are additional work and the final part is the existing work with some small tweaks. 0001: Alters estimate_num_groups() to allow it to pass back a flags variable to indicate if the estimate used DEFAULT_NUM_DISTINCT. The idea here is to try and avoid using a Result Cache for a Nested Loop join when the statistics are likely to be unreliable. Because DEFAULT_NUM_DISTINCT is 200, if we estimate that number of distinct values then a Result Cache is likely to look highly favourable in some situations where it very well may not be. I've not given this patch a huge amount of thought, but so far I don't see anything too unreasonable about it. I'm prepared to be wrong about that though. 0002 Makes some adjustments to simplehash.h to expose a function which allows direct deletion of a hash table element when we already have a pointer to the bucket. I think this is a pretty good change as it reuses more simplehash.h code than without the patch. 0003 Is the result cache code. I've done another pass over this version and fixed a few typos and added a few comments. I've not yet added support for LATERAL joins. I plan to do that soon. For now, I just wanted to get something out there as I saw that the patch did need rebased. I did end up testing the overheads of having a Result Cache node on a very simple subplan that'll never see a cache hit. The overhead is quite a bit more than the 0.27% that we saw with TPCH Q20. Using a query that gets zero cache hits: $ cat bench.sql select relname,(select oid from pg_class c2 where c1.oid = c2.oid) from pg_Class c1 offset 1000000000; enable_resultcache = on: $ pgbench -n -f bench.sql -T 60 postgres latency average = 0.474 ms tps = 2110.431529 (including connections establishing) tps = 2110.503284 (excluding connections establishing) enable_resultcache = off: $ pgbench -n -f bench.sql -T 60 postgres latency average = 0.379 ms tps = 2640.534303 (including connections establishing) tps = 2640.620552 (excluding connections establishing) Which is about a 25% overhead in this very simple case. With more complex subqueries that overhead will drop significantly, but for that simple one, it does seem a quite a bit too high to be adding a Result Cache unconditionally for all correlated subqueries. I think based on that it's worth looking into the AlternativeSubPlan option that I mentioned earlier. I've attached the v2 patch series. David
Вложения
В списке pgsql-hackers по дате отправления: