Re: Experimenting with hash join prefetch
From | Thomas Munro
---|---
Subject | Re: Experimenting with hash join prefetch
Date |
Msg-id | CA+hUKGKB=EWq+rj4NK8CX6ZqHZzuQYWSb9iDfYKzGN6-ejShUQ@mail.gmail.com
In reply to | Re: Experimenting with hash join prefetch (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses | Re: Experimenting with hash join prefetch
List | pgsql-hackers
On Sun, Oct 14, 2018 at 11:11 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> > On 14 Oct 2018, at 9:18, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> >
> > +        /* Prefetch the bucket for the next key */
> > +        uint32      next_hash = hash_uint32(DatumGetInt32(keyval) + 1);
> > +        uint32      next_bucket = next_hash % hashtable->nbuckets;
> > +        __builtin_prefetch(&hashtable->buckets.unshared[next_bucket]);
>
> +1, I also think that we should use __builtin_prefetch these days (years, actually).
> After listening to Anastassia Ailamaki's (author of the referenced paper) talk at VLDB, I proposed doing exactly that for B-trees [0], but did not really pursue the idea.

The above was obviously just a hard-coded hack that "knew" the next key would be n + 1.  I've been wondering how you might abstract real look-ahead with the shiny new TupleTableSlot design.  Here's a concept I came up with: ExecScrollSlot(slot, 1) -> bool, to try to look ahead to the next tuple if possible.  I suppose there could be a kind of scrolling that actually consumes tuples (for true batch-mode tuple processing in tight inner loops, for example hash table build), and scrolling that merely peeks ahead (what I'm describing so far).

I'm keen to hear other ideas about how that could look, because I know that "vector" and "batch" mode tuple processing are ideas that people have been bouncing around for a while.  Flame away.

POC patch attached.  I never got around to figuring out why it breaks anti-joins (hence some regression test failures) or filling out various other important details (see the commit message for a list), but I figured it was better on the mailing list than hiding in a drawer, anyway.  Here is an example of times for a trivial join on my laptop.  Note that this is prefetching only during the probing phase, not while building, which should also be possible.
I didn't get around to trying deeper prefetch pipelines as discussed earlier, but those seemed to produce diminishing returns in the hard-coded tests from the earlier message.

shared_buffers = '3GB'

create table r as select generate_series(1, 40000000)::int i;
create table s as select generate_series(1, 10000000)::int i;
analyze;
set max_parallel_workers_per_gather = 0;
set work_mem = '2GB';
select pg_prewarm('r'::regclass);
select pg_prewarm('s'::regclass);
select count(*) from r join s using (i);

Master:  00:14.230
Patched: 00:11.818

--
Thomas Munro
https://enterprisedb.com
Attachments