Re: Prefetch the next tuple's memory during seqscans
From | Andres Freund
---|---
Subject | Re: Prefetch the next tuple's memory during seqscans
Date |
Msg-id | 20221102172544.hoszrut7tfepc3dc@awork3.anarazel.de
In reply to | Re: Prefetch the next tuple's memory during seqscans (Andres Freund <andres@anarazel.de>)
Responses | Re: Prefetch the next tuple's memory during seqscans
| Re: Prefetch the next tuple's memory during seqscans
List | pgsql-hackers
Hi,

On 2022-11-01 20:00:43 -0700, Andres Freund wrote:
> I suspect that prefetching in heapgetpage() would provide gains as well, at
> least for pages that aren't marked all-visible, pretty common in the real
> world IME.

Attached is an experimental patch/hack for that. It ended up being more
beneficial to make the access ordering more optimal than prefetching the
tuple contents, but I'm not at all sure that's the be-all-end-all.

I separately benchmarked pinning the CPU and memory to the same socket, to
different sockets, and interleaving memory. I did this for HEAD, your patch,
your patch combined with mine, and mine alone.

BEGIN;
DROP TABLE IF EXISTS large;
CREATE TABLE large(a int8 not null, b int8 not null default '0', c int8);
INSERT INTO large SELECT generate_series(1, 50000000);
COMMIT;

The server is started with:
local:      numactl --membind 1 --physcpubind 10
remote:     numactl --membind 0 --physcpubind 10
interleave: numactl --interleave=all --physcpubind 10

The benchmark is started with:
psql -qX -f ~/tmp/prewarm.sql && \
pgbench -n -f ~/tmp/seqbench.sql -t 1 -r > /dev/null && \
perf stat -e task-clock,LLC-loads,LLC-load-misses,cycles,instructions -C 10 \
  pgbench -n -f ~/tmp/seqbench.sql -t 3 -r

seqbench.sql:
SELECT count(*) FROM large WHERE c IS NOT NULL;
SELECT sum(a), sum(b), sum(c) FROM large;
SELECT sum(c) FROM large;

branch        memory      time (s)  LLC miss %
head          local       31.612    74.03
david         local       32.034    73.54
david+andres  local       31.644    42.80
andres        local       30.863    48.05
head          remote      33.350    72.12
david         remote      33.425    71.30
david+andres  remote      32.428    49.57
andres        remote      30.907    44.33
head          interleave  32.465    71.33
david         interleave  33.176    72.60
david+andres  interleave  32.590    46.23
andres        interleave  30.440    45.13

It's cool seeing how optimizing heapgetpage() seems to pretty much remove the
performance difference between local and remote memory.

It makes some sense that David's patch doesn't help in this case - without
all-visible being set, the tuple headers will have already been pulled in for
the HTSV call.

I've not yet experimented with moving the prefetch for the tuple contents
from David's location to before the HTSV call. I suspect that might benefit
both workloads.

Greetings,

Andres Freund
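For readers who want to play with the access pattern outside PostgreSQL, here
is a minimal standalone sketch of the technique under discussion: issuing a
prefetch for the next tuple's memory before doing the (likely cache-missing)
work on the current one. The struct layout, sizes, and names below are
illustrative assumptions for this sketch, not code from either patch.

/*
 * Sketch only: scan an array of "tuples", prefetching the next element's
 * cache line before summing the current one.  Mirrors the idea of
 * prefetching ahead of the visibility check in heapgetpage(), but uses no
 * PostgreSQL types or APIs.  Build: cc -O2 prefetch_demo.c
 */
#include <stdio.h>
#include <stdlib.h>

#if defined(__GNUC__) || defined(__clang__)
/* read prefetch, high temporal locality */
#define prefetch_read(addr) __builtin_prefetch((addr), 0, 3)
#else
#define prefetch_read(addr) ((void) (addr))
#endif

typedef struct Tuple
{
	long	a;
	long	b;
	long	c;
	char	pad[40];		/* pad to one 64-byte cache line */
} Tuple;

int
main(void)
{
	size_t	ntuples = 10 * 1000 * 1000;
	Tuple  *tuples = calloc(ntuples, sizeof(Tuple));
	long	sum = 0;

	if (tuples == NULL)
		return 1;

	/* fill with some data so the scan has something to add up */
	for (size_t i = 0; i < ntuples; i++)
		tuples[i].a = (long) i;

	for (size_t i = 0; i < ntuples; i++)
	{
		/* hint the next tuple into cache before touching this one */
		if (i + 1 < ntuples)
			prefetch_read(&tuples[i + 1]);

		sum += tuples[i].a + tuples[i].b + tuples[i].c;
	}

	printf("sum = %ld\n", sum);
	free(tuples);
	return 0;
}

With a stride this regular the hardware prefetcher already does most of the
work, so the interesting cases are the irregular ones (line-pointer
indirection, skipped dead tuples) that the patches above target.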
Attachments