[HACKERS] <> join selectivity estimate question
От | Thomas Munro |
---|---|
Тема | [HACKERS] <> join selectivity estimate question |
Дата | |
Msg-id | CAEepm=270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: [HACKERS] <> join selectivity estimate question
|
Список | pgsql-hackers |
Hi hackers, While studying a regression reported[1] against my parallel hash join patch, I noticed that we can also reach a good and a bad plan in unpatched master. One of the causes seems to be the estimated selectivity of a semi-join with an extra <> filter qual. Here are some times I measured for TPCH Q21 at scale 10 and work_mem of 1GB. That is a query with a large anti-join and a large semi-join. 8 workers = 8.3s 7 workers = 8.2s 6 workers = 8.5s 5 workers = 8.9s 4 workers = 9.5s 3 workers = 39.7s 2 workers = 36.9s 1 worker = 38.2s 0 workers = 47.9s Please see the attached query plans showing the change in plan from Hash Semi Join to Nested Loop Semi Join that happens only once we reach 4 workers and the (partial) base relation size becomes smaller. The interesting thing is that row estimate for the semi-join and anti-join come out as 1 (I think this is 0 clamped to 1). The same thing can be seen with a simple semi-join, if you happen to have TPCH loaded. Compare these two queries: SELECT * FROM lineitem l1 WHERE EXISTS (SELECT * FROM lineitem l2 WHERE l1.l_orderkey = l2.l_orderkey); -> estimates 59986012 rows, actual rows 59,986,052 (scale 10 TPCH) SELECT * FROM lineitem l1 WHERE EXISTS (SELECT * FROM lineitem l2 WHERE l1.l_orderkey = l2.l_orderkey AND l1.l_suppkey <> l2.l_suppkey); -> estimates 1 row, actual rows 57,842,090 (scale 10 TPCH) Or for a standalone example: CREATE TABLE foo AS SELECT (generate_series(1, 1000000) / 4)::int AS a, (generate_series(1, 1000000) % 100)::int AS b; ANALYZE foo; SELECT * FROM foo f1 WHERE EXISTS (SELECT * FROM foo f2 WHERE f1.a = f2.a); -> estimates 1,000,000 rows SELECT * FROM foo f1 WHERE EXISTS (SELECT * FROM foo f2 WHERE f1.a = f2.a AND f1.b <> f2.b); -> estimates 1 row I'm trying to wrap my brain around the selectivity code, but am too green to grok how this part of the planner that I haven't previously focused on works so far, and I'd like to understand whether this is expected behaviour so that I can figure out how to tackle the reported regression with my patch. What is happening here? Thanks for reading. [1] https://www.postgresql.org/message-id/CAEepm%3D3Og-7-b3WOkiT%3Dc%2B6y3eZ0VVSyb1K%2BSOvF17BO5KAt0A%40mail.gmail.com -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
В списке pgsql-hackers по дате отправления: