Re: Unclamped row estimates whith OR-ed subplans

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Unclamped row estimates whith OR-ed subplans
Дата
Msg-id 1729591.1592590609@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Unclamped row estimates whith OR-ed subplans  ("Benjamin Coutu" <ben.coutu@zeyos.com>)
Ответы Re: Unclamped row estimates whith OR-ed subplans  (Michael Lewis <mlewis@entrata.com>)
Список pgsql-performance
"Benjamin Coutu" <ben.coutu@zeyos.com> writes:
> I don't want to waste your time but maybe there is room for improvement as both "account" and "contract" are highly
distinctand the individual subplan estimates are quite accurate: 

Yeah, as I said, the estimates you're getting for the OR'd subplans are
pretty stupid.  Once you throw the OR in there, it's not possible to
convert the IN clauses to semi-joins, so they just stay as generic
subplans.  It looks like we have exactly zero intelligence about the
generic case --- unless I'm missing something in clause_selectivity,
you just end up with a default 0.5 selectivity estimate.  So yeah,
there's a lot of room for improvement, whenever anyone finds some
round tuits to work on that.

While you're waiting, you might think about recasting the query to
avoid the OR.  Perhaps you could do a UNION of two scans of the
transactions table?

> Btw, I don't quite understand why the nested loop on contract only is expected to yield 31662 rows, when the
null_fracof field transactions.contract is 1. Shouldn't that indicate zero rows or some kind of default minimum
estimatefor that query? 

That I don't understand.  I get a minimal rowcount estimate for an
all-nulls outer table, as long as I'm using just one IN rather than
an OR:

regression=# create table contracts (id int);
CREATE TABLE
regression=# insert into contracts values(1),(2),(3),(4);
INSERT 0 4
regression=# analyze contracts ;
ANALYZE
regression=# create table transactions (contract int);
CREATE TABLE
regression=# insert into transactions select null from generate_series(1,100000);
INSERT 0 100000
regression=# analyze transactions;
ANALYZE
regression=# explain select * from transactions where contract in (select id from contracts);
                                QUERY PLAN
--------------------------------------------------------------------------
 Hash Semi Join  (cost=1.09..1607.59 rows=1 width=4)
   Hash Cond: (transactions.contract = contracts.id)
   ->  Seq Scan on transactions  (cost=0.00..1344.00 rows=100000 width=4)
   ->  Hash  (cost=1.04..1.04 rows=4 width=4)
         ->  Seq Scan on contracts  (cost=0.00..1.04 rows=4 width=4)
(5 rows)

            regards, tom lane



В списке pgsql-performance по дате отправления:

Предыдущее
От: "Benjamin Coutu"
Дата:
Сообщение: Re: Unclamped row estimates whith OR-ed subplans
Следующее
От: Michael Lewis
Дата:
Сообщение: Re: Unclamped row estimates whith OR-ed subplans