Re: query planner woes
От | Tom Lane |
---|---|
Тема | Re: query planner woes |
Дата | |
Msg-id | 25238.1086882800@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | query planner woes (Dmitry Karasik <dmitry@karasik.eu.org>) |
Список | pgsql-novice |
Dmitry Karasik <dmitry@karasik.eu.org> writes: > I've constructed a simple query which takes too long > to finish. > mts=# explain analyze select * > mts-# from recipients,addresses > mts-# where addresses.address ~ '@cat' and > mts-# recipients.msgto = addresses.id; > QUERY > PLAN > ----------------------------------------------------------------------------------------------------------------------- > Hash Join (cost=535.70..1603.96 rows=352 width=63) > (actual time=88.387..422.922 rows=576 loops=1) > Hash Cond: ("outer".msgto = "inner".id) > -> Seq Scan on recipients (cost=0.00..737.42 rows=43642 width=24) > (actual time=0.012..178.258 rows=43642 > loops=1) > ^^^^^ > -> Hash (cost=535.34..535.34 rows=147 width=39) > (actual time=87.866..87.866 rows=0 loops=1) > -> Seq Scan on addresses (cost=0.00..535.34 rows=147 width=39) > (actual time=0.958..87.806 rows=3 loops=1) > Filter: (address ~ '@cat'::text) > Total runtime: 425.061 ms > (7 rows) Given that there are actually only 3 rows in addresses matching '@cat', it would have been best to use a plan like Nestloop Seq Scan on addresses Index Scan on recipients Index Cond: msgto = outer.id ie, use the index to visit each of those three recipients. However, for the 147 rows that the planner was expecting, it's not clear that the hash plan choice is wrong; and with an even-less-selective address filter, the hash plan will considerably beat the nestloop. So ISTM that the real problem is the inaccuracy of the estimate about how many rows will be pulled from addresses. Unfortunately there's not a lot you can do to really solve that problem for arbitrary pattern-match queries :-(. The planner has no stats that would let it derive a non-guess estimate. Depending on what your real workload will be like, you might be able to formulate things so that common cases go faster. For instance, I gather that what you're doing here is looking at the site portion of email addresses? If you do that a lot, it might be worth splitting the site portion out and storing it as a separate column. Then the search becomes an anchored-left match: ... where addresses.site ~ '^cat' which the planner has significantly better chance of handling well. regards, tom lane
В списке pgsql-novice по дате отправления: