Re: Interesting performance behaviour
От | Tom Lane |
---|---|
Тема | Re: Interesting performance behaviour |
Дата | |
Msg-id | 3312.1094854197@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Interesting performance behaviour (Joey Smith <joeysmith@gmail.com>) |
Список | pgsql-performance |
Joey Smith <joeysmith@gmail.com> writes: > EXPLAIN ANALYZE SELECT message_idnr FROM messages WHERE mailbox_idnr > = 1746::bigint AND status<2::smallint AND seen_flag = 0 AND unique_id > != '' ORDER BY message_idnr ASC LIMIT 1; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------- > Limit (cost=0.00..848.36 rows=1 width=8) (actual > time=1173.949..1173.953 rows=1 loops=1) > -> Index Scan using messages_pkey on messages > (cost=0.00..367338.15 rows=433 width=8) (actual > time=1173.939..1173.939 rows=1 loops=1) > Filter: ((mailbox_idnr = 1746::bigint) AND (status < > 2::smallint) AND (seen_flag = 0) AND ((unique_id)::text <> ''::text)) > Total runtime: 1174.012 ms The planner is correctly estimating that this plan is very expensive overall --- but it is guessing that the indexscan will only need to be run 1/433'd of the way to completion before the single required row is found. So that makes it look like a slightly better bet than the more conventional indexscan-on-mailbox_idnr-and-then-sort plan. If you ask for a few more than one row, though, it stops looking like a good bet, since each additional row is estimated to cost another 1/433'd of the total cost. Part of the estimation error is that there are only 56 matching rows not 433, so the real cost-per-row ought to be 1/56'th of the total indexscan cost. I suspect also that there is some correlation between message_idnr and mailbox_idnr, which results in having to scan much more than the expected 1/56'th of the index before finding a matching row. The planner has no stats about intercolumn correlation so it's not going to be able to recognize the correlation risk, but if you could get the rowcount estimate closer to reality that would be enough to tilt the scales to the better plan. Increasing ANALYZE's stats target for mailbox_idnr would be worth trying. Also, I suspect that there is a strong correlation between seen_flag and status, no? This again is something you can't expect the planner to realize directly, but you might be able to finesse the problem (and save some storage as well) if you could merge the seen_flag into the status column and do just one comparison to cover both conditions. regards, tom lane
В списке pgsql-performance по дате отправления: