Re: [pgsql-hackers-win32] Poor Performance for large queries
От | John Meinel |
---|---|
Тема | Re: [pgsql-hackers-win32] Poor Performance for large queries |
Дата | |
Msg-id | 415AE65E.3030200@johnmeinel.com обсуждение исходный текст |
Ответ на | Re: [pgsql-hackers-win32] Poor Performance for large queries in functions (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-performance |
Tom Lane wrote: > [ enlarging on Richard's response a bit ] > > John Meinel <john@johnmeinel.com> writes: > >>jfmeinel=> explain analyze execute myget(30000); >> QUERY PLAN >>-------------------------------------------------------------------- >> Seq Scan on tdata (cost=0.00..9773.10 rows=181923 width=4) >> (actual time=1047.000..1047.000 rows=0 loops=1) >> Filter: (project_id = $1) >> Total runtime: 1047.000 ms > > >>jfmeinel=> explain analyze select id from tdata where project_id = 30000; >> QUERY PLAN > > >>------------------------------------------------------------------------- >> Index Scan using tdata_project_id_idx on tdata (cost=0.00..4.20 >>rows=1 width=4) (actual time=0.000..0.000 rows=0 loops =1) >> Index Cond: (project_id = 30000) >> Total runtime: 0.000 ms > > >>So notice that when doing the actual select it is able to do the index >>query. But for some reason with a prepared statement, it is not able to >>do it. > > > This isn't a "can't do it" situation, it's a "doesn't want to do it" > situation, and it's got nothing whatever to do with null or not null. > The issue is the estimated row count, which in the first case is so high > as to make the seqscan approach look cheaper. So the real question here > is what are the statistics on the column that are making the planner > guess such a large number when it has no specific information about the > compared-to value. Do you have one extremely common value in the column? > Have you done an ANALYZE recently on the table, and if so can you show > us the pg_stats row for the column? > > regards, tom lane > The answer is "yes" that particular column has very common numbers in it. Project id is a number from 1->21. I ended up modifying my query such that I do the bulk of the work in a regular UNION SELECT so that all that can be optimized, and then I later do another query for this row in an 'EXECUTE ...' so that unless I'm actually requesting a small number, the query planner can notice that it can do an indexed query. I'm pretty sure this is just avoiding worst case scenario. Because it is true that if I use the number 18, it will return 500,000 rows. Getting those with an indexed lookup would be very bad. But typically, I'm doing numbers in a very different range, and so the planner was able to know that it would not likely find that number. Thanks for pointing out what the query planner was thinking, I was able to work around it. John =:->
Вложения
В списке pgsql-performance по дате отправления: