Re: Strange statistics
От | Henrik |
---|---|
Тема | Re: Strange statistics |
Дата | |
Msg-id | 7A5A80E8-8F03-40C2-B2B8-9E70AE89DE04@mac.se обсуждение исходный текст |
Ответ на | Re: Strange statistics (Joris Dobbelsteen <joris@familiedobbelsteen.nl>) |
Список | pgsql-general |
3 jun 2008 kl. 23.31 skrev Joris Dobbelsteen: > Henrik wrote: >> Hi list, >> I'm having a table with a lots of file names in it. (Aprox 3 >> million) in a 8.3.1 db. >> Doing this simple query shows that the statistics is way of but I >> can get them right even when I raise the statistics to 1000. >> db=# alter table tbl_file alter file_name set statistics 1000; >> ALTER TABLE >> db=# analyze tbl_file; >> ANALYZE >> db=# explain analyze select * from tbl_file where lower(file_name) >> like lower('to%'); >> QUERY PLAN >> ---------------------------------------------------------------------------------------------------------------------------- Bitmap >> Heap Scan on tbl_file (cost=23.18..2325.13 rows=625 width=134) >> (actual time=7.938..82.386 rows=17553 loops=1) >> Filter: (lower((file_name)::text) ~~ 'to%'::text) >> -> Bitmap Index Scan on tbl_file_idx (cost=0.00..23.02 rows=625 >> width=0) (actual time=6.408..6.408 rows=17553 loops=1) >> Index Cond: ((lower((file_name)::text) ~>=~ 'to'::text) AND >> (lower((file_name)::text) ~<~ 'tp'::text)) >> Total runtime: 86.230 ms >> (5 rows) >> How can it be off by a magnitude of 28?? > > These are statistics and represent an only estimate! In this case, > the planner seems to be doing the right thing(tm) anyway. > > Statistics is a frequently misunderstood subject and usually > provides excellent material to draw plain wrong conclusions. There > is a good chance that due to the physical layout of your data, the > algorithms in the statistics collector, the existence of uncertainty > and some more unknown factors your statistics will be biased. This > is a situations where you noticed it. > > Running "SELECT * FROM pg_stats;" will give you the statistics the > planner uses and can provide some hints to why the planner has > chosen these estimates. > Probably statistics will vary between ANALYZE runs. Its also > possible to try "CLUSTER" and friends. Try different queries and > look at the deviations. Thanks Joris for your input. You are the second person that suggests CLUSTER for me. Maybe I should take a look. The problem is that our select queries are kinda random. Would CLUSTER help then also? Should I just CLUSTER on the moste used index or? Thanks /henke > > > All in all, you should really start worrying when the planner starts > planning inefficient queries. Since its a filename, it might be > highly irregular (random) and a low statistics target might be good > enough anyways. > > Unfortunately I'm not a statistics expert... > > - Joris
В списке pgsql-general по дате отправления: