Re: An "obvious" index not being used
От | Daniele Varrazzo |
---|---|
Тема | Re: An "obvious" index not being used |
Дата | |
Msg-id | 4859C24F.6000503@develer.com обсуждение исходный текст |
Ответ на | Re: An "obvious" index not being used (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-performance |
Tom Lane ha scritto: > Daniele Varrazzo <piro@develer.com> writes: >> There is an index in the field "foo.account_id" but is not used. The resulting >> query plan is: > >> Aggregate (cost=300940.70..300940.71 rows=1 width=0) (actual >> time=13412.088..13412.089 rows=1 loops=1) >> -> Hash IN Join (cost=11.97..299858.32 rows=432953 width=0) (actual >> time=0.678..13307.074 rows=92790 loops=1) >> Hash Cond: (foo.account_id = accounts.id) >> -> Seq Scan on foo (cost=0.00..275591.14 rows=5313514 width=4) >> (actual time=0.014..7163.538 rows=5313514 loops=1) > > Well, if the estimate of 432953 rows selected were correct, it'd be > right not to use the index. Fetching one row in ten is not a chore > for an indexscan. (I'm not sure it'd prefer an indexscan even with an > accurate 92K-row estimate, but at least you'd be in the realm where > tweaking random_page_cost would make a difference.) Let me guess: because the account tables has an estimated (and correct) guess of 22 records fetched out from 270 =~ 8%, it assumes that it will need to fetch the 8% of 5.3M records (which... yes, it matches the estimate of 433K). Well, this seems terribly wrong for this data set :( > I'm not sure why that estimate is so bad, given that you said you > increased the stats target on the table. Is there anything particularly > skewed about the distribution of the account IDs? Probably there is, in the sense that the relatively many accounts of 'abc' type are referred by relatively few records. In the plan for the hardcoded query the estimate is: -> Bitmap Index Scan on ifoo_x1 (cost=0.00..4115.67 rows=178308 width=0) (actual time=89.766..89.766 rows=92790 loops=1) which is actually more accurate. I suspect the foo.account_id statistical data are not used at all in query: the query planner can only estimate the number of accounts to look for, not how they are distributed in the referencing tables. It seems the only way to get the proper plan is to add a load of fake accounts! Well, I'd rather have the query executed in 2 times, in order to have the stats correctly used: this is the first time it happens to me. -- Daniele Varrazzo - Develer S.r.l. http://www.develer.com
В списке pgsql-performance по дате отправления: