Re: Sequential Scan with LIMIT
От | John Meinel |
---|---|
Тема | Re: Sequential Scan with LIMIT |
Дата | |
Msg-id | 417C23DC.1090407@johnmeinel.com обсуждение исходный текст |
Ответ на | Re: Sequential Scan with LIMIT (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-performance |
Tom Lane wrote: > John Meinel <john@johnmeinel.com> writes: > >>I was looking into another problem, and I found something that surprised >>me. If I'm doing "SELECT * FROM mytable WHERE col = 'myval' LIMIT 1.". >>Now "col" is indexed, by mytable has 500,000 rows, and 'myval' occurs >>maybe 100,000 times. Without the LIMIT, this query should definitely do >>a sequential scan. > > >>But with the LIMIT, doesn't it know that it will return at max 1 value, >>and thus be able to use the index? > > > But the LIMIT will cut the cost of the seqscan case too. Given the > numbers you posit above, about one row in five will have 'myval', so a > seqscan can reasonably expect to hit the first matching row in the first > page of the table. This is still cheaper than doing an index scan > (which must require reading at least one index page plus at least one > table page). > > The test case you are showing is probably suffering from nonrandom > placement of this particular data value; which is something that the > statistics we keep are too crude to detect. > > regards, tom lane You are correct about non-random placement. I'm a little surprised it doesn't change with values, then. For instance, # select count(*) from finst_t where store_id = 52; 13967 Still does a sequential scan for the "select id from..." query. The only value it does an index query for is 9605 which only has 1 row. It estimates ~18,000 rows, but that is still < 3% of the total data. This row corresponds to disk location where files can be found. So when a storage location fills up, generally a new one is created. This means that *generally* the numbers will be increasing as you go further in the table (not guaranteed, as there are multiple locations open at any one time). Am I better off in this case just wrapping my query with: set enable_seqscan to off; query set enable_seqscan to on; There is still the possibility that there is a better way to determine existence of a value in a column. I was wondering about something like: SELECT 1 WHERE EXISTS (SELECT id FROM finst_t WHERE store_id=52 LIMIT 1); Though the second part is the same, so it still does the sequential scan. This isn't critical, I was just trying to understand what's going on. Thanks for your help. John =:->
Вложения
В списке pgsql-performance по дате отправления: