Re: Q: will GROUP BY make use of an index to return tuples early?
От | Tom Lane |
---|---|
Тема | Re: Q: will GROUP BY make use of an index to return tuples early? |
Дата | |
Msg-id | 6664.1023649026@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Q: will GROUP BY make use of an index to return tuples early? (Gunther Schadow <gunther@aurora.regenstrief.org>) |
Список | pgsql-sql |
Gunther Schadow <gunther@aurora.regenstrief.org> writes: > SELECT id, MIN(foo.time) > FROM Foo foo > GROUP BY foo.id; > can one tell the query executor to do an index scan on Foo_id_idx That is a considered plan type. For example, using the regression test database: regression=# set enable_seqscan TO 0; SET regression=# explain analyze select hundred,min(unique1) from tenk1 group by hundred; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------Aggregate (cost=0.00..1121.89 rows=1000 width=8) (actual time=19.12..1619.71 rows=100 loops=1) -> Group (cost=0.00..1096.89 rows=10000width=8) (actual time=0.56..1484.14 rows=10000 loops=1) -> Index Scan using tenk1_hundred on tenk1 (cost=0.00..1071.89rows=10000 width=8) (actual time=0.53..1194.64 rows=10000 loops=1)Total runtime: 1621.69 msec (4 rows) regression=# set enable_seqscan TO 1; SET regression=# explain analyze select hundred,min(unique1) from tenk1 group by hundred; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------Aggregate (cost=997.39..1072.39 rows=1000 width=8) (actual time=1162.52..1677.46 rows=100 loops=1) -> Group (cost=997.39..1047.39rows=10000 width=8) (actual time=1157.33..1565.75 rows=10000 loops=1) -> Sort (cost=997.39..1022.39rows=10000 width=8) (actual time=1157.29..1298.10 rows=10000 loops=1) Sort Key: hundred -> Seq Scan on tenk1 (cost=0.00..333.00 rows=10000 width=8) (actual time=0.16..277.69 rows=10000 loops=1)Totalruntime: 1695.83 msec (6 rows) The sort-based plan usually comes out ahead in estimated total cost, but if you were using a cursor to read it then I believe the indexscan plan would get chosen due to its much lower startup cost. > If the answer is yes, there is a way, then how about if we do this: > CREATE INDEX Foo_id_time_idx ON Foo(id, time); > now, considering the same query, it could be executed even faster, > because we could do an index scan on Foo_id_time_idx and only need > to consider the first data tuple of every Foo.id group (because > the ordering of Foo_id_time_idx guarantees that the MIN(time) is > in the first tuple. This won't happen because (a) the optimizer treats aggregates as black boxes; there is no notion that some aggregates have a connection to the sort ordering of some indexes; (b) the indexscan would still have to pass over the other time values in each group. regards, tom lane
В списке pgsql-sql по дате отправления: