Re: Subquery in a JOIN not getting restricted?
От | Jay Levitt |
---|---|
Тема | Re: Subquery in a JOIN not getting restricted? |
Дата | |
Msg-id | 4EBAE4F2.1090403@gmail.com обсуждение исходный текст |
Ответ на | Re: Subquery in a JOIN not getting restricted? ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>) |
Ответы |
Re: Subquery in a JOIN not getting restricted?
(Gavin Flower <GavinFlower@archidevsys.co.nz>)
|
Список | pgsql-performance |
Kevin Grittner wrote: > Jay Levitt<jay.levitt@gmail.com> wrote: > >> I don't get why the GROUP BY in this subquery forces it to scan >> the entire users table (seq scan here, index scan on a larger >> table) when there's only one row in users that can match: > Are you sure there's a plan significantly faster than 1.3 ms? Yep! Watch this: drop schema if exists jaytest cascade; create schema jaytest; set search_path to jaytest; create table questions ( id int not null primary key, user_id int not null ); insert into questions select generate_series(1,1100), (random()*2000000)::int; create table users ( id int not null primary key ); insert into users select generate_series(1, 2000000); vacuum freeze analyze; explain analyze select questions.id from questions join ( select u.id from users as u group by u.id ) as s on s.id = questions.user_id where questions.id = 1; ----------------------- Merge Join (cost=8.28..90833.02 rows=1818 width=4) (actual time=888.787..888.790 rows=1 loops=1) Merge Cond: (u.id = questions.user_id) -> Group (cost=0.00..65797.47 rows=2000000 width=4) (actual time=0.017..735.509 rows=1747305 loops=1) -> Index Scan using users_pkey on users u (cost=0.00..60797.47 rows=2000000 width=4) (actual time=0.015..331.990 rows=1747305 loops=1) -> Materialize (cost=8.28..8.29 rows=1 width=8) (actual time=0.013..0.015 rows=1 loops=1) -> Sort (cost=8.28..8.28 rows=1 width=8) (actual time=0.012..0.013 rows=1 loops=1) Sort Key: questions.user_id Sort Method: quicksort Memory: 25kB -> Index Scan using questions_pkey on questions (cost=0.00..8.27 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=1) Index Cond: (id = 1) Total runtime: 888.832 ms (11 rows) explain analyze select questions.id from questions join ( select u.id from users as u ) as s on s.id = questions.user_id where questions.id = 1; ----------------------- Nested Loop (cost=0.00..16.77 rows=1 width=4) (actual time=0.019..0.021 rows=1 loops=1) -> Index Scan using questions_pkey on questions (cost=0.00..8.27 rows=1 width=8) (actual time=0.008..0.009 rows=1 loops=1) Index Cond: (id = 1) -> Index Scan using users_pkey on users u (cost=0.00..8.49 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=1) Index Cond: (u.id = questions.user_id) Total runtime: 0.045 ms (6 rows) > That said, there might be some room for an optimization which pushes > that test into the query with the "group by" clause. I don't know > if there's a problem with that which I'm missing, the construct was > judged to be too rare to be worth the cost of testing for it, or > it's just that nobody has yet gotten to it. Anyone have more insights on whether this is hard to optimize or simply not-yet-optimized? And if the latter, where might I start looking? (Not that you -really- want me to submit a patch; my C has regressed to the "try an ampersand. OK, try an asterisk." level...) Jay
В списке pgsql-performance по дате отправления: