Re: Poor Performance with Distinct Subqueries with EXISTS and EXCEPT
От | Thomas F.O'Connell |
---|---|
Тема | Re: Poor Performance with Distinct Subqueries with EXISTS and EXCEPT |
Дата | |
Msg-id | 474D5EBE-45CA-11D9-B555-000D93AE0944@sitening.com обсуждение исходный текст |
Ответ на | Re: Poor Performance with Distinct Subqueries with EXISTS and EXCEPT (Pierre-Frédéric Caillaud<lists@boutiquenumerique.com>) |
Список | pgsql-general |
Interestingly, I tried the new version with and without enable_seqscan on, and the version without indexes performs better because, I think, it returns more rows than an index lookup would enhance. Thanks again for your help. This is certainly an improvement over my original version. -tfo -- Thomas F. O'Connell Co-Founder, Information Architect Sitening, LLC http://www.sitening.com/ 110 30th Avenue North, Suite 6 Nashville, TN 37203-6320 615-260-0005 On Dec 2, 2004, at 6:42 AM, Pierre-Frédéric Caillaud wrote: > Let's re-take your query from the start. At each step you should > explain analyze the query to check if it runs smoothly. > > 1. You want the messages which have no actions. Rather than a > subselect, I'd use a LEFT JOIN : > > untested syntax : > SELECT m.id FROM message m LEFT JOIN message_action ma ON > m.id=ma.messages_id WHERE ma.messages_id IS NULL; > > On my machine, I have a zones table with 3000 rows and a cities table > with 2 million rows, each place having a zone_id : > > EXPLAIN ANALYZE SELECT z.zone_id FROM geo.zones z LEFT JOIN geo.cities > c ON c.zone_id=z.zone_id WHERE c.id IS NULL; > Merge Left Join (cost=0.00..142063.06 rows=3663 width=4) (actual > time=8726.203..8726.203 rows=0 loops=1) > Merge Cond: ("outer".zone_id = "inner".zone_id) > Filter: ("inner".id IS NULL) > -> Index Scan using zones_pkey on zones z (cost=0.00..99.10 > rows=3663 width=4) (actual time=15.027..43.987 rows=3663 loops=1) > -> Index Scan using cities_zones_idx on cities c > (cost=0.00..116030.55 rows=2073935 width=8) (actual > time=25.164..5823.496 rows=2073935 loops=1) > Total runtime: 8726.327 ms > (6 lignes) > > 8 seconds, this gives you an idea with that many records. > You should check your indexes are used ! > > Now you have the messages which have no actions, you must get the > user email domains : > > SELECT split_part( u.email, '@', 2 ) as domain > FROM users u, message m > LEFT JOIN message_action ma ON m.id=ma.messages_id > WHERE u.id=m.user_id > AND ma.messages_id IS NULL; > > Can you time this query ? Are the indexes used ? > Now, let's remove the duplicates : > > SELECT split_part( u.email, '@', 2 ) as domain > FROM users u, message m > LEFT JOIN message_action ma ON m.id=ma.messages_id > WHERE u.id=m.user_id > AND ma.messages_id IS NULL > GROUP By domain; > > GROUP BY is faster than DISTINCT (in some cases). > > How does it go ?
В списке pgsql-general по дате отправления: