Re: faster way to calculate top "tags" for a "resource" based on a column
От | Jonathan Vanasco |
---|---|
Тема | Re: faster way to calculate top "tags" for a "resource" based on a column |
Дата | |
Msg-id | F6D7998D-44FD-495E-8900-CCDE561999F1@2xlp.com обсуждение исходный текст |
Ответ на | Re: faster way to calculate top "tags" for a "resource" based on a column (Marc Mamin <M.Mamin@intershop.de>) |
Список | pgsql-general |
On Oct 7, 2014, at 10:02 AM, Marc Mamin wrote: > Hi, > it seems to me that your subquery may deliver duplicate ids. > And with the selectivity of your example, I would expect an index usage > instead of a table scan. You may check how up to date your statistics are > and try to raise the statistic target on the column resource_2_tag.tag_id. > Also try a CTE form for your query: It shouldn't be able to deliver duplicate ids. => SELECT COUNT(*) FROM (SELECT DISTINCT id FROM resource WHERE resource_attribute1_id = 614) AS foo; count ------- 5184 => SELECT COUNT(*) FROM (SELECT id FROM resource WHERE resource_attribute1_id = 614) AS foo; count ------- 5184 However, adding in the DISTINCT drastically changed the query plan, and did give a speedup. Your comment made me focus on the notion of a Table Scan. I assumed it did the seq scan - and there would not be much savingsotherwise - because the table is just 2 ids. I was wrong. I noticed that I never put a PRIMARY KEY constraint on that table. So i tried adding a PRIMARY KEY constraint, then running vacuum analyze... And that solved all my problems. the original query ended up being the fastest at 260ms ( down from 1760 ) Join - 260ms Subquery w/DISTINCT - 300ms CTE - 330 CTE w/DISTINCT - 345ms Subquery (no DISTINCT) - 1500ms
В списке pgsql-general по дате отправления: