Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
От | Dimitrios Apostolou |
---|---|
Тема | Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions |
Дата | |
Msg-id | 6a2f3906-3d7a-6924-7403-8f77d57a18e4@gmx.net обсуждение исходный текст |
Ответ на | Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions |
Список | pgsql-general |
On Fri, 10 May 2024, Tom Lane wrote: > Dimitrios Apostolou <jimis@gmx.net> writes: >> Further digging into this simple query, if I force the non-parallel plan >> by setting max_parallel_workers_per_gather TO 0, I see that the query >> planner comes up with a cost much higher: > >> Limit (cost=363.84..1134528847.47 rows=10 width=4) >> -> Unique (cost=363.84..22690570036.41 rows=200 width=4) >> -> Append (cost=363.84..22527480551.58 rows=65235793929 width=4) >> ... > >> The total cost on the 1st line (cost=363.84..1134528847.47) has a much >> higher upper limit than the total cost when >> max_parallel_workers_per_gather is 4 (cost=853891608.79..853891608.99). >> This explains the planner's choice. But I wonder why the cost estimation >> is so far away from reality. > > I'd say the blame lies with that (probably-default) estimate of > just 200 distinct rows. That means the planner expects to have > to read about 5% (10/200) of the tables to get the result, and > that's making fast-start plans look bad. Indeed that's an awful estimate, the table has more than 1M of unique values in that column. Looking into pg_stat_user_tables, I can't see the partitions having been vacuum'd or analyzed at all. I think they should have been auto-analyzed, since they get a ton of INSERTs (no deletes/updates though) and I have the default autovacuum settings. Could it be that autovacuum starts, but never finishes? I can't find something in the logs. In any case, even after the planner decides to execute the terrible plan with the parallel seqscans, why doesn't it finish right when it finds 10 distinct values? > > Possibly an explicit ANALYZE on the partitioned table would help. Thanks, I'll save the ANALYZE as the last step; I feel it's a good opportunity to figure out more details about how postgres works. Plus I expect ANALYZE to last a couple of days, so I should first find quiet time for that. :-) Dimitris
В списке pgsql-general по дате отправления: