Re: BUG #18365: Inconsistent cost function between materialized and non-materialized CTE
| От | David Rowley |
|---|---|
| Тема | Re: BUG #18365: Inconsistent cost function between materialized and non-materialized CTE |
| Дата | |
| Msg-id | CAApHDvq78Xdjvq-9tAawNx74pUpDLNJHSH=riK3-gv5zv2ww_Q@mail.gmail.com обсуждение исходный текст |
| Ответ на | Re: BUG #18365: Inconsistent cost function between materialized and non-materialized CTE (Sjors Gielen <sjors@sjorsgielen.nl>) |
| Список | pgsql-bugs |
On Tue, 4 Jun 2024 at 02:50, Sjors Gielen <sjors@sjorsgielen.nl> wrote: > I am not sure I am using the correct list for this - please let me know if I > should move this to pgsql-hackers or pgsql-performance. This is the correct place to report bugs. > Should the correct plan not be simply this? > > -> Nested Loop > -> CTE Scan on last_semester_per_student (cost=0.00..440.92 rows=22046 width=8) (actual time=9.947..13.157 rows=21826loops=1) > -> Index Only Scan using idx2 on public.student_courses (cost=0.29..3.54 rows=6 width=8) (actual time=0.001..0.001rows=6 loops=21826) You're right that the query planner injected a uniquification step into the plan unnecessarily. It did this to allow the semi-join to be transformed into a join so it could swap the join order. We won't class this issue as a bug as there is no code in PostgreSQL that is malfunctioning. We simply don't have anything in place in the current versions of PostgreSQL to detect which keys a Path is unique on which the planner could use to skip performing the uniquification step. > I am not certain, but it appears that the code generating this path might be in > pathnode.c create_unique_path, which already checks whether its child node > yields unique rows, but only supports relations or subqueries for this check, > not CTEs. Therefore, it falls back to a naive estimation which is off by a > factor of 100x. The row estimation part has been improved in the yet-to-be-released PostgreSQL 17. The improvement won't be backpatched into earlier versions. > Am I correct in thinking this code would be improved by supporting CTE > subqueries, as well, transporting the uniqueness information from the > CTE-producing node (in this case, the outer HashAggregate)? I'm unsure what you mean by the first part since CTEs already support that. For the "transporting the uniqueness information" part, yes, if the planner tracked that it could produce a plan that could be executed more efficiently. > Would a change of this nature be accepted upstream, if I were to put some time > and effort into it? There has been some effort in [1] to improve this. I've not had enough spare time to look into how that's progressing. If you're keen to see PostgreSQL produce better plans for this type of thing, you're very welcome to catch up with what's been done so far and assess how you can best assist with progressing this further. Showing interest on the thread and testing the latest patch is likely a very good start. David [1] https://postgr.es/m/7mlamswjp81p.fsf@e18c07352.et15sqa
В списке pgsql-bugs по дате отправления: