Re: apply_scanjoin_target_to_paths and partitionwise join
| От | Robert Haas | 
|---|---|
| Тема | Re: apply_scanjoin_target_to_paths and partitionwise join | 
| Дата | |
| Msg-id | CA+TgmoafrY-=M=Dgr-sVkETCefa2yLFAVADTVLa=JKBwpjtLoA@mail.gmail.com обсуждение исходный текст  | 
		
| Ответ на | Re: apply_scanjoin_target_to_paths and partitionwise join (Robert Haas <robertmhaas@gmail.com>) | 
| Ответы | 
                	
            		Re: apply_scanjoin_target_to_paths and partitionwise join
            		
            		 Re: apply_scanjoin_target_to_paths and partitionwise join  | 
		
| Список | pgsql-hackers | 
On Thu, Oct 30, 2025 at 11:52 AM Robert Haas <robertmhaas@gmail.com> wrote: > Right. Although that's the main thing here, I am inclined to suspect > there are other ways to hit this problem, maybe ways that are more > likely to happen in the real world, because... And just like that, I found another way that this can happen. Consider this query from the regression tests: SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.a AND t1.a = t2.b ORDER BY t1.a, t2.b; Each of prt1 and prt2 have three partitions. Since t1.a = t2.a and t1.a = t2.b, the planner deduces that t2.a = t2.b. The only rows from t2 where a = b are in the first partition. The planner therefore estimates that if it just does a Merge Join between the two partitioned tables, the Merge Join will stop early. There are actually nine rows in t2 where a = b, but the planner's estimate is just three rows, so it's understandable that it will very quickly run out of rows on the t2 side of the join. Thus, in the non-partitionwise plan, while its estimated cost to scan t1 is 44.83 and its estimated cost to scan t2 is 5.55, the estimated cost of the join is only 7.99, reflecting the fact that it doesn't anticipate having to actually finish the scan of t1. Now, if it does a partitionwise join, it still picks a Merge Join for the prt1_p1/prt2_p1 sub-join, and that can still stop early. But for the prt1_p2/prt2_p2 and prt1_p3/prt2_p3 joins, it picks hash joins, which as far as the planner knows can't stop early, so there's no opportunity to get a "discount" on the cost of scanning any of those tables. As a result, the estimated cost of this plan ends up being 11.53, clearly more than the non-partitionwise estimated cost. In this case, the planner's methodology doesn't really make a lot of sense when you stop to think about it. If the planner is correct that the non-partitionwise join will stop early, then the hash joins that are chosen in the partitionwise scan will stop early, because the non-parallel hashjoin code notices when the hash table built for the inner side is empty and bails out without finishing the outer scan. But the planner code is understandably reluctant to bet on a table being completely empty for costing purposes. Performing a non-partitionwise join allows the planner to make an end run around this restriction: it can bet on the combined inner table ending up with no rows contributed by the second or third child tables without actually betting on any relation being completely empty. Consequently, it doesn't seem like this type of case can account for the original report of a massive real-world run time regression. The planner's mental gymnastics here cause it to think that a non-partitionwise plan will be faster, but as far as I can tell, there's no real reason to expect that it actually will be. -- Robert Haas EDB: http://www.enterprisedb.com
В списке pgsql-hackers по дате отправления: