Re: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize
От | David Rowley |
---|---|
Тема | Re: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize |
Дата | |
Msg-id | CAApHDvotKW3wt36gqw2M0yjhN0jm9yQsEzLu6BCpUXwurH8Gpg@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize (David Rowley <dgrowleyml@gmail.com>) |
Ответы |
Re: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize
|
Список | pgsql-bugs |
On Tue, 5 Oct 2021 at 20:05, David Rowley <dgrowleyml@gmail.com> wrote: > I'm currently thinking over what the best fix for this should be. My > current thinking is that all parameters that are from above the > Memoize that are required below the Memoize must be part of the Cache > Key. Currently, it seems, that's not the case and in the plan from the > attached plan, i1.id is being used below the Memoize but originates > from above it. > > I'll need to look into how exactly to determine around when > get_memoize_path() is called how we determine if there's any subplan > parameters that will be needed in the Memoize path's sub path that > originates above the memoize node. I was looking at fixes for this bug. I started out with seeing what would be required to make paraminfo_get_equal_hashops() find all the SubPlan parameters that are on the inner side of the join that come from a location other than the inner side of the join. The only way I could think on how to do that would be to invent some way to traverse Paths to look for vars from a given set of Relids. Right now we just have ways to walk expression trees (expression_tree_walker). That's not going to work here. Let's call this way "option 1". An alternative way to fix the problem would be just to flush the cache whenever a Param changes that is not part of the cache key. The Materalize node works a bit like this, except it'll just flush the cache whenever a param changes, since the Material cache is not parameterized. One fairly hefty drawback with doing it this way is that the planner does not get a chance to take into account the fact that the cache could be flushed quite often. This could result in a Nested Loop / Memoize join being chosen instead of a Hash or Merge Join when the hash or merge join would be much more favourable. Let's call this "option 2". For option 1, I don't really like the sound of adding ways to traverse paths in back branches. I'm also not too sure what the implication would be for Custom scans. With option 2, I'm a bit worried that this would cause Memoize to be costed much cheaper than it actually is due to the planner thinking the cache hit ratio will be much higher than it actually is. Can anyone think of a 3rd option? I've attached a patch for option 2. David
Вложения
В списке pgsql-bugs по дате отправления: