[HACKERS] Re: Improve OR conditions on joined columns (common star schemaproblem)
От | Jim Nasby |
---|---|
Тема | [HACKERS] Re: Improve OR conditions on joined columns (common star schemaproblem) |
Дата | |
Msg-id | 3605fcb2-6622-6520-b678-83171fe4c9d8@openscg.com обсуждение исходный текст |
Ответ на | [HACKERS] Re: Improve OR conditions on joined columns (common star schemaproblem) (David Steele <david@pgmasters.net>) |
Ответы |
Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
|
Список | pgsql-hackers |
On 3/16/17 11:54 AM, David Steele wrote: > On 2/14/17 4:03 PM, Tom Lane wrote: >> Jim Nasby <Jim.Nasby@BlueTreble.com> writes: >>> On 2/14/17 1:18 PM, Tom Lane wrote: >>>> One point that could use further review is whether the de-duplication >>>> algorithm is actually correct. I'm only about 95% convinced by the >>>> argument I wrote in planunionor.c's header comment. >> >>> I'll put some thought into it and see if I can find any holes. Are you >>> only worried about the removal of "useless" rels or is there more? >> >> Well, the key point is whether it's really OK to de-dup on the basis >> of only the CTIDs that are not eliminated in any UNION arm. I was >> feeling fairly good about that until I thought of the full-join-to- >> left-join-to-no-join conversion issue mentioned in the comment. >> Now I'm wondering if there are other holes; or maybe I'm wrong about >> that one and it's not necessary to be afraid of full joins. > > This patch applies cleanly (with offsets) and compiles at cccbdde. > > Jim, have you had time to think about this? Any insights? Having thought about it, I share Tom's concerns. And now I'm worried about what happens if there are multiple separate OR clauses. I guess those would be handled by separate UNIONs? I'm also finding it a bit hard to follow the comment Tom mentioned. I'm pretty sure I understand what's going on until this part: > The identical proof can be expected to apply > + * in other arms, except in an arm that references that rel in its version > + * of the OR clause. But in such an arm, we have effectively added a > + * restriction clause to what is known in other arms, which means that the > + * set of rows output by that rel can't increase compared to other arms. AIUI, this is describing a case something like this: SELECT child.blah FROM child LEFT JOIN parent USING(parent_id) WHERE child.foo AND (child.baz=1 or child.baz=2) given that parent.parent_id is unique. Except for these concerns, there would need to be a complex OR somewhere in here that sometimes referenced parent and sometimes didn't, such as WHERE child.foo AND (child.baz=1 OR parent.foo=3) But I'm not following the logic here (very possibly because I'm wrong about what I said above): > + * Therefore the situation in such an arm must be that including the rel > + * could result in either zero or one output row, rather than exactly one > + * output row as in other arms. So we still don't need to consider it for > + * de-duplication. I'm definitely not certain about the rest of it. Tom, could you expand the description some, especially with some examples? -- Jim Nasby, Chief Data Architect, Austin TX OpenSCG http://OpenSCG.com
В списке pgsql-hackers по дате отправления: