Обсуждение: Re: should we have a fast-path planning for OLTP starjoins?
On Tue, 2025-02-04 at 15:00 +0100, Tomas Vondra wrote: > This is a surprisingly common query pattern in OLTP applications, > thanks > to normalization. +1. Creating a small lookup table should be encouraged rather than penalized. Your test data includes a fact table with 10k rows and no index on the filter condition. In OLTP applications the fact table might often fit in memory, but I'd still expect it to have an index on the filter condition. That might not change your overall point, but I'm curious why you constructed the test that way? > There's a lot of stuff that could / should be improved on the current > patch. For (1) we might add support for more complex cases with > snowflake schemas [3] or with multiple fact tables. At the same time > (1) > needs to be very cheap, so that it does not regress every non- > starjoin > query. The patch only considers the largest table as the fact table, which is a good heuristic of course. However, I'm curious if other approaches might work. For instance, could we consider the table involved in the most join conditions to be the fact table? If you base it on the join conditions rather than the size of the table, then detection of the star join would be based purely on the query structure (not stats), which would be nice for predictability. > But the bigger question is whether it makes sense to have such fast- > path > modes for certain query shapes. We should explore what kinds of surprising cases it might create, or what maintenance headaches might come up with future planner changes. But the performance numbers you posted suggest that we should do something here. Regards, Jeff Davis
On 2/4/25 20:43, Jeff Davis wrote: > On Tue, 2025-02-04 at 15:00 +0100, Tomas Vondra wrote: >> This is a surprisingly common query pattern in OLTP applications, >> thanks >> to normalization. > > +1. Creating a small lookup table should be encouraged rather than > penalized. > > Your test data includes a fact table with 10k rows and no index on the > filter condition. In OLTP applications the fact table might often fit > in memory, but I'd still expect it to have an index on the filter > condition. That might not change your overall point, but I'm curious > why you constructed the test that way? > No particular reason. I think I intended to make it a lookup by PK (which would match the use case examples), and I forgot about that. But yeah, I would expect an index too. > >> There's a lot of stuff that could / should be improved on the current >> patch. For (1) we might add support for more complex cases with >> snowflake schemas [3] or with multiple fact tables. At the same time >> (1) >> needs to be very cheap, so that it does not regress every non- >> starjoin >> query. > > The patch only considers the largest table as the fact table, which is > a good heuristic of course. However, I'm curious if other approaches > might work. For instance, could we consider the table involved in the > most join conditions to be the fact table? > > If you base it on the join conditions rather than the size of the > table, then detection of the star join would be based purely on the > query structure (not stats), which would be nice for predictability. > Right, there may be other (possibly better) ways to detect the star join shape. I was thinking about also requiring for foreign keys on the join clauses - in DWH systems FKeys are sometimes omitted, which would break the heuristics, but in OLTP it's common to still have them. I think the cost of the heuristic will be an important metric - I don't know if the number of join conditions is more expensive to determine than what the patch does now, though. >> But the bigger question is whether it makes sense to have such fast- >> path >> modes for certain query shapes. > > We should explore what kinds of surprising cases it might create, or > what maintenance headaches might come up with future planner changes. > But the performance numbers you posted suggest that we should do > something here. > Yes, it seems like an interesting opportunity for starjoin queries. It's a pretty common query pattern, but it also happens to be very expensive to plan because the dimensions can be reordered almost arbitrarily. regards -- Tomas Vondra
Tomas Vondra <tomas@vondra.me> writes: > On 2/4/25 20:43, Jeff Davis wrote: >> If you base it on the join conditions rather than the size of the >> table, then detection of the star join would be based purely on the >> query structure (not stats), which would be nice for predictability. > Right, there may be other (possibly better) ways to detect the star join > shape. I was thinking about also requiring for foreign keys on the join > clauses - in DWH systems FKeys are sometimes omitted, which would break > the heuristics, but in OLTP it's common to still have them. I think you need to insist on foreign keys. Otherwise you don't know whether the joins will eliminate fact-table rows. If that's a possibility then it's no longer sensible to ignore different join orders. I'm kind of imagining a planner rule like "if table X is joined to using a match of a foreign-key column to its PK (so that the join removes no rows from the other table) and there are not other restriction conditions on table X, then force X to be joined last. And if there are multiple such tables X, it doesn't matter what order they are joined in as long as they're last." The interesting thing about this is we pretty much have all the infrastructure for detecting such FK-related join conditions already. Possibly the join order forcing could be done with existing infrastructure too (by manipulating the joinlist). regards, tom lane
On 2/4/25 21:23, Tom Lane wrote: > Tomas Vondra <tomas@vondra.me> writes: >> On 2/4/25 20:43, Jeff Davis wrote: >>> If you base it on the join conditions rather than the size of the >>> table, then detection of the star join would be based purely on the >>> query structure (not stats), which would be nice for predictability. > >> Right, there may be other (possibly better) ways to detect the star join >> shape. I was thinking about also requiring for foreign keys on the join >> clauses - in DWH systems FKeys are sometimes omitted, which would break >> the heuristics, but in OLTP it's common to still have them. > > I think you need to insist on foreign keys. Otherwise you don't know > whether the joins will eliminate fact-table rows. If that's a > possibility then it's no longer sensible to ignore different join > orders. > Hmmm, yeah. But that's only for the INNER JOIN case. But I've seen many of these star join queries with LEFT JOIN too, and then the FKs are not needed. All you need is a PK / unique index on the other side. Perhaps requiring (INNER JOIN + FK) or (LEFT JOIN + PK) would be enough to make this work for most cases, and then the rest would simply use the regular join order algorithm. I was thinking that if we allow the dimensions to eliminate rows in the fact table, we'd simply join them starting from the most selective ones. But that doesn't work if the joins might have different per-row costs (e.g. because some dimensions are much larger etc). Doing something smarter would likely end up fairly close to the regular join order algorithm ... > I'm kind of imagining a planner rule like "if table X is joined to > using a match of a foreign-key column to its PK (so that the join > removes no rows from the other table) and there are not other > restriction conditions on table X, then force X to be joined last. > And if there are multiple such tables X, it doesn't matter what > order they are joined in as long as they're last." > I think it'd need to be a bit smarter, to handle (a) snowflake schemas and (b) additional joins referencing the starjoin result. The (a) shouldn't be too hard, except that it needs to check the 'secondary dimension' is also joined by FK and has no restrictions, and then do that join later. For (b), I don't have numbers but I've seen queries that first do a starjoin and then add more data to that, e.g. by joining to a combination of attributes from multiple dimensions (think region + payment type). Or by joining to some "summary" table that does not have an explicit FK. Still, we could leave at least some of the joins until the very end, I guess. But even for the dimensions joined earlier the order does not really matter. I think (a) is something we should definitely handle. (b) is more a DWH/BI thing, not really an OLTP query (which is what this thread is about). > The interesting thing about this is we pretty much have all the > infrastructure for detecting such FK-related join conditions > already. Possibly the join order forcing could be done with > existing infrastructure too (by manipulating the joinlist). > Maybe, interesting. I've ruled out relying on the FKeys early in the coding, but I'm sure there's infrastructure the patch could use. It'd still need to check the transitive FK relationships for snowflake joins to work, ofc. Which is not something we need to consider right now. What kind of "manipulation" of the joinlist you have in mind? regards -- Tomas Vondra
Tomas Vondra <tomas@vondra.me> writes: >> The interesting thing about this is we pretty much have all the >> infrastructure for detecting such FK-related join conditions >> already. Possibly the join order forcing could be done with >> existing infrastructure too (by manipulating the joinlist). > Maybe, interesting. I've ruled out relying on the FKeys early in the > coding, but I'm sure there's infrastructure the patch could use. It would be very sad to do that work twice in a patch that purports to reduce planning time. If it's done too late to suit you now, could we move it to happen earlier? > What kind of "manipulation" of the joinlist you have in mind? Right now, if we have four tables to join, we have a joinlist (A B C D). (Really they're integer relids, but let's use names here.) If we decide to force C to be joined last, it should be sufficient to convert this to ((A B D) C). If C and D both look like candidates for this treatment, we can make it be (((A B) C) D) or (((A B) D) C). This is pretty much the same thing that happens if you set join_collapse_limit to 1 and use JOIN syntax to force a join order. In fact, IIRC we start out with nested joinlists and there is some code that normally flattens them until it decides it'd be creating too large a sub-problem. I'm suggesting selectively reversing the flattening. regards, tom lane
On Tue, Feb 4, 2025 at 3:42 PM Tomas Vondra <tomas@vondra.me> wrote:
On 2/4/25 21:23, Tom Lane wrote:
> Tomas Vondra <tomas@vondra.me> writes:
>> On 2/4/25 20:43, Jeff Davis wrote:
>>> If you base it on the join conditions rather than the size of the
>>> table, then detection of the star join would be based purely on the
>>> query structure (not stats), which would be nice for predictability.
>
>> Right, there may be other (possibly better) ways to detect the star join
>> shape. I was thinking about also requiring for foreign keys on the join
>> clauses - in DWH systems FKeys are sometimes omitted, which would break
>> the heuristics, but in OLTP it's common to still have them.
>
> I think you need to insist on foreign keys. Otherwise you don't know
> whether the joins will eliminate fact-table rows. If that's a
> possibility then it's no longer sensible to ignore different join
> orders.
Hmmm, yeah. But that's only for the INNER JOIN case. But I've seen many
of these star join queries with LEFT JOIN too, and then the FKs are not
needed. All you need is a PK / unique index on the other side.
Perhaps requiring (INNER JOIN + FK) or (LEFT JOIN + PK) would be enough
to make this work for most cases, and then the rest would simply use the
regular join order algorithm.
I was thinking that if we allow the dimensions to eliminate rows in the
fact table, we'd simply join them starting from the most selective ones.
But that doesn't work if the joins might have different per-row costs
(e.g. because some dimensions are much larger etc). Doing something
smarter would likely end up fairly close to the regular join order
algorithm ...
As long as the join is still happening there doesn't appear to be a correctness issue here, so I'm not sure mandating FKs makes sense.
The reason this matters is that highly concurrent FK checks can get VERY expensive (due to the cost of creating multiXacts). While it'd be great to fix that issue the reality today is it's not uncommon for people to remove FKs because of the high performance penalty.
Jim Nasby <jnasby@upgrade.com> writes: > On Tue, Feb 4, 2025 at 3:42 PM Tomas Vondra <tomas@vondra.me> wrote: >> Perhaps requiring (INNER JOIN + FK) or (LEFT JOIN + PK) would be enough >> to make this work for most cases, and then the rest would simply use the >> regular join order algorithm. > As long as the join is still happening there doesn't appear to be a > correctness issue here, so I'm not sure mandating FKs makes sense. > The reason this matters is that highly concurrent FK checks can get VERY > expensive (due to the cost of creating multiXacts). While it'd be great to > fix that issue the reality today is it's not uncommon for people to remove > FKs because of the high performance penalty. Meh. If we don't apply this optimization when there's no FK, we have not made those folks' life any worse. If we apply it despite there being no FK, we might choose a materially worse plan than before, and that *will* make their lives worse. regards, tom lane
On Wed, Feb 5, 2025 at 5:42 AM Tomas Vondra <tomas@vondra.me> wrote: > Hmmm, yeah. But that's only for the INNER JOIN case. But I've seen many > of these star join queries with LEFT JOIN too, and then the FKs are not > needed. All you need is a PK / unique index on the other side. > > Perhaps requiring (INNER JOIN + FK) or (LEFT JOIN + PK) would be enough > to make this work for most cases, and then the rest would simply use the > regular join order algorithm. > > I was thinking that if we allow the dimensions to eliminate rows in the > fact table, we'd simply join them starting from the most selective ones. > But that doesn't work if the joins might have different per-row costs > (e.g. because some dimensions are much larger etc). Doing something > smarter would likely end up fairly close to the regular join order > algorithm ... Yeah, we need to ensure that the joins to the fact table don't affect its rows; otherwise, the join order matters for the final query plan, and we'd better run the regular join search algorithm in this case. For inner joins, using the foreign key seems ideal for this. For left joins, we might be able to leverage rel_is_distinct_for() to handle that. Thanks Richard
On Wed, Feb 5, 2025 at 5:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Right now, if we have four tables to join, we have a joinlist > (A B C D). (Really they're integer relids, but let's use names here.) > If we decide to force C to be joined last, it should be sufficient to > convert this to ((A B D) C). If C and D both look like candidates for > this treatment, we can make it be (((A B) C) D) or (((A B) D) C). > This is pretty much the same thing that happens if you set > join_collapse_limit to 1 and use JOIN syntax to force a join order. > In fact, IIRC we start out with nested joinlists and there is some > code that normally flattens them until it decides it'd be creating > too large a sub-problem. I'm suggesting selectively reversing the > flattening. This should not be too difficult to implement. Outer joins seem to add some complexity, though. We need to ensure that the resulting joins in each sub-list are legal given the query's join order constraints. For example, if we make the joinlist be (((A B) C) D), we need to ensure that the A/B join and the (A/B)/C join are legal. Thanks Richard
On 2/5/25 09:27, Richard Guo wrote: > On Wed, Feb 5, 2025 at 5:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Right now, if we have four tables to join, we have a joinlist >> (A B C D). (Really they're integer relids, but let's use names here.) >> If we decide to force C to be joined last, it should be sufficient to >> convert this to ((A B D) C). If C and D both look like candidates for >> this treatment, we can make it be (((A B) C) D) or (((A B) D) C). >> This is pretty much the same thing that happens if you set >> join_collapse_limit to 1 and use JOIN syntax to force a join order. >> In fact, IIRC we start out with nested joinlists and there is some >> code that normally flattens them until it decides it'd be creating >> too large a sub-problem. I'm suggesting selectively reversing the >> flattening. > > This should not be too difficult to implement. Outer joins seem to > add some complexity, though. We need to ensure that the resulting > joins in each sub-list are legal given the query's join order > constraints. For example, if we make the joinlist be (((A B) C) D), > we need to ensure that the A/B join and the (A/B)/C join are legal. > If the requirement is that all "dimensions" only join to the fact table (which in this example would be "A" I think) through a FK, then why would these joins be illegal? We'd also need to require either an outer (left) join, or "NOT NULL" on the fact table side, right? IIRC we already do that when using the FKeys for join estimates. Essentially, this defines a "dimension" as a relation that is joined through a PK, without any other restrictions, both of which seems fairly simple to check, and it's a "local" feature. And we'd simply mark those as "join at the very end, in arbitrary order". Easy enough, I guess. I'm thinking about some more complex cases: (a) Query with multiple starjoins (a special case of that is snowflake schema) - but I guess this is not too difficult, it just needs to consider the FKs as "transitive" (a bit like Dijkstra's algorithm). In the worst case we might need to "split" the whole query into multiple smaller subproblems. (b) Joining additional stuff to the dimensions (not through a FK, possibly to multiple dimensions, ...). Imagine a "diamond join" with some summary table, etc. IMHO this is a fairly rare case / expensive enough to make the planning part less important. I'm also wondering how this should interact with join_collapse_limit. It would seems ideal to do this optimization before splitting the join list into subproblems (so that the "dimensions" are do not even count against the limit, pretty much). But that would mean join_collapse_limit can no longer be used to enforce a join order like today ... regards -- Tomas Vondra
Hmmm, yeah. But that's only for the INNER JOIN case. But I've seen many
of these star join queries with LEFT JOIN too, and then the FKs are not
needed. All you need is a PK / unique index on the other side.
Indeed, many installations specifically _remove_ foreign keys because of the dreaded RI check on delete. Basically, if you delete one or more rows in dim1 then the referencing fact1 must be scanned to ensure that it does not contain a reference to the deleted row. Often the referencing field on fact1 is not indexed, because the index is almost never useful in an actual select query, so even if you did index it several unused index metrics will identify it as a candidate for deletion. What you get is one sequential scan of fact1 for every row deleted from dim1. Now, we could get around this by changing how we do delete RI checks, either by moving to statement level triggers or bypassing triggers entirely, but until we do so, it is likely that many customers avoid otherwise useful FK references.
On Wed, Feb 5, 2025 at 4:23 AM Tomas Vondra <tomas@vondra.me> wrote: > > If the requirement is that all "dimensions" only join to the fact table > (which in this example would be "A" I think) through a FK, then why > would these joins be illegal? > > ... > Essentially, this defines a "dimension" as a relation that is joined > through a PK, without any other restrictions, both of which seems fairly > simple to check, and it's a "local" feature. And we'd simply mark those > as "join at the very end, in arbitrary order". Easy enough, I guess. As I understand your proposal, you want to detect queries that join a large number, N, of tables -- which means doing an exhaustive search of all possible join orders is expensive -- where N - 1 of the tables do not join to each other, but join only to the Nth table. PostgreSQL already falls back on geqo when it hits some heuristic that says exhaustive search is too expensive, but you're proposing an additional, better heuristic. Say we have F JOIN D1 JOIN D2 ... JOIN D(N-1). In the example you gave, the single-table predicate on F makes it small enough, I think, that F will be the "build" side of any Hash Join, right? You're assuming, I think, that the cardinality |F| = 1, after applying the filter to F. And so, |F JOIN Dk| will be approximately 1, for any 1 <= k < N. So then the join order does not matter. I think this is what you mean by "OLTP star join." For *OLAP* star joins, Oracle's Star Transformation [1] works reasonably well, where Oracle scans D1, ..., D(N-1) first, constructs Bloom filters, etc., and then "pushes" the N-1 joins down into the Seq Scan on F. So, I suggest: 1. Add an *OLTP* optimization similar to what you described, but instead of classifying the largest table as fact F, look for the "hub" of the star and classify it as F. And then enable your optimization if and only if the estimated nrows for F is very small. 2. For an *OLAP* optimization, do something like Oracle's Star Transformation. Re "OLTP" vs. "OLAP": the join order does not matter for *OLTP* star queries, because the fact table F is *small* (post-filtering). And because F is small, it doesn't matter so much in what order you join the dimension tables, because the result is "likely" to be small as well. Tom correctly points out that you really need foreign key constraints to ensure the previous sentence's "likely," but since your optimization is just intended to avoid considering unnecessary join orders, you may be able to get away with asking the optimizer what it thinks the cardinality |(... (F JOIN D1) ... JOIN Dk)| would be, and just fall back on the existing join-search logic when the optimizer thinks that Dk will create lots of rows (and so the join order matters...). So much for the OLTP case. For completeness, some discussion about the OLAP case; fwiw, let me start by plugging my "credentials" [2]. The OLAP case is more complicated than the OLTP case, in that the bad thing about *OLAP* star joins is that joins are pairwise. With OLAP star joins, you assume that |F| is always much larger than |Dk|, and by extension |(... (F JOIN D1) ... JOIN Dk)| is generally larger than |D(k+1)|. And the problem for OLAP is that while every Dk potentially filters rows out from F, you have to join to the Dk's one at a time, so you never get as much filtering as you'd like. For OLAP, you can take the Cartesian product of D1, ..., DN , and then scan F to aggregate into the resulting cube; see [3] . (Link [2] is related to transformation.) Or, you can scan D1, ..., DN first, without joining anything, constructing Hash tables and Bloom filters from your scans; then push the Bloom filters down to the scan of F; and finally join the (Bloom-filtered) F back to D1, ..., DN. This is what link [1] describes. Note that [1] came out before [3]. However... for OLAP, you see from the above discussion that it's not compilation that takes too long, but rather execution. So the optimizations require significant changes to the SQL executor. What you're proposing, IIUC, is a nice optimization to compilation times, which is why (I think) you're focused on the OLTP use case. In that case, I suggest focusing on an OLTP-specific solution, maybe a straw man like: 1. I see a query where N-1 relations join to the Nth relation, but not to each other (except transitively, of course). 2. Estimated cardinality for F, after pushing down single table predicates, is very small. 3. OK, let's start joining tables D1, ..., D(N-1) in order, since we're assuming (thanks to (1) and (2)) that the join order won't matter. 4. Continue joining tables in this fixed (arbitrary) order, unless we come to a Dk where the optimizer thinks joining to Dk will generate a significant number of rows. 5. Either we join all tables in order (fast compilation!); or we hit the case in (4), so we just fall back on the existing join logic. Thanks, James [1] https://blogs.oracle.com/optimizer/post/optimizer-transformations-star-transformation [2] https://patents.google.com/patent/US20150088856A1/en [3] https://docs.oracle.com/en/database/oracle/oracle-database/12.2/inmem/optimizing-in-memory-aggregation.html
On 2/7/25 20:11, James Hunter wrote: > On Wed, Feb 5, 2025 at 4:23 AM Tomas Vondra <tomas@vondra.me> wrote: >> >> If the requirement is that all "dimensions" only join to the fact table >> (which in this example would be "A" I think) through a FK, then why >> would these joins be illegal? >> >> ... >> Essentially, this defines a "dimension" as a relation that is joined >> through a PK, without any other restrictions, both of which seems fairly >> simple to check, and it's a "local" feature. And we'd simply mark those >> as "join at the very end, in arbitrary order". Easy enough, I guess. > > As I understand your proposal, you want to detect queries that join a > large number, N, of tables -- which means doing an exhaustive search > of all possible join orders is expensive -- where N - 1 of the tables > do not join to each other, but join only to the Nth table. > Yes. Essentially, it reduces the size of the problem by ignoring joins for which we know the optimal order. We know the dimensions can be joined last, and it does not matter in which exact order we join them. The starjoins are a bit of the "worst case" for our heuristics, because there are no dependencies between the dimensions, and we end up exploring the n! possible join orders, more or less. For other joins we quickly prune the space. > PostgreSQL already falls back on geqo when it hits some heuristic that > says exhaustive search is too expensive, but you're proposing an > additional, better heuristic. True, but most people will never actually hit the GEQO, because the default threshold are set like this: join_collapse_limit = 8 geqo_threshold = 12 So the planner will not "create" join search problems with more than 8 relations, but geqo only kicks in at 12. Most systems run with the default values for these GUCs, so they don't really use GEQO. FWIW I don't know a lot about the GEQO internals, but I heard it doesn't work all that well for the join order problem anyway. Not sure. > Say we have F JOIN D1 JOIN D2 ... JOIN D(N-1). In the example you > gave, the single-table predicate on F makes it small enough, I think, > that F will be the "build" side of any Hash Join, right? You're > assuming, I think, that the cardinality |F| = 1, after applying the > filter to F. And so, |F JOIN Dk| will be approximately 1, for any 1 <= > k < N. So then the join order does not matter. I think this is what > you mean by "OLTP star join." > I don't think it matters very much on which side of the join the F will end up (or if it's a hash join, it can easily be NL). It will definitely be in the first join, though, because all other dimensions join to it (assuming this is just a starjoin, with only fact + dimensions). It also doesn't really matter what's the exact cardinality of |F|. The example used a PK lookup, so that would be 1 row, but the point is that this is (much) cheaper than the planning. E.g. the planning might take 3ms while the execution only takes 1ms, etc. In the OLAP cases this is usually not the case, because the queries are processing a lot of data from the fact table, and the planning is negligible. > For *OLAP* star joins, Oracle's Star Transformation [1] works > reasonably well, where Oracle scans D1, ..., D(N-1) first, constructs > Bloom filters, etc., and then "pushes" the N-1 joins down into the Seq > Scan on F. > I don't care about OLAP star joins, at least no in this patch. It's a completely different / separate use case, and it affects very different parts of the planner (and also the executor, which this patch does not need to touch at all). > So, I suggest: > 1. Add an *OLTP* optimization similar to what you described, but > instead of classifying the largest table as fact F, look for the "hub" > of the star and classify it as F. And then enable your optimization if > and only if the estimated nrows for F is very small. > Right. I believe this is mostly what looking for FKs (as suggested by Tom) would end up doing. It doesn't need to consider the cardinality of F at all. > 2. For an *OLAP* optimization, do something like Oracle's Star > Transformation. > I consider that well outside the scope of this patch. > Re "OLTP" vs. "OLAP": the join order does not matter for *OLTP* star > queries, because the fact table F is *small* (post-filtering). And > because F is small, it doesn't matter so much in what order you join > the dimension tables, because the result is "likely" to be small as > well. > I don't think that's quite true. The order of dimension joins does not matter because the joins do not affect the join size at all. The size of |F| has nothing to do with that, I think. We'll do the same number of lookups against the dimensions no matter in what order we join them. And we know it's best to join them as late as possible, after all the joins that reduce the size (and before joins that "add" rows, I think). > Tom correctly points out that you really need foreign key constraints > to ensure the previous sentence's "likely," but since your > optimization is just intended to avoid considering unnecessary join > orders, you may be able to get away with asking the optimizer what it > thinks the cardinality |(... (F JOIN D1) ... JOIN Dk)| would be, and > just fall back on the existing join-search logic when the optimizer > thinks that Dk will create lots of rows (and so the join order > matters...). > Possibly, but TBH the join cardinality estimates can be quite dubious pretty easily. The FK is a much more reliable (definitive) information. > So much for the OLTP case. For completeness, some discussion about the > OLAP case; fwiw, let me start by plugging my "credentials" [2]. > Thanks ;-) > The OLAP case is more complicated than the OLTP case, in that the bad > thing about *OLAP* star joins is that joins are pairwise. With OLAP > star joins, you assume that |F| is always much larger than |Dk|, and > by extension |(... (F JOIN D1) ... JOIN Dk)| is generally larger than > |D(k+1)|. And the problem for OLAP is that while every Dk potentially > filters rows out from F, you have to join to the Dk's one at a time, > so you never get as much filtering as you'd like. > > For OLAP, you can take the Cartesian product of D1, ..., DN , and then > scan F to aggregate into the resulting cube; see [3] . (Link [2] is > related to transformation.) > > Or, you can scan D1, ..., DN first, without joining anything, > constructing Hash tables and Bloom filters from your scans; then push > the Bloom filters down to the scan of F; and finally join the > (Bloom-filtered) F back to D1, ..., DN. This is what link [1] > describes. Note that [1] came out before [3]. > > However... for OLAP, you see from the above discussion that it's not > compilation that takes too long, but rather execution. So the > optimizations require significant changes to the SQL executor. > Agreed. I'm not against improving the OLAP case too, but it's not what this thread was about. It seems it'll need changes in very different places, etc. > What you're proposing, IIUC, is a nice optimization to compilation > times, which is why (I think) you're focused on the OLTP use case. In > that case, I suggest focusing on an OLTP-specific solution, maybe a > straw man like: > 1. I see a query where N-1 relations join to the Nth relation, but not > to each other (except transitively, of course). > 2. Estimated cardinality for F, after pushing down single table > predicates, is very small. > 3. OK, let's start joining tables D1, ..., D(N-1) in order, since > we're assuming (thanks to (1) and (2)) that the join order won't > matter. > 4. Continue joining tables in this fixed (arbitrary) order, unless we > come to a Dk where the optimizer thinks joining to Dk will generate a > significant number of rows. > 5. Either we join all tables in order (fast compilation!); or we hit > the case in (4), so we just fall back on the existing join logic. > Yes, I think that's pretty much the idea. Except that I don't think we need to look at the |F| at all - it will have more impact for small |F|, of course, but it doesn't hurt for large |F|. I think it'll probably need to consider which joins increase/decrease the cardinality, and "inject" the dimension joins in between those. regards -- Tomas Vondra
On Fri, Feb 7, 2025 at 12:09 PM Tomas Vondra <tomas@vondra.me> wrote: > ... > Yes, I think that's pretty much the idea. Except that I don't think we > need to look at the |F| at all - it will have more impact for small |F|, > of course, but it doesn't hurt for large |F|. > > I think it'll probably need to consider which joins increase/decrease > the cardinality, and "inject" the dimension joins in between those. YMMV, but I suspect you may find it much easier to look at |F|, |F JOIN D1|, |(F JOIN D1) JOIN D2|, etc., than to consider |F JOIN D1| / |F|, etc. (In other words, I suspect that considering absolute cardinalities will end up easier/cleaner than considering ratios of increases/decreases in cardinalities.) But I have not thought about this much, so I am not putting too much weight on my suspicions. James
On 2/7/25 23:43, James Hunter wrote: > On Fri, Feb 7, 2025 at 12:09 PM Tomas Vondra <tomas@vondra.me> wrote: >> ... >> Yes, I think that's pretty much the idea. Except that I don't think we >> need to look at the |F| at all - it will have more impact for small |F|, >> of course, but it doesn't hurt for large |F|. >> >> I think it'll probably need to consider which joins increase/decrease >> the cardinality, and "inject" the dimension joins in between those. > > YMMV, but I suspect you may find it much easier to look at |F|, |F > JOIN D1|, |(F JOIN D1) JOIN D2|, etc., than to consider |F JOIN D1| / > |F|, etc. (In other words, I suspect that considering absolute > cardinalities will end up easier/cleaner than considering ratios of > increases/decreases in cardinalities.) But I have not thought about > this much, so I am not putting too much weight on my suspicions. > That's not what I meant when I mentioned joins that increase/decrease cardinality. That wasn't referring to the "dimension" joins, which we expect to have FK and thus should not affect the cardinality at all. Instead, I was thinking about the "other" joins (if there are any), that may add or remove rows. AFAIK we want to join the dimensions at the place with the lowest cardinality - the discussion mostly assumed the joins would only reduce the cardinality, in which case we'd just leave the dimensions until the very end. But ISTM that may not be necessarily true. Let's say there's a join that "multiplies" each row. It'll probably be done at the end, and the dimension joins should probably happen right before it ... not sure. cheers -- Tomas Vondra
Tomas Vondra <tomas@vondra.me> writes: > Instead, I was thinking about the "other" joins (if there are any), that > may add or remove rows. AFAIK we want to join the dimensions at the > place with the lowest cardinality - the discussion mostly assumed the > joins would only reduce the cardinality, in which case we'd just leave > the dimensions until the very end. > But ISTM that may not be necessarily true. Let's say there's a join that > "multiplies" each row. It'll probably be done at the end, and the > dimension joins should probably happen right before it ... not sure. I thought the idea here was to get rid of as much join order searching as we could. Insisting that we get the best possible plan anyway seems counterproductive, not to mention very messy to implement. So I'd just push all these joins to the end and be done with it. regards, tom lane
On 2/8/25 01:23, Tom Lane wrote: > Tomas Vondra <tomas@vondra.me> writes: >> Instead, I was thinking about the "other" joins (if there are any), that >> may add or remove rows. AFAIK we want to join the dimensions at the >> place with the lowest cardinality - the discussion mostly assumed the >> joins would only reduce the cardinality, in which case we'd just leave >> the dimensions until the very end. > >> But ISTM that may not be necessarily true. Let's say there's a join that >> "multiplies" each row. It'll probably be done at the end, and the >> dimension joins should probably happen right before it ... not sure. > > I thought the idea here was to get rid of as much join order searching > as we could. Insisting that we get the best possible plan anyway > seems counterproductive, not to mention very messy to implement. > So I'd just push all these joins to the end and be done with it. > Right, cutting down on the join order searching is the point. But most of the savings comes (I think) from not considering different ordering of the dimensions, because those are all essentially the same. Consider a join with 16 relations, 10 of which are dimensions. There are 10! possible orders of the dimensions, but all of them behave pretty much exactly the same. In a way, this behaves almost like a join with 7 relations, one of which represents all the 10 dimensions. I think this "join group" abstraction (a relation representing a bunch of relations in a particular order) would make this reasonably clean to implement. I haven't tried yet, though. Yes, this means we'd explore more orderings, compared to just pushing all the dimensions to the end. In the example above, that'd be 7!/6!, so up to ~7x orderings. I don't know if this is worth the extra complexity, of course. thanks -- Tomas Vondra
On 2/5/25 09:27, Richard Guo wrote: > On Wed, Feb 5, 2025 at 5:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Right now, if we have four tables to join, we have a joinlist >> (A B C D). (Really they're integer relids, but let's use names here.) >> If we decide to force C to be joined last, it should be sufficient to >> convert this to ((A B D) C). If C and D both look like candidates for >> this treatment, we can make it be (((A B) C) D) or (((A B) D) C). >> This is pretty much the same thing that happens if you set >> join_collapse_limit to 1 and use JOIN syntax to force a join order. >> In fact, IIRC we start out with nested joinlists and there is some >> code that normally flattens them until it decides it'd be creating >> too large a sub-problem. I'm suggesting selectively reversing the >> flattening. > > This should not be too difficult to implement. Outer joins seem to > add some complexity, though. We need to ensure that the resulting > joins in each sub-list are legal given the query's join order > constraints. For example, if we make the joinlist be (((A B) C) D), > we need to ensure that the A/B join and the (A/B)/C join are legal. > I've not done anything like this with joins before, but I AFAICS the interesting stuff happens in deconstruct_recurse(), especially close to the end where we check join_collapse_limit and do joinlist = list_make2(leftpart, rightpart); So I guess one way to "reverse the flattening" could be to do something in deconstruct_recourse(). But I don't think that'd work all that well, because of the recursion. We don't want to add a "pipeline break" into the join list, we want to move the "dimension" to the end - even if only within the group defined by join_collapse_limit. E.g. imagine we have a join of 8 relations, with F (fact), dimensions D1 and D2, and then some artibrary tables T1, T2, T3, T4, T5. And let's say deconstruct_recurse() sees them in this particular order [F, T1, T2, D1, D2, T3, T4, T5] AFAICS doing something in deconstruct_recurse() would likely split the joinlist into four parts [F, T1, T2] [D1] [D2] [T3, T4, T5] which does treat the D1,D2 as if join_collapse_limit=1, but it also splits the remaining relations into two groups, when we'd probably want something more like this: [F, T1, T2, T3, T4, T5] [D1] [D2] Which should be legal, because a requirement is that D1/D2 don't have any other join restrictions (I guess this could be relaxed a bit to only restrictions within that particular group). Which leads me to the conclusion that the best place to do this kind of stuff is deconstruct_jointree(), once we have the "complete" joinlist. We could walk it and reorder/split some of the joinlists again. regards -- Tomas Vondra
On Mon, Feb 10, 2025 at 9:32 AM Tomas Vondra <tomas@vondra.me> wrote: > E.g. imagine we have a join of 8 relations, with F (fact), dimensions D1 > and D2, and then some artibrary tables T1, T2, T3, T4, T5. And let's say > deconstruct_recurse() sees them in this particular order > > [F, T1, T2, D1, D2, T3, T4, T5] > > AFAICS doing something in deconstruct_recurse() would likely split the > joinlist into four parts > > [F, T1, T2] [D1] [D2] [T3, T4, T5] > > which does treat the D1,D2 as if join_collapse_limit=1, but it also > splits the remaining relations into two groups, when we'd probably want > something more like this: > > [F, T1, T2, T3, T4, T5] [D1] [D2] > > Which should be legal, because a requirement is that D1/D2 don't have > any other join restrictions (I guess this could be relaxed a bit to only > restrictions within that particular group). Hmm, I'm still a little concerned about whether the resulting joins are legal. Suppose we have a join pattern like the one below. F left join (D1 inner join T on true) on F.b = D1.b left join D2 on F.c = D2.c; For this query, the original joinlist is [F, D1, T, D2]. If we reorder it to [[F, T], D1, D2], the sub-joinlist [F, T] would fail to produce any joins, as the F/T join is not legal. This may not be the pattern we are targeting. But if we intend to support it, I think we need a way to ensure that the resulting joins are legal. Thanks Richard
On 2/10/25 08:29, Richard Guo wrote: > On Mon, Feb 10, 2025 at 9:32 AM Tomas Vondra <tomas@vondra.me> wrote: >> E.g. imagine we have a join of 8 relations, with F (fact), dimensions D1 >> and D2, and then some artibrary tables T1, T2, T3, T4, T5. And let's say >> deconstruct_recurse() sees them in this particular order >> >> [F, T1, T2, D1, D2, T3, T4, T5] >> >> AFAICS doing something in deconstruct_recurse() would likely split the >> joinlist into four parts >> >> [F, T1, T2] [D1] [D2] [T3, T4, T5] >> >> which does treat the D1,D2 as if join_collapse_limit=1, but it also >> splits the remaining relations into two groups, when we'd probably want >> something more like this: >> >> [F, T1, T2, T3, T4, T5] [D1] [D2] >> >> Which should be legal, because a requirement is that D1/D2 don't have >> any other join restrictions (I guess this could be relaxed a bit to only >> restrictions within that particular group). > > Hmm, I'm still a little concerned about whether the resulting joins > are legal. Suppose we have a join pattern like the one below. > > F left join > (D1 inner join T on true) on F.b = D1.b > left join D2 on F.c = D2.c; > > For this query, the original joinlist is [F, D1, T, D2]. If we > reorder it to [[F, T], D1, D2], the sub-joinlist [F, T] would fail to > produce any joins, as the F/T join is not legal. > > This may not be the pattern we are targeting. But if we intend to > support it, I think we need a way to ensure that the resulting joins > are legal. > It's quite possible the PoC patch I posted fails to ensure this, but I think the assumption is we'd not reorder joins for dimensions that any any join order restrictions (except for the FK join). regards -- Tomas Vondra
On Fri, Feb 7, 2025 at 3:09 PM Tomas Vondra <tomas@vondra.me> wrote: > I don't think that's quite true. The order of dimension joins does not > matter because the joins do not affect the join size at all. The size of > |F| has nothing to do with that, I think. We'll do the same number of > lookups against the dimensions no matter in what order we join them. And > we know it's best to join them as late as possible, after all the joins > that reduce the size (and before joins that "add" rows, I think). This is often not quite true, because there are often restriction clauses on the fact tables that result in some rows being eliminated. e.g. SELECT * FROM hackers h JOIN languages l ON h.language_id = l.id JOIN countries c ON h.country_id = c.id WHERE c.name = 'Czechia'; However, I think that trying to somehow leverage the existence of either FK or LJ+UNIQUE is still a pretty good idea. In a lot of cases, many of the joins don't change the row count, so we don't really need to explore all possible orderings of those joins. We might be able to define some concept of "join that does't change the row count at all" or maybe better "join that doesn't change the row count very much". And then if we have a lot of such joins, we can consider them as a group. Say we have 2 joins that do change the row count significantly, and then 10 more than don't. The 10 that don't can be done before, between, or after the two that do, but it doesn't seem necessary to consider doing some of them at one point and some at another. Maybe that's not the right way to think about this problem; I haven't read the academic literature on star-join optimization. But it has always felt stupid to me that we spend a bunch of time considering join orders that are not meaningfully different, and I think what makes two join orders not meaningfully different is that we're commuting joins that are not changing the row count. (Also worth noting: even joins of this general form change the row count, they can only reduce it.) -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Feb 10, 2025 at 5:35 PM Tomas Vondra <tomas@vondra.me> wrote: > On 2/10/25 08:29, Richard Guo wrote: > > Hmm, I'm still a little concerned about whether the resulting joins > > are legal. Suppose we have a join pattern like the one below. > > > > F left join > > (D1 inner join T on true) on F.b = D1.b > > left join D2 on F.c = D2.c; > > > > For this query, the original joinlist is [F, D1, T, D2]. If we > > reorder it to [[F, T], D1, D2], the sub-joinlist [F, T] would fail to > > produce any joins, as the F/T join is not legal. > > > > This may not be the pattern we are targeting. But if we intend to > > support it, I think we need a way to ensure that the resulting joins > > are legal. > It's quite possible the PoC patch I posted fails to ensure this, but I > think the assumption is we'd not reorder joins for dimensions that any > any join order restrictions (except for the FK join). Then, we'll need a way to determine if a given relation has join-order restrictions, which doesn't seem like a trivial task. We do have the has_join_restriction() function, but it considers any relations involved in an outer join as having join restrictions, and that makes it unsuitable for our needs here. Thanks Richard
On 2/10/25 22:36, Robert Haas wrote: > On Fri, Feb 7, 2025 at 3:09 PM Tomas Vondra <tomas@vondra.me> wrote: >> I don't think that's quite true. The order of dimension joins does not >> matter because the joins do not affect the join size at all. The size of >> |F| has nothing to do with that, I think. We'll do the same number of >> lookups against the dimensions no matter in what order we join them. And >> we know it's best to join them as late as possible, after all the joins >> that reduce the size (and before joins that "add" rows, I think). > > This is often not quite true, because there are often restriction > clauses on the fact tables that result in some rows being eliminated. > > e.g. SELECT * FROM hackers h JOIN languages l ON h.language_id = l.id > JOIN countries c ON h.country_id = c.id WHERE c.name = 'Czechia'; > True. I think this was discussed earlier in this thread - dimensions with additional restrictions may affect the row count, and thus would be exempt from this (and would instead go through the "regular" join order search algorithm). So I assumed the "dimensions" don't have any such restrictions in my message, I should have mentioned that. > However, I think that trying to somehow leverage the existence of > either FK or LJ+UNIQUE is still a pretty good idea. In a lot of cases, > many of the joins don't change the row count, so we don't really need > to explore all possible orderings of those joins. We might be able to > define some concept of "join that does't change the row count at all" > or maybe better "join that doesn't change the row count very much". > And then if we have a lot of such joins, we can consider them as a > group. Say we have 2 joins that do change the row count significantly, > and then 10 more than don't. The 10 that don't can be done before, > between, or after the two that do, but it doesn't seem necessary to > consider doing some of them at one point and some at another. > > Maybe that's not the right way to think about this problem; I haven't > read the academic literature on star-join optimization. But it has > always felt stupid to me that we spend a bunch of time considering > join orders that are not meaningfully different, and I think what > makes two join orders not meaningfully different is that we're > commuting joins that are not changing the row count. > > (Also worth noting: even joins of this general form change the row > count, they can only reduce it.) > I searched for papers on star-joins, but pretty much everything I found focuses on the OLAP case. Which is interesting, I think the OLTP star-join I described seems quite different, and I'm not sure the trade offs are necessarily the same. My gut feeling is that in the first "phase" we should focus on the case with no restrictions - that makes it obvious what the optimal order is, and it will help a significant fraction of cases. And then in the next step we can try doing something for cases with restrictions - be it some sort of greedy algorithm, something that leverages knowledge of the selectivity to prune join orders early (instead of actually exploring all N! join orders like today). Or something else, not sure. regards -- Tomas Vondra
On 2/11/25 10:28, Richard Guo wrote: > On Mon, Feb 10, 2025 at 5:35 PM Tomas Vondra <tomas@vondra.me> wrote: >> On 2/10/25 08:29, Richard Guo wrote: >>> Hmm, I'm still a little concerned about whether the resulting joins >>> are legal. Suppose we have a join pattern like the one below. >>> >>> F left join >>> (D1 inner join T on true) on F.b = D1.b >>> left join D2 on F.c = D2.c; >>> >>> For this query, the original joinlist is [F, D1, T, D2]. If we >>> reorder it to [[F, T], D1, D2], the sub-joinlist [F, T] would fail to >>> produce any joins, as the F/T join is not legal. >>> >>> This may not be the pattern we are targeting. But if we intend to >>> support it, I think we need a way to ensure that the resulting joins >>> are legal. > >> It's quite possible the PoC patch I posted fails to ensure this, but I >> think the assumption is we'd not reorder joins for dimensions that any >> any join order restrictions (except for the FK join). > > Then, we'll need a way to determine if a given relation has join-order > restrictions, which doesn't seem like a trivial task. We do have the > has_join_restriction() function, but it considers any relations > involved in an outer join as having join restrictions, and that makes > it unsuitable for our needs here. > I admit knowing next to nothing about join order planning :-( Could you maybe explain why it would be non-trivial to determine if a relation has join-order restrictions? Surely we already determine that, no? So what would we need to do differently? Or are you saying that because has_join_restriction() treats each relation with an outer join as having a restriction, that makes it unusable for the purpose of this optimization/patch? And we'd need to invent something more elaborate? I'm not sure that's quite true. The problem with joining the dimensions (with inner joins) is *exactly* the lack of restrictions, which means that explore possible orders of those dimensions (all N! of them). With the restrictions (e.g. from LEFT JOIN), that's no longer true - in a way, this is similar to what the patch does. And in fact, replacing the inner joins with LEFT JOINs makes the queries much faster. I've seen this used as a workaround to cut down on planning time ... So I don't think treating outer joins as "having restriction" is a problem. It doesn't regress any queries, although it might lead to a bit strange situation that "less restricted" joins are faster to plan. regards -- Tomas Vondra
On 2/4/25 22:55, Tom Lane wrote: > Tomas Vondra <tomas@vondra.me> writes: >>> The interesting thing about this is we pretty much have all the >>> infrastructure for detecting such FK-related join conditions >>> already. Possibly the join order forcing could be done with >>> existing infrastructure too (by manipulating the joinlist). > >> Maybe, interesting. I've ruled out relying on the FKeys early in the >> coding, but I'm sure there's infrastructure the patch could use. > > It would be very sad to do that work twice in a patch that purports > to reduce planning time. If it's done too late to suit you now, > could we move it to happen earlier? > >> What kind of "manipulation" of the joinlist you have in mind? > > Right now, if we have four tables to join, we have a joinlist > (A B C D). (Really they're integer relids, but let's use names here.) > If we decide to force C to be joined last, it should be sufficient to > convert this to ((A B D) C). If C and D both look like candidates for > this treatment, we can make it be (((A B) C) D) or (((A B) D) C). > This is pretty much the same thing that happens if you set > join_collapse_limit to 1 and use JOIN syntax to force a join order. > In fact, IIRC we start out with nested joinlists and there is some > code that normally flattens them until it decides it'd be creating > too large a sub-problem. I'm suggesting selectively reversing the > flattening. > > regards, tom lane Here's a patch trying to do it more like this - by manipulating the lists describing the join problems, before it's passed the the actual join search algorithm (which is where the PoC patch did this). I wonder if this is roughly the place where you imagined this would be done, or if you envision some other issue with this approach. The patch is definitely incomplete, there's plenty of XXX comments about places missing some code, etc. I initially tried to manipulate the joinlist much earlier - pretty much right at the end of deconstruct_jointree. But that turned out to be way too early. To identify dimensions etc. we need to check stuff about foreign keys, join clauses, ... and that's not available that early. So I think this needs to happen much later in query_planner(), and the patch does it right before the make_one_rel() call. Maybe that's too late, but it needs to happen after match_foreign_keys_to_quals(), as it relies on some of the FK info built by that call. Maybe we could call match_foreign_keys_to_quals() earlier, but I don't quite see any benefits of doing that ... On 2/8/25 02:49, Tomas Vondra wrote: > On 2/8/25 01:23, Tom Lane wrote: >> Tomas Vondra <tomas@vondra.me> writes: >>> Instead, I was thinking about the "other" joins (if there are any), that >>> may add or remove rows. AFAIK we want to join the dimensions at the >>> place with the lowest cardinality - the discussion mostly assumed the >>> joins would only reduce the cardinality, in which case we'd just leave >>> the dimensions until the very end. >> >>> But ISTM that may not be necessarily true. Let's say there's a join that >>> "multiplies" each row. It'll probably be done at the end, and the >>> dimension joins should probably happen right before it ... not sure. >> >> I thought the idea here was to get rid of as much join order searching >> as we could. Insisting that we get the best possible plan anyway >> seems counterproductive, not to mention very messy to implement. >> So I'd just push all these joins to the end and be done with it. >> > > Right, cutting down on the join order searching is the point. But most > of the savings comes (I think) from not considering different ordering > of the dimensions, because those are all essentially the same. > > Consider a join with 16 relations, 10 of which are dimensions. There are > 10! possible orders of the dimensions, but all of them behave pretty > much exactly the same. In a way, this behaves almost like a join with 7 > relations, one of which represents all the 10 dimensions. > > I think this "join group" abstraction (a relation representing a bunch > of relations in a particular order) would make this reasonably clean to > implement. I haven't tried yet, though. > > Yes, this means we'd explore more orderings, compared to just pushing > all the dimensions to the end. In the example above, that'd be 7!/6!, so > up to ~7x orderings. > > I don't know if this is worth the extra complexity, of course. > I'm still concerned about regressions when happen to postpone some expensive dimension joins after a join that increases the cardinality. And the "join group" would address that. But It probably is not a very common query pattern, and it'd require changes to join_search_one_level. I'm not sure what could / should count as 'dimension'. The patch doesn't implement this yet, but I think it can mostly copy how we match the FK to the join in get_foreign_key_join_selectivity. There's probably more to think about, though. For example, don't we need to check NOT NULL / nullfrac on the referencing side? Also, it probably interacts with LEFT/OUTER joins. I plan to start strict and then relax and handle some additional cases. I'm however struggling with the concept of join order restrictions a bit. I suspect we need to worry about that when walking the relation list and figuring out what can be a dimension, but I've never worked with this, so my mental model of how this works is not great. Another question is if this planning shortcut (which for the dimensions mostly picks a random join order) could have some unexpected impact on the rest of the planning. For example, could we "miss" some join producing tuples in an interesting order? Or could we fail to consider a partition-wise join? Could this "shortcut" restrict the rest of the plan in some undesirable way? For example, if some of the tables are partitioned, could this mean we no longer can do partition-wise joins with the (mostly arbitrary) join order we picked? There's also a "patch" directory, with some SQL scripts creating two simple examples of schemas/queries that would benefit from this. The "create-1/select-1" examples are the simple starjoins, this thread focuses on. It might be modified to do "snowflake" join, which is fundamentally a variant of this query type. The second example (create-2/select-2) is quite different, in that it's nor a starjoin schema. It still joins one "main" table with multiple "dimensions", but the FKs go in the other direction (to a single column in the main table). But it has a very similar bottleneck - the order of the joins is expensive, but it often does not matter very much, because the query matches just one row anyway. And even if it returns more rows, isn't the join order determined just by the selectivity of each join? Maybe the starjoin optimization could be made to work for this type too? regards -- Tomas Vondra
Вложения
[ sorry for ridiculously slow response to this ] Tomas Vondra <tomas@vondra.me> writes: > Here's a patch trying to do it more like this - by manipulating the > lists describing the join problems, before it's passed the the actual > join search algorithm (which is where the PoC patch did this). > I wonder if this is roughly the place where you imagined this would be > done, or if you envision some other issue with this approach. Cool. This is proof-of-concept that manipulating the joinlist can do what we need done. So we can move on to what heuristics we need to use. > I initially tried to manipulate the joinlist much earlier - pretty much > right at the end of deconstruct_jointree. But that turned out to be way > too early. To identify dimensions etc. we need to check stuff about > foreign keys, join clauses, ... and that's not available that early. > So I think this needs to happen much later in query_planner(), and the > patch does it right before the make_one_rel() call. Maybe that's too > late, but it needs to happen after match_foreign_keys_to_quals(), as it > relies on some of the FK info built by that call. Maybe we could call > match_foreign_keys_to_quals() earlier, but I don't quite see any > benefits of doing that ... I don't have a problem with doing it where you did it, but the comment should explain the placement. What you do have in the comment mostly belongs with the code, too; it's not the business of the caller. So in planmain.c something like + /* + * Try to simplify the join search problem for starjoin-like joins. + * This step relies on info about FK relationships, so we can't do it + * till after match_foreign_keys_to_quals(). + */ would be more appropriate IMO. I'd be slightly inclined to put the GUC test there, too: + if (enable_starjoin_join_search) + joinlist = starjoin_adjust_joins(root, joinlist); I agree that you need to worry about join order restrictions, and that it's not immediately clear how to do that. join_is_legal would work if we could call it, but the problem is that at this stage we'll only have RelOptInfos for base rels not join rels. If we have a joinlist (A B C D) and we are considering treating C as a dimension table, then the questions we have to ask are: (a) is it okay to build the (A B D) join without C? (b) is it okay to join (A B D) to C? In this simple case, I think (b) must be true if (a) is, but I'm not quite sure that that's so in more complex cases with multiple candidates for dimension tables. In any case, join_is_legal won't help us if we don't have join RelOptInfos. I'm inclined to start by using has_join_restriction: if that says "false" for a candidate dimension table, it should be safe to postpone the join to the dimension table. We might be able to refine that later. > The second example (create-2/select-2) is quite different, in that it's > nor a starjoin schema. It still joins one "main" table with multiple > "dimensions", but the FKs go in the other direction (to a single column > in the main table). But it has a very similar bottleneck - the order of > the joins is expensive, but it often does not matter very much, because > the query matches just one row anyway. And even if it returns more rows, > isn't the join order determined just by the selectivity of each join? > Maybe the starjoin optimization could be made to work for this type too? Yeah, I'm slightly itchy about relying on FKs in this heuristic at all; it doesn't seem like quite the right thing. I think we do want one side of the join to be joining to a PK or at least a unique index, but I'm not sure we need to insist on there being an FK relationship. A couple of minor coding notes: * There's no point in doing anything (except recursing) if the joinlist contains fewer than 3 items, and maybe as a further heuristic this shouldn't kick in till later yet, like 5 or so items. When there are just a few items, the possibility of missing the best plan seems to outweigh the minimal savings in plan time. * Joinlists never contain anything but RangeTblRefs and sub-lists. See make_rel_from_joinlist. * Your reconstructed joinlist is overly complicated. Instead of + newlist = list_make2(newlist, list_make1(lfirst(lc))); you could just do + newlist = list_make2(newlist, lfirst(lc)); because a single-element subproblem is useless. I notice that the patch doesn't apply cleanly anymore because of the introduction of guc_parameters.dat. So here's a v3 that rebases over that, and I took the liberty of fixing the joinlist construction as above, but I didn't do anything else. regards, tom lane diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c index 2a3dea88a94..5417c10fbf8 100644 --- a/src/backend/optimizer/plan/analyzejoins.c +++ b/src/backend/optimizer/plan/analyzejoins.c @@ -51,6 +51,7 @@ typedef struct } SelfJoinCandidate; bool enable_self_join_elimination; +bool enable_starjoin_join_search; /* local functions */ static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo); @@ -2514,3 +2515,427 @@ remove_useless_self_joins(PlannerInfo *root, List *joinlist) return joinlist; } + +/* + * starjoin_match_to_foreign_key + * Try to match a join to a FK constraint. + * + * For a relation to be a dimension (for the starjoin heuristics), it needs + * to be joined through a FK constraint. The dimension is expected to be + * on the PK side of the join. The relation must not have any additional + * join clauses, beyond those matching the foreign key. + * + * We already have a list of relevant foreign keys, and we use that info + * for selectivity estimation in get_foreign_key_join_selectivity(). And + * we're actually doing something quite similar here. + * + * XXX Do we need to worry about the join type, e.g. inner/outer joins, + * semi/anti? get_foreign_key_join_selectivity() does care about it, and + * ignores some of those cases. Maybe we should too? + * + * XXX Check there are no other join clauses, beyond those matching the + * foreign key. But do we already have the joininfo at this point? Some + * of this stuff gets build only during the join order search later. + * The match_foreign_keys_to_quals() probably needs to be aware of all + * this, so how does it do that? + */ +static bool +starjoin_match_to_foreign_key(PlannerInfo *root, RelOptInfo *rel) +{ + ListCell *lc; + + /* Consider each FK constraint that is known to match the query */ + foreach(lc, root->fkey_list) + { + ForeignKeyOptInfo *fkinfo = (ForeignKeyOptInfo *) lfirst(lc); + int nmatches = 0; + + /* rel is not the referenced table of the FK */ + if (fkinfo->ref_relid != rel->relid) + continue; + + /* + * Do we have a match for each key of the FK? + * + * XXX get_foreign_key_join_selectivity checks EquivalenceClasses, + * we should probably (definitely) do that here too. + * + * XXX We should check that all the clauses have the same relation + * on the other side (for multi-column keys). And that there are + * no other join clauses other than those matching the FK. + * + * XXX Do we need to check that the FK side of the join (i.e. the fact + * table) has the columns referenced as NOT NULL? Otherwise we could + * have a FK join that reduces the cardinality, which is one of + * the arguments why it's fine to move the join (that it doesn't + * change the cardinality). But if the join is LEFT JOIN, this + * should be fine too - but do we get here with LEFT JOINs? + * + * XXX Do we need to check if the other side of the FK is in the + * current join list? Maybe it's in some later one? + */ + for (int i = 0; i < fkinfo->nkeys; i++) + { + bool has_matching_clause = false; + + /* + * Is there a clause matching this FK key? + * + * XXX We need to make sure it's a valid match, e.g. that the + * same referencing table matches all keys in a composite FK, + * and so on. + * + * XXX Do we need to check some relationship to other rels in + * the same jointree item? E.g. the referencing table should + * not be a dimensions we already removed. + */ + if ((fkinfo->rinfos[i] != NULL) || (fkinfo->eclass[i] != NULL)) + { + has_matching_clause = true; + nmatches++; + continue; + } + + /* found a FK key without a matching join clause, ignore the FK */ + if (has_matching_clause) + break; + } + + /* matched all FK keys */ + if (nmatches == fkinfo->nkeys) + { + return true; + } + } + + return false; +} + + +/* + * starjoin_is_dimension + * Determine if a range table entry is a dimension in a starjoin. + * + * To be considered a dimension for the simplified join order search, the + * join must not affect the cardinality of the join. We ensure that by + * requiring a couple things: + * + * 1) the join clause has to match a FK (that is, the fact does have at + * most one matching row in the dimension) + * + * 2) the FK side (the fact table) should be marked as NOT NULL, so that + * we know there's exactly one dimension row for each fact table row + * + * 3) there must be no additional restrictions on the relation (that + * might eliminate some of the rows, reducing the cardinality) + * + * XXX The Implementation is incomplete. It probably needs more thought, + * considering some join types would allow relaxing some of the checks + * (e.g. outer joins may not require checking (2) or possibly even (3), + * depending on where the condition is, what varnullingrels it has). + * + * XXX I wonder if we could handle (3) by ordering the dimensions by the + * selectivity of the restriction. There are no join clauses between the + * dimensions (ignoring the snowflake joins, but even there the clauses + * don't go between branches), so the selectivity could be treated as + * a measure of how much it shrinks the join result. So we could just + * sort the dimensions by this, starting with the lowest selectivity + * (close to 0.0), and ending with dimensions without restrictions (in + * which case the selectivity is 1.0). + * + * XXX If the join in INNER, and the fact side has NULL values in the + * join key, we might consider nullfrac as restriction. + * + * XXX I'm not sure how careful this needs to be about join order + * restrictions. Maybe it should call have_relevant_joinclause and + * have_join_order_restriction, to ensure the join order is OK? + * + * The optimizer/README is not very clear about this, but maybe it's + * a too specific question. It seems to say the relations in those + * lists can be joined in any order (lines 94 and 106). Maybe that's + * not what it means, or I'm misunderstanding it. + * + * It however seems has_join_restrictions() in join_search_one_level() + * forces the code to look only at "earlier" rels in the list + * + * first_rel = foreach_current_index(r) + 1 + * + * So maybe we just need to stop once we find a rel with a restriction, + * as determined byhas_join_restrictions()? + * + * But there's also join_is_legal() to check legality of joins, with + * LEFT/RIGHT joins, and IN/EXISTS clauses. See README line 188. And it + * also looks-up the SpecialJoinInfo for the join. So maybe we should + * lookup RelOptInfo for both sides of the join, and call join_is_legal + * on that? Might be too expensive, though. Maybe do that only when + * has_join_restrictions already says yes? + * + * Maybe we should use has_join_restrictions(), but in a different way. + * We could still treat rels with restrictions as dimensions, and move + * that to the separate list (that doesn't change the join order), but + * stop once we hit the first non-dimension with a restriction? Because + * if any relation after that was a dimention, we wouldn't be able to + * move it to the separate list. It'd change the join order in a way + * that might violate the restriction. I believe that's the idea behind + * first_rel in join_search_one_level(), but maybe not. + * + * Perhaps have_join_order_restriction and have_relevant_joinclause are + * useful for this, rather than has_join_restrictions? We might look at + * actual pairs of relations, and/or check there's no join order + * restriction with respect to the relations we skipped/moved to the + * list of dimension? + * + * AFAICS it's just the skipping that can break the order restrictions? + * Adding something to the list of dimensions keeps the order (at least + * with respect to the rels after it). + */ +static bool +starjoin_is_dimension(PlannerInfo *root, RangeTblRef *rtr) +{ + Index rti = rtr->rtindex; + RangeTblEntry *rte = root->simple_rte_array[rti]; + RelOptInfo *rel = root->simple_rel_array[rti]; + + /* only plain relations can be dimensions (we need FK/PK join) */ + if ((rte->rtekind != RTE_RELATION) || + (rel->reloptkind != RELOPT_BASEREL)) + return false; + + /* + * Does it have any conditions/restrictions that may affect the number + * of rows matched? If yes, don't treat as dimension. + * + * Dimensions in a starjoin may have restrictions, but that means it'll + * change cardinality of the joins (reduce it), so it may be better to + * join it early. We leave it to the regular join order planning. The + * expectation is that most dimensions won't have extra restrictions. + * + * XXX Should we check some other fields, like lateral references, and + * so on? Or is that unnecessary? What about eclasses? + */ + if (rel->baserestrictinfo != NIL) + return false; + + /* + * See if the join clause matches a foreign key. It should match a + * single relation on the other side, and the dimension should be on + * the PK side. + * + * XXX loosely inspired by get_foreign_key_join_selectivity() + */ + if (!starjoin_match_to_foreign_key(root, rel)) + return false; + + /* + * XXX Maybe some additional checks here ... + */ + + return true; +} + +/* + * starjoin_adjust_joins + * Adjust the jointree for starjoins, to simplify the join order search. + * + * The join search for starjoin queries is surprisingly expensive, because + * there are very few join order restrictions. Consider a starjoin query + * + * SELECT * FROM f + * JOIN d1 ON (f.id1 = d1.id) + * JOIN d2 ON (f.id2 = d2.id) + * ... + * JOIN d9 ON (f.id9 = d9.id) + * + * There are no clauses between the dimension tables (d#), which means those + * tables can be joined in almost arbitrary order. This means the standard + * join_order_search() would explore a N! possible join orders. It is not + * that bad in practice, because we split the problem by from_collapse_limit + * into a sequence of smaller problems, but even for the default limit of + * 8 relations it's quite expensive. This can be easily demonstrated by + * setting from_collapse_limit=1 for example starjoin queries. + * + * The idea here is to apply a much simpler join order search for this type + * of queries, without too much risk of picking a much worse plans. It is + * however a trade off between how expensive we allow this to be, and how + * good the decisions will be. This can help only starjoins with multiple + * dimension tables, and we don't want to harm planning of other queries, + * so the basic "query shape" detection needs to be very cheap. And then + * it needs to be cheaper than the regular join order search. + * + * If a perfect check is impossible or too expensive, it's better to end + * up with a cheap false negative (i.e. and not use the optimization), + * rather than risk regressions in other cases. + * + * The simplified join order search relies on the fact that if the joins + * to dimensions do not alter the cardinality of the join relation, then + * the relative order of those joins does not matter. All the possible + * orders are guaranteed to perform the same. So we can simply pick one + * of those orders, and "hardcode" it in the join tree later passed to the + * join_order_search(). + * + * The query may involve joins to additional (non-dimension) tables, and + * those may alter cardinality. Some joins may increase it, other joins + * may decrease it. In principle, it'd be best to first perform all the + * joins that reduce join size, then join all the dimensions, and finally + * perform joins that may increase the join size. But this is not done + * now, currently we simply apply all the dimensions at the end, hoping + * that the earlier joins did not inflate the join too much. + * + * The definition of a dimension is a bit vague. For our definition see + * the comment at starjoin_is_dimension(). + * + * The optimization works by manipulating the joinlist (originally built + * by deconstruct_jointree), which decomposed the original jointree into + * smaller "problems" depending on join type and join_collapse_limit. We + * inspect those smaller lists, and selectively split them into smaller + * problems to force a join order. This may effectively undo some of the + * merging done by deconstruct_jointree(), which tries to build problems + * with up to join_collapse_limit relations. + * + * For example, imagine a join problem with 8 rels - one fact table and + * then 7 dimensions, which we can represent a joinlist with 8 elements. + * + * (D7, D6, D5, D4, D3, D2, D1, F) + * + * Assuming all those joins meet the requirements (have a matching FK, + * don't affect the join cardinality, ...), then we can split this into + * + * (D7, (D6, (D5, (D4, (D3, (D2, (D1, F))))))) + * + * which is a nested joinlist, with only two elements on each level. That + * means there's no need for expensive join order search, there's only + * one way to join the relations (two, if we consider the relations may + * switch sides). + * + * The joinlist may already be nested, with multiple smaller subproblems. + * We look at each individual join problem independently, i.e. we don't + * try to merge problems to build join_collapse_limit problems again. + * This is partially to keep it cheap/simple, but also so not change + * behavior for cases when people use join_collapse_limit to force some + * particular join shape. + * + * XXX A possible improvement is to allow handling snowflake joins, i.e. + * recursive dimensions. That would require a somewhat more complicated + * processing, because a dimension would be allowed other rels, as long + * as those are dimensions too. And we'd need to be more careful about + * the order in which join them to the top of the join. + * + * XXX One possible risk is that moving the dimension joins at the very + * end may move that after joins that increase the cardinality. Which + * may cause a regression. Such joins however don't seem very common, at + * least in regular starjoin queries. So maybe we could simply check if + * there are any such joins and disable this optimization. Is there a + * cheap way to identify that a join increases cardinality? + * + * XXX Ideally, we'd perform the dimension joins at the place with the + * lowest cardinality. Imagine a joinlist + * + * (D1, D2, A, B, F) + * + * Where A increases join cardinality, while B does not (possibly even + * reduces it). Ideally, we'd do the join like this + * + * (A, (D2, (D1, (B, F)))) + * + * so D1/D2 get joined at the point of "lowest cardinality". We probably + * don't want to do all this cardinality estimation work here, it'd copy + * what we already do in the join_order_search(). Perhaps we could invent + * a "join item" representing a join to all those dimensions, and pass it + * to join_order_search()? And let it pick the right place for it? It'd + * always join them in the same order, it'd not reorder them. It would + * still do the regular cardinality estimations etc. It would be trivial + * to disable the optimization if needed - don't collapse the dimensions + * into the new type of join item. + */ +List * +starjoin_adjust_joins(PlannerInfo *root, List *joinlist) +{ + ListCell *lc; + List *newlist = NIL; + List *dimensions = NIL; + + /* + * Do nothing if starjoin optimization not enabled / not applicable. + * + * XXX It might seems we can skip this for lists with <= 2 items, but + * that does not work, the elements may be nested list, and we need to + * descend into those. So do what remove_useless_self_joins() does, and + * only bail out for the simplest single-relation case (i.e. no joins). + */ + if (!enable_starjoin_join_search || joinlist == NIL || + (list_length(joinlist) == 1 && !IsA(linitial(joinlist), List))) + return joinlist; + + /* + * Process the current join problem - split the elements into dimensions + * and non-dimensions. If there are dimensions, add them back at the end, + * as small single-rel joins. + * + * The list may contain various types of elements. It may contain a list, + * which means it's an independent join search problem and we need to + * process it recursively. Or it may be RangeTblRef, in which case we + * need to check if it's a dimension. Other types of elements are just + * added back to the list as-is. + * + * XXX I think we need to be careful to keep the order of the list (for + * the non-dimension entries). The join_search_one_level() relies on + * that when handling join order restrictions. + * + * XXX It might be better to only create a new list if needed, i.e. once + * we find the first dimension. So that non-starjoin queries don't pay + * for something they don't need. A mutable iterator might be a way, but + * I'm not sure how expensive this really is. + */ + foreach (lc, joinlist) + { + Node *item = (Node *) lfirst(lc); + + /* a separate join search problem, handle it recursively */ + if (IsA(item, List)) + { + newlist = lappend(newlist, + starjoin_adjust_joins(root, (List *) item)); + continue; + } + + /* + * Non-RangeTblRef elements can't be considered a dimension (only + * baserels can have FK, etc.), so just add those to the list. + */ + if (!IsA(item, RangeTblRef)) + { + newlist = lappend(newlist, item); + continue; + } + + /* + * An entry representing a baserel. If it's a dimension, save it in + * a separate list, and we'll add it at the "top" of the join at the + * end. Otherwise add it to the list just like other elements. + */ + if (starjoin_is_dimension(root, (RangeTblRef *) item)) + { + dimensions = lappend(dimensions, item); + continue; + } + + /* not a dimension, add it to the list directly */ + newlist = lappend(newlist, item); + } + + /* + * If we found some dimensions, add them to the join tree one by one. + * The exact order does not matter, so we add them in the order we + * found them in the original list. + * + * We need to add them by creating smaller 2-element lists, with the rest + * of the list being a sub-problem and then adding the dimension + * table. This is how we force the desired join order. + */ + foreach (lc, dimensions) + { + newlist = list_make2(newlist, lfirst(lc)); + } + + return newlist; +} diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c index 5467e094ca7..c75a5203aae 100644 --- a/src/backend/optimizer/plan/planmain.c +++ b/src/backend/optimizer/plan/planmain.c @@ -282,6 +282,16 @@ query_planner(PlannerInfo *root, */ distribute_row_identity_vars(root); + /* + * Try to simplify the join search problem for starjoin-like joins, with + * joins over FK relationships. The dimensions can be joined in almost + * any order, so the join search can be close to factorial complexity. + * But there's not much difference between such join orders, so we just + * leave the dimensions at the end of each group (as determined by the + * join_collapse_limit earlier). + */ + joinlist = starjoin_adjust_joins(root, joinlist); + /* * Ready to do the primary planning. */ diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat index 6bc6be13d2a..bde4378527e 100644 --- a/src/backend/utils/misc/guc_parameters.dat +++ b/src/backend/utils/misc/guc_parameters.dat @@ -203,6 +203,13 @@ boot_val => 'true', }, +{ name => 'enable_starjoin_join_search', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD', + short_desc => 'Enables simplified join order planning for starjoins.', + flags => 'GUC_EXPLAIN', + variable => 'enable_starjoin_join_search', + boot_val => 'false', +}, + { name => 'geqo', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_GEQO', short_desc => 'Enables genetic query optimization.', long_desc => 'This algorithm attempts to do planning without exhaustive searching.', diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h index 9d3debcab28..fee6c695d03 100644 --- a/src/include/optimizer/planmain.h +++ b/src/include/optimizer/planmain.h @@ -21,6 +21,7 @@ #define DEFAULT_CURSOR_TUPLE_FRACTION 0.1 extern PGDLLIMPORT double cursor_tuple_fraction; extern PGDLLIMPORT bool enable_self_join_elimination; +extern PGDLLIMPORT bool enable_starjoin_join_search; /* query_planner callback to compute query_pathkeys */ typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra); @@ -119,6 +120,7 @@ extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids, JoinType jointype, List *restrictlist, bool force_cache, List **extra_clauses); extern List *remove_useless_self_joins(PlannerInfo *root, List *joinlist); +extern List *starjoin_adjust_joins(PlannerInfo *root, List *joinlist); /* * prototypes for plan/setrefs.c