New design for FK-based join selectivity estimation

Поиск
Список
Период
Сортировка
От Tom Lane
Тема New design for FK-based join selectivity estimation
Дата
Msg-id 31041.1465069446@sss.pgh.pa.us
обсуждение исходный текст
Ответы Re: New design for FK-based join selectivity estimation  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: New design for FK-based join selectivity estimation  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: New design for FK-based join selectivity estimation  (Simon Riggs <simon@2ndQuadrant.com>)
Список pgsql-hackers
This is a branch of the discussion in
https://www.postgresql.org/message-id/flat/20160429102531.GA13701%40huehner.biz
but I'm starting a new thread as the original title is getting
increasingly off-topic.

I complained in that thread that the FK join selectivity patch had a
very brute-force approach to matching join qual clauses to FK
constraints, requiring a total of seven nested levels of looping to
get anything done, and expensively rediscovering the same facts over
and over.  Here is a sketch of what I think is a better way:

* During the planner's catalog data collection phase, construct a
single list (per PlannerInfo, not per RTE) of all relevant FKs.
In this data structure, each FK's referenced and referencing relations
are identified by relid (that is, RTE index) not just OID.  In the case
of a query containing a self-join, that would mean that the same FK
constraint could give rise to multiple list entries, one for each RTE
occurrence of its referenced or referencing target relation.  FKs not
relating two tables of the query are necessarily not in this list,
and we could also choose not to include single-column FKs.

* After finalizing equivalence classes, make a single pass through
the FK list and check each column-pair to see if it can be matched
to any EC (that is, both source and target columns are in the EC and
the comparison operator is in the EC's opfamily).  Mark each matched
column pair in the FK list data structure with a pointer to the EC.

* When performing join selectivity estimation, run through the FK list
a single time, ignoring entries that do not link a member of the join's
LHS to a member of the join's RHS.  This is a fairly cheap test given
the relid labels; it'd be approximately
if ((bms_is_member(fkinfo->src_relid, outer_rel->relids) &&     bms_is_member(fkinfo->dst_relid, inner_rel->relids)) ||
  (bms_is_member(fkinfo->dst_relid, outer_rel->relids) &&     bms_is_member(fkinfo->src_relid, inner_rel->relids)))
 

For each potentially interesting FK entry, run through the join
qual list.  A RestrictInfo that was generated from an EC matches
the FK if and only if that EC appears in the per-column markings;
other RestrictInfos are matched to one of the FK columns normally
(I think this path can ignore FK columns that have been matched to ECs).
At the end of that, we can determine whether all the FK columns have
been matched to some qual item, and we have a count and/or bitmapset
of the qual list entries that matched the FK.  Remember the FK entry
with the largest such count.

* After scanning the list, we have our best FK match and can proceed
with making the actual selectivity estimate as in the current code.

With this approach, we have an iteration structure like
 * once per join relation created   * for each foreign key constraint relevant to the query       (but skipping the
loopsbelow if it's not relevant to this join)     * for each join qual for the joinrel pair       * for each key column
inthat FK
 

which gets us down from seven nested loops to four, and also makes the
work done in the innermost loops significantly cheaper for the EC case,
which will be the more common one.  It's also much easier to make this
structure do zero extra work when there are no relevant FKs, which is
a pleasant property for extra planner work to have.

Now, we'll also add some per-FK-per-EC setup work, but my guess is
that that's negligible compared to the per-join-relation work.

It's possible that we could reduce the cost of matching non-EC join
quals as well, with some trick along the line of pre-matching non-EC
RestrictInfos to FK items.  I'm unsure that that is worth the trouble
though.

Thoughts?
        regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: pg9.6 segfault using simple query (related to use fk for join estimates)
Следующее
От: Kevin Grittner
Дата:
Сообщение: Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <