Re: New design for FK-based join selectivity estimation

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: New design for FK-based join selectivity estimation
Дата
Msg-id f90da728-13a5-c288-b4e1-3c3b677d3778@2ndquadrant.com
обсуждение исходный текст
Ответ на New design for FK-based join selectivity estimation  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: New design for FK-based join selectivity estimation  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Hi,

On 06/04/2016 09:44 PM, Tom Lane wrote:
> This is a branch of the discussion in
> https://www.postgresql.org/message-id/flat/20160429102531.GA13701%40huehner.biz
> but I'm starting a new thread as the original title is getting
> increasingly off-topic.
>
> I complained in that thread that the FK join selectivity patch had a
> very brute-force approach to matching join qual clauses to FK
> constraints, requiring a total of seven nested levels of looping to
> get anything done, and expensively rediscovering the same facts over
> and over.  Here is a sketch of what I think is a better way:
>
> * During the planner's catalog data collection phase, construct a
> single list (per PlannerInfo, not per RTE) of all relevant FKs.
> In this data structure, each FK's referenced and referencing relations
> are identified by relid (that is, RTE index) not just OID.  In the case
> of a query containing a self-join, that would mean that the same FK
> constraint could give rise to multiple list entries, one for each RTE
> occurrence of its referenced or referencing target relation.  FKs not
> relating two tables of the query are necessarily not in this list,
> and we could also choose not to include single-column FKs.
>
> * After finalizing equivalence classes, make a single pass through
> the FK list and check each column-pair to see if it can be matched
> to any EC (that is, both source and target columns are in the EC and
> the comparison operator is in the EC's opfamily).  Mark each matched
> column pair in the FK list data structure with a pointer to the EC.
>
> * When performing join selectivity estimation, run through the FK list
> a single time, ignoring entries that do not link a member of the join's
> LHS to a member of the join's RHS.  This is a fairly cheap test given
> the relid labels; it'd be approximately
>
>     if ((bms_is_member(fkinfo->src_relid, outer_rel->relids) &&
>          bms_is_member(fkinfo->dst_relid, inner_rel->relids)) ||
>         (bms_is_member(fkinfo->dst_relid, outer_rel->relids) &&
>          bms_is_member(fkinfo->src_relid, inner_rel->relids)))
>
> For each potentially interesting FK entry, run through the join
> qual list.  A RestrictInfo that was generated from an EC matches
> the FK if and only if that EC appears in the per-column markings;
> other RestrictInfos are matched to one of the FK columns normally
> (I think this path can ignore FK columns that have been matched to ECs).
> At the end of that, we can determine whether all the FK columns have
> been matched to some qual item, and we have a count and/or bitmapset
> of the qual list entries that matched the FK.  Remember the FK entry
> with the largest such count.
>
> * After scanning the list, we have our best FK match and can proceed
> with making the actual selectivity estimate as in the current code.
>
> With this approach, we have an iteration structure like
>
>   * once per join relation created
>     * for each foreign key constraint relevant to the query
>         (but skipping the loops below if it's not relevant to this join)
>       * for each join qual for the joinrel pair
>         * for each key column in that FK
>
> which gets us down from seven nested loops to four, and also makes the
> work done in the innermost loops significantly cheaper for the EC case,
> which will be the more common one.  It's also much easier to make this
> structure do zero extra work when there are no relevant FKs, which is
> a pleasant property for extra planner work to have.
>
> Now, we'll also add some per-FK-per-EC setup work, but my guess is
> that that's negligible compared to the per-join-relation work.
>
> It's possible that we could reduce the cost of matching non-EC join
> quals as well, with some trick along the line of pre-matching non-EC
> RestrictInfos to FK items.  I'm unsure that that is worth the trouble
> though.
>
> Thoughts?

Firstly thanks for looking into this, and also for coming up with a very 
detailed design proposal.

I do agree this new design seems superior to the current one and it 
seems worth a try. I'd like to see how far we can get over the next few 
days (say, until the end of the week).

One of the recent issues with the current design was handling of 
inheritance / appendrels. ISTM the proposed design has the same issue, 
no? What happens if the relations are partitioned?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jim Nasby
Дата:
Сообщение: Typo in parallel comment of heap_delete()
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: pg9.6 segfault using simple query (related to use fk for join estimates)