Discussion: Erroneous sort used in query plan


Erroneous sort used in query plan

From:
Shane Ambler
Date:
I am putting together searches on the catalog info and came up with a
select that was rather slow, and I noticed that the EXPLAIN ANALYZE
output contains a sort step on one of the left joins which I don't
think belongs there.

I found the small error in my query that caused the slowdown and the
sort in the plan: the tr and tres joins use tl.oid where they should
use tr.oid and tres.oid (i.e. tr.oid=o.oprright and
tres.oid=o.oprresult). But I am not sure that the given condition
should generate a sort step at all, and if it does then I believe it
should be a (more?) stable decision.

Removing one of the erroneous left joins (tr or tres) changes the
column that is sorted; neither column is related to the join(s) that
appear to generate the step.

With tl, tr and tres in place the sort is performed on pjoin.oid.

Removing or correcting either tr or tres changes the sort to
olsort.oid.

Removing or correcting both tr and tres removes the sort from the plan.

Also, with all the pg_operator joins removed the sort is still there
(on pjoin.oid), but if I remove one of the erroneous joins as well the
sort goes. (Correcting one of the joins leaves the sort in place;
removing the join removes the sort.)

Using PostgreSQL 8.2.0 on Mac OS X 10.4.8


The full query is -

explain analyze
SELECT
o.oid as "OID"
, n.nspname as "Schema"
, o.oprname as "Name"
, r.rolname as "Owner"
, CASE WHEN o.oprkind='b' THEN 'infix(left and right)'
WHEN o.oprkind='l' THEN 'prefix (left)'
WHEN o.oprkind='r' THEN 'postfix (right)'
END as "Kind"
, CASE WHEN o.oprcanhash='t' THEN 'Yes'
WHEN o.oprcanhash='f' THEN 'No' END as "Supports Hash Joins"
, tl.typname as "Left Operand"
, tr.typname as "Right Operand"
, tres.typname as "Result Type"
, ocom.oprname as "Commutator Operator"
, onegate.oprname as "Negator Operator"
, olsort.oprname as "Left Sort Operator"
, orsort.oprname as "Right Sort Operator"
, oltcm.oprname as "Less Than Operator"
, ogtcm.oprname as "Greater Than Operator"
, pcode.proname as "Operator Function"
, prest.proname as "Restriction Selectivity Function"
, pjoin.proname as "Join Selectivity Function"


FROM pg_catalog.pg_operator o
left join pg_catalog.pg_namespace n ON n.oid = o.oprnamespace
left join pg_catalog.pg_roles r on r.oid=o.oprowner
left join pg_catalog.pg_type tl on tl.oid=o.oprleft
left join pg_catalog.pg_type tr on tl.oid=o.oprright
left join pg_catalog.pg_type tres on tl.oid=o.oprresult
left join pg_catalog.pg_operator ocom on ocom.oid=o.oprcom
left join pg_catalog.pg_operator onegate on onegate.oid=o.oprnegate
left join pg_catalog.pg_operator oneg on oneg.oid=o.oprnegate
left join pg_catalog.pg_operator olsort on olsort.oid=o.oprlsortop
left join pg_catalog.pg_operator orsort on orsort.oid=o.oprrsortop
left join pg_catalog.pg_operator oltcm on oltcm.oid=o.oprltcmpop
left join pg_catalog.pg_operator ogtcm on ogtcm.oid=o.oprgtcmpop
left join pg_catalog.pg_proc pcode on pcode.oid=o.oprcode
left join pg_catalog.pg_proc prest on prest.oid=o.oprrest
left join pg_catalog.pg_proc pjoin on pjoin.oid=o.oprjoin

WHERE n.nspname like 'public'



I have attached a copy of the query and plan.
--

Shane Ambler
pgSQL@007Marketing.com

Get Sheeky @ http://Sheeky.Biz
explain analyze
SELECT
o.oid as "OID"
, n.nspname as "Schema"
, o.oprname as "Name"
, r.rolname as "Owner"
, CASE WHEN o.oprkind='b' THEN 'infix(left and right)'
WHEN o.oprkind='l' THEN 'prefix (left)'
WHEN o.oprkind='r' THEN 'postfix (right)'
END as "Kind"
, CASE WHEN o.oprcanhash='t' THEN 'Yes' WHEN o.oprcanhash='f' THEN 'No' END as "Supports Hash Joins"
, tl.typname as "Left Operand"
, tr.typname as "Right Operand"
, tres.typname as "Result Type"
, ocom.oprname as "Commutator Operator"
, onegate.oprname as "Negator Operator"
, olsort.oprname as "Left Sort Operator"
, orsort.oprname as "Right Sort Operator"
, oltcm.oprname as "Less Than Operator"
, ogtcm.oprname as "Greater Than Operator"
, pcode.proname as "Operator Function"
, prest.proname as "Restriction Selectivity Function"
, pjoin.proname as "Join Selectivity Function"


FROM pg_catalog.pg_operator o
left join pg_catalog.pg_namespace n ON n.oid = o.oprnamespace
left join pg_catalog.pg_roles r on r.oid=o.oprowner
left join pg_catalog.pg_type tl on tl.oid=o.oprleft
left join pg_catalog.pg_type tr on tl.oid=o.oprright
left join pg_catalog.pg_type tres on tl.oid=o.oprresult
left join pg_catalog.pg_operator ocom on ocom.oid=o.oprcom
left join pg_catalog.pg_operator onegate on onegate.oid=o.oprnegate
left join pg_catalog.pg_operator oneg on oneg.oid=o.oprnegate
left join pg_catalog.pg_operator olsort on olsort.oid=o.oprlsortop
left join pg_catalog.pg_operator orsort on orsort.oid=o.oprrsortop
left join pg_catalog.pg_operator oltcm on oltcm.oid=o.oprltcmpop
left join pg_catalog.pg_operator ogtcm on ogtcm.oid=o.oprgtcmpop
left join pg_catalog.pg_proc pcode on pcode.oid=o.oprcode
left join pg_catalog.pg_proc prest on prest.oid=o.oprrest
left join pg_catalog.pg_proc pjoin on pjoin.oid=o.oprjoin


WHERE n.nspname like 'public'


ORDER BY lower(n.nspname), lower(o.oprname)



                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Sort  (cost=2707.30..2707.66 rows=143 width=966) (actual time=22085.090..23284.736 rows=314064 loops=1)
   Sort Key: lower((n.nspname)::text), lower((o.oprname)::text)
   ->  Hash Left Join  (cost=2652.63..2702.18 rows=143 width=966) (actual time=3668.065..9163.877 rows=314064 loops=1)
         Hash Cond: ((o.oprcode)::oid = pcode.oid)
         ->  Hash Left Join  (cost=2567.13..2600.79 rows=143 width=906) (actual time=3661.366..7305.265 rows=314064 loops=1)
               Hash Cond: ((o.oprrest)::oid = prest.oid)
               ->  Hash Left Join  (cost=2481.63..2504.16 rows=143 width=846) (actual time=3654.311..6704.433 rows=314064 loops=1)
                     Hash Cond: (o.oprlsortop = olsort.oid)
                     ->  Hash Left Join  (cost=2453.91..2474.29 rows=143 width=786) (actual time=3651.827..6250.536 rows=314064 loops=1)
                           Hash Cond: (o.oprrsortop = orsort.oid)
                           ->  Hash Left Join  (cost=2426.18..2444.42 rows=143 width=726) (actual time=3649.287..5795.792 rows=314064 loops=1)
                                 Hash Cond: (o.oprltcmpop = oltcm.oid)
                                 ->  Hash Left Join  (cost=2398.46..2414.55 rows=143 width=666) (actual time=3646.749..5331.642 rows=314064 loops=1)
                                       Hash Cond: (o.oprgtcmpop = ogtcm.oid)
                                       ->  Merge Left Join  (cost=2370.73..2384.68 rows=143 width=606) (actual time=3643.994..4867.855 rows=314064 loops=1)
                                             Merge Cond: ("outer"."?column19?" = pjoin.oid)
                                             ->  Sort  (cost=2158.92..2159.27 rows=143 width=546) (actual time=3634.598..3800.837 rows=314064 loops=1)
                                                   Sort Key: (o.oprjoin)::oid
                                                   ->  Hash Left Join  (cost=49.70..2153.80 rows=143 width=546) (actual time=5.883..2061.807 rows=314064 loops=1)
                                                         Hash Cond: (o.oprnegate = oneg.oid)
                                                         ->  Hash Left Join  (cost=21.98..2123.93 rows=143 width=550) (actual time=4.022..1348.066 rows=314064 loops=1)
                                                               Hash Cond: (o.oprowner = r.oid)
                                                               ->  Nested Loop Left Join  (cost=20.89..2120.69 rows=143 width=490) (actual time=3.878..709.734 rows=314064 loops=1)
                                                                     Join Filter: (tl.oid = o.oprright)
                                                                     ->  Nested Loop Left Join  (cost=10.44..1103.17 rows=143 width=434) (actual time=3.843..73.766 rows=1128 loops=1)
                                                                           Join Filter: (tl.oid = o.oprresult)
                                                                           ->  Nested Loop Left Join  (cost=0.00..85.65 rows=143 width=374) (actual time=2.654..13.504 rows=192 loops=1)
                                                                                 ->  Nested Loop Left Join  (cost=0.00..75.35 rows=143 width=310) (actual time=2.637..10.730 rows=192 loops=1)
                                                                                       ->  Nested Loop Left Join  (cost=0.00..56.37 rows=143 width=250) (actual time=2.625..8.149 rows=192 loops=1)
                                                                                             ->  Nested Loop  (cost=0.00..37.38 rows=143 width=186) (actual time=2.588..4.261 rows=192 loops=1)
                                                                                                   Join Filter: (n.oid = o.oprnamespace)
                                                                                                   ->  Seq Scan on pg_namespace n  (cost=0.00..1.07 rows=1 width=68) (actual time=0.052..0.054 rows=1 loops=1)
                                                                                                         Filter: (nspname ~~ 'public'::text)
                                                                                                   ->  Seq Scan on pg_operator o  (cost=0.00..25.58 rows=858 width=126) (actual time=0.022..3.408 rows=858 loops=1)
                                                                                             ->  Index Scan using pg_operator_oid_index on pg_operator onegate  (cost=0.00..0.12 rows=1 width=68) (actual time=0.014..0.016 rows=1 loops=192)
                                                                                                   Index Cond: (onegate.oid = o.oprnegate)
                                                                                       ->  Index Scan using pg_operator_oid_index on pg_operator ocom  (cost=0.00..0.12 rows=1 width=68) (actual time=0.008..0.010 rows=1 loops=192)
                                                                                             Index Cond: (ocom.oid = o.oprcom)
                                                                                 ->  Index Scan using pg_type_oid_index on pg_type tl  (cost=0.00..0.06 rows=1 width=68) (actual time=0.007..0.011 rows=1 loops=192)
                                                                                       Index Cond: (tl.oid = o.oprleft)
                                                                           ->  Materialize  (cost=10.44..13.57 rows=313 width=64) (actual time=0.001..0.125 rows=313 loops=192)
                                                                                 ->  Seq Scan on pg_type tres  (cost=0.00..10.13 rows=313 width=64) (actual time=0.013..0.328 rows=313 loops=1)
                                                                     ->  Materialize  (cost=10.44..13.57 rows=313 width=64) (actual time=0.000..0.133 rows=313 loops=1128)
                                                                           ->  Seq Scan on pg_type tr  (cost=0.00..10.13 rows=313 width=64) (actual time=0.007..0.320 rows=313 loops=1)
                                                               ->  Hash  (cost=1.08..1.08 rows=4 width=68) (actual time=0.065..0.065 rows=5 loops=1)
                                                                     ->  Subquery Scan r  (cost=0.00..1.08 rows=4 width=68) (actual time=0.021..0.037 rows=5 loops=1)
                                                                           ->  Seq Scan on pg_authid  (cost=0.00..1.04 rows=4 width=118) (actual time=0.018..0.027 rows=5 loops=1)
                                                         ->  Hash  (cost=25.58..25.58 rows=858 width=4) (actual time=1.797..1.797 rows=858 loops=1)
                                                               ->  Seq Scan on pg_operator oneg  (cost=0.00..25.58 rows=858 width=4) (actual time=0.012..0.781 rows=858 loops=1)
                                             ->  Sort  (cost=211.81..217.71 rows=2360 width=68) (actual time=9.366..104.265 rows=216851 loops=1)
                                                   Sort Key: pjoin.oid
                                                   ->  Seq Scan on pg_proc pjoin  (cost=0.00..79.60 rows=2360 width=68) (actual time=0.039..2.700 rows=2360 loops=1)
                                       ->  Hash  (cost=25.58..25.58 rows=858 width=68) (actual time=2.589..2.589 rows=858 loops=1)
                                             ->  Seq Scan on pg_operator ogtcm  (cost=0.00..25.58 rows=858 width=68) (actual time=0.040..1.024 rows=858 loops=1)
                                 ->  Hash  (cost=25.58..25.58 rows=858 width=68) (actual time=2.448..2.448 rows=858 loops=1)
                                       ->  Seq Scan on pg_operator oltcm  (cost=0.00..25.58 rows=858 width=68) (actual time=0.016..0.909 rows=858 loops=1)
                           ->  Hash  (cost=25.58..25.58 rows=858 width=68) (actual time=2.471..2.471 rows=858 loops=1)
                                 ->  Seq Scan on pg_operator orsort  (cost=0.00..25.58 rows=858 width=68) (actual time=0.015..0.923 rows=858 loops=1)
                     ->  Hash  (cost=25.58..25.58 rows=858 width=68) (actual time=2.409..2.409 rows=858 loops=1)
                           ->  Seq Scan on pg_operator olsort  (cost=0.00..25.58 rows=858 width=68) (actual time=0.022..0.928 rows=858 loops=1)
               ->  Hash  (cost=79.60..79.60 rows=2360 width=68) (actual time=6.983..6.983 rows=2360 loops=1)
                     ->  Seq Scan on pg_proc prest  (cost=0.00..79.60 rows=2360 width=68) (actual time=0.020..2.793 rows=2360 loops=1)
         ->  Hash  (cost=79.60..79.60 rows=2360 width=68) (actual time=6.571..6.571 rows=2360 loops=1)
               ->  Seq Scan on pg_proc pcode  (cost=0.00..79.60 rows=2360 width=68) (actual time=0.017..2.593 rows=2360 loops=1)
 Total runtime: 24195.085 ms
(65 rows)

Re: Erroneous sort used in query plan

From:
Tom Lane
Date:
Shane Ambler <pgsql@007Marketing.com> writes:
> I am putting together searches on the catalog info and came up with a 
> select that was rather slow and I noticed that in the explain analyze 
> there is a sort step on one of the left joins which I don't think 
> belongs there.

Well, it's certainly necessary in context because it's preparing the
data for the merge join immediately above it.  The correct question
is why is the thing using a merge join here, when a hash join would be
cheaper?

I dug through this and found out that the hash join is estimated as
cheaper, right up till the last step of cost_hashjoin:
    /*
     * Bias against putting larger relation on inside.  We don't want an
     * absolute prohibition, though, since larger relation might have better
     * bucketsize --- and we can't trust the size estimates unreservedly,
     * anyway.  Instead, inflate the run cost by the square root of the size
     * ratio.  (Why square root?  No real good reason, but it seems
     * reasonable...)
     *
     * Note: before 7.4 we implemented this by inflating startup cost; but if
     * there's a disable_cost component in the input paths' startup cost, that
     * unfairly penalizes the hash.  Probably it'd be better to keep track of
     * disable penalty separately from cost.
     */
    if (innerbytes > outerbytes && outerbytes > 0)
        run_cost *= sqrt(innerbytes / outerbytes);
 

In this example, the data volume from the join of everything else is
estimated as less than what needs to be fetched from pg_proc, and so
this bias kicks in, and the cost estimate roughly doubles.
Unfortunately, because it's a LEFT JOIN, we'll never consider hashjoin
in the other direction and so the hash loses out to the mergejoin.
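
To put a number on the penalty, here is a toy illustration of that
formula with made-up byte counts (this is just the arithmetic above,
not real planner code):

    #include <math.h>
    #include <stdio.h>

    /* Toy illustration of the run-cost bias in cost_hashjoin. */
    int
    main(void)
    {
        double  outerbytes = 100000.0;  /* estimated outer relation size */
        double  innerbytes = 400000.0;  /* estimated inner relation size */
        double  run_cost = 250.0;       /* hash join run cost before the bias */

        if (innerbytes > outerbytes && outerbytes > 0)
            run_cost *= sqrt(innerbytes / outerbytes);

        /* sqrt(400000/100000) = 2, so this prints 500.00: the cost doubles */
        printf("biased run cost: %.2f\n", run_cost);
        return 0;
    }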

It seems clear to me that we ought not impose a bias unless the join
type is such that both directions of hashing are feasible.  I wonder
also if the bias is too large ... but there's not really evidence for
or against that in this example.  The point is that this code implicitly
assumes both directions will be tried, and they won't.
        regards, tom lane


Re: Erroneous sort used in query plan

From:
Shane Ambler
Date:
Tom Lane wrote:
> Shane Ambler <pgsql@007Marketing.com> writes:
>> I am putting together searches on the catalog info and came up with a 
>> select that was rather slow and I noticed that in the explain analyze 
>> there is a sort step on one of the left joins which I don't think 
>> belongs there.
> 
> Well, it's certainly necessary in context because it's preparing the
> data for the merge join immediately above it.  The correct question
> is why is the thing using a merge join here, when a hash join would be
> cheaper?
> 
> I dug through this and found out that the hash join is estimated as
> cheaper, right up till the last step of cost_hashjoin:
> 
>     /*
>      * Bias against putting larger relation on inside.  We don't want an
>      * absolute prohibition, though, since larger relation might have better
>      * bucketsize --- and we can't trust the size estimates unreservedly,
>      * anyway.  Instead, inflate the run cost by the square root of the size
>      * ratio.  (Why square root?  No real good reason, but it seems
>      * reasonable...)
>      *
>      * Note: before 7.4 we implemented this by inflating startup cost; but if
>      * there's a disable_cost component in the input paths' startup cost, that
>      * unfairly penalizes the hash.  Probably it'd be better to keep track of
>      * disable penalty separately from cost.
>      */
>     if (innerbytes > outerbytes && outerbytes > 0)
>         run_cost *= sqrt(innerbytes / outerbytes);
> 
> In this example, the data volume from the join of everything else is
> estimated as less than what needs to be fetched from pg_proc, and so
> this bias kicks in, and the cost estimate roughly doubles.
> Unfortunately, because it's a LEFT JOIN, we'll never consider hashjoin
> in the other direction and so the hash loses out to the mergejoin.
> 
> It seems clear to me that we ought not impose a bias unless the join
> type is such that both directions of hashing are feasible.  I wonder
> also if the bias is too large ... but there's not really evidence for
> or against that in this example.  The point is that this code implicitly
> assumes both directions will be tried, and they won't.
> 

I think that the selected sort (or at least the merge join) is
incorrect - the sorted column (both of them, actually) links the
current record in pg_operator with the oid in pg_proc, so it will only
return one row.

If one of the pg_type joins is changed, it then sorts on the oid of
pg_operator as the foreign table - again a join that will only return
one row.

I would think that the foreign oid would indicate to the planner that it 
will only find one foreign row to link with.


I can see that the error I made created a funny (probably useless,
actually) link that would throw things out, but I would expect it to
cause bad planning for the two joins that are in error, not for an
unrelated one-to-one join. If a sort/merge join were generated from
the erroneous joins and used to satisfy them, I would accept that as
part of what I unintentionally requested, but instead it generates a
sort/merge join on a join that links one current record to one foreign
record and has nothing in common with the joins in error.



-- 

Shane Ambler
pgSQL@007Marketing.com

Get Sheeky @ http://Sheeky.Biz


Re: Erroneous sort used in query plan

From:
Tom Lane
Date:
Shane Ambler <pgsql@007Marketing.com> writes:
> Tom Lane wrote:
>> It seems clear to me that we ought not impose a bias unless the join
>> type is such that both directions of hashing are feasible.

> I think that the selected sort (or at least the merge join) is incorrect 
> - the column sorted (or both actually) is linking the current record in 
> pg_operator with the oid in the pg_proc - it will only return one row.

That hasn't got anything to do with it AFAICS.  The plan is not wrong,
it's just inefficient.

I dug around in old files and found out that the notion of
discriminating against hash joins with the larger rel on the inside
is ancient: in Postgres v4r2, the oldest tarball I have, cost_hashjoin
contains
    int outerpages = page_size (outersize,outerwidth);
    int innerpages = page_size (innersize,innerwidth);

    if (outerpages < innerpages)
        return _disable_cost_;

and while the code's been rewritten several times, the lineage is clear.

It seems to me now that I've thought about it that we should just get
rid of this bias entirely: it is demonstrably wrong if the join is not
an inner join, or if the larger relation has significantly better
dispersion across hash buckets.  To the extent that it's accomplishing
anything good at all, the behavior ought to be emergent from the rest of
the cost model.

I experimented a bit with the idea and found out that without the bias,
it was willing to flip over to larger-relation-on-the-inside in cases
where that actually made it a bit slower.  I interpret this as meaning
that we're undercharging for loading the hash table in comparison to
using it.  The cost model as it stands (but minus bias) effectively says
that for two relations that both fit into work_mem and have equally
well-dispersed hash keys, it's actually better to have the larger rel on
the inside!  The number of hash-function calculations is the same either
way (one for each tuple in the two rels), and the executor tries to hold
the number-of-tuples-per-hash-bucket constant at about 10 regardless of
relation size, so when you trace through the calculations you find the
only differential between the total-cost estimates for the two
mirror-image hash joins is the number of outer tuples for which
hashtable probes must be made; hence the fewer of them the better.
Experimentation shows this isn't so, however, which suggests that the
cost of inserting a tuple into the hashtable is a non-ignorable cost.
I got results that seemed to track reality better after I added
cpu_tuple_cost per hashtable entry and discounted the per-outer-tuple
cost a bit to compensate.
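
As a back-of-the-envelope illustration of that change (toy code with
the default cost settings, cpu_operator_cost = 0.0025 and
cpu_tuple_cost = 0.01, and a 2360-row inner rel like pg_proc in the
plan above; not the actual costsize.c arithmetic):

    #include <stdio.h>

    int
    main(void)
    {
        double  cpu_operator_cost = 0.0025; /* default GUC value */
        double  cpu_tuple_cost = 0.01;      /* default GUC value */
        int     num_hashclauses = 1;
        double  inner_path_rows = 2360.0;

        /* old charge: hash-function evaluation only */
        double  old_build = cpu_operator_cost * num_hashclauses
            * inner_path_rows;

        /* new charge: plus one cpu_tuple_cost per hashtable insertion */
        double  new_build = (cpu_operator_cost * num_hashclauses
                             + cpu_tuple_cost) * inner_path_rows;

        /* prints "old = 5.90, new = 29.50" */
        printf("old = %.2f, new = %.2f\n", old_build, new_build);
        return 0;
    }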

I also noticed that the cost estimate for having to do I/O in a batched
hashjoin had missed getting updated for the 8.2 seq_page_cost changes;
this is a plain ol' oversight on my part.

In short, then, I'm proposing the attached patch for HEAD and 8.2;
I'm not sure whether to change things further back.  Comments?
        regards, tom lane

Index: costsize.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/optimizer/path/costsize.c,v
retrieving revision 1.172
diff -c -r1.172 costsize.c
*** costsize.c    5 Jan 2007 22:19:31 -0000    1.172
--- costsize.c    8 Jan 2007 01:00:56 -0000
***************
*** 1498,1507 ****
      double      hashjointuples;
      double      outer_path_rows = PATH_ROWS(outer_path);
      double      inner_path_rows = PATH_ROWS(inner_path);
  
-     double      outerbytes = relation_byte_size(outer_path_rows,
-                                                 outer_path->parent->width);
-     double      innerbytes = relation_byte_size(inner_path_rows,
-                                                 inner_path->parent->width);
      int         num_hashclauses = list_length(hashclauses);
      int         numbuckets;
      int         numbatches;
  
--- 1498,1503 ----
***************
*** 1538,1550 ****
  
      /*
       * Cost of computing hash function: must do it once per input tuple. We
!      * charge one cpu_operator_cost for each column's hash function.
       *
       * XXX when a hashclause is more complex than a single operator, we really
       * should charge the extra eval costs of the left or right side, as
       * appropriate, here.  This seems more work than it's worth at the moment.
       */
  
!     startup_cost += cpu_operator_cost * num_hashclauses * inner_path_rows;
      run_cost += cpu_operator_cost * num_hashclauses * outer_path_rows;
  
      /* Get hash table size that executor would use for inner relation */
--- 1534,1549 ----
  
      /*
       * Cost of computing hash function: must do it once per input tuple. We
!      * charge one cpu_operator_cost for each column's hash function.  Also,
!      * tack on one cpu_tuple_cost per inner row, to model the costs of
!      * inserting the row into the hashtable.
       *
       * XXX when a hashclause is more complex than a single operator, we really
       * should charge the extra eval costs of the left or right side, as
       * appropriate, here.  This seems more work than it's worth at the moment.
       */
  
!     startup_cost += (cpu_operator_cost * num_hashclauses + cpu_tuple_cost)
!         * inner_path_rows;
      run_cost += cpu_operator_cost * num_hashclauses * outer_path_rows;
  
      /* Get hash table size that executor would use for inner relation */
***************
*** 1624,1631 ****
      /*
       * If inner relation is too big then we will need to "batch" the join,
       * which implies writing and reading most of the tuples to disk an extra
!      * time.  Charge one cost unit per page of I/O (correct since it should be
!      * nice and sequential...).  Writing the inner rel counts as startup cost,
       * all the rest as run cost.
       */
      if (numbatches > 1)
--- 1623,1630 ----
      /*
       * If inner relation is too big then we will need to "batch" the join,
       * which implies writing and reading most of the tuples to disk an extra
!      * time.  Charge seq_page_cost per page, since the I/O should be nice and
!      * sequential.  Writing the inner rel counts as startup cost,
       * all the rest as run cost.
       */
      if (numbatches > 1)
***************
*** 1635,1642 ****
          double      innerpages = page_size(inner_path_rows,
                                             inner_path->parent->width);
  
!         startup_cost += innerpages;
!         run_cost += innerpages + 2 * outerpages;
      }
  
      /* CPU costs */
--- 1634,1641 ----
          double      innerpages = page_size(inner_path_rows,
                                             inner_path->parent->width);
  
!         startup_cost += seq_page_cost * innerpages;
!         run_cost += seq_page_cost * (innerpages + 2 * outerpages);
      }
  
      /* CPU costs */
***************
*** 1654,1667 ****
       * The number of tuple comparisons needed is the number of outer tuples
       * times the typical number of tuples in a hash bucket, which is the inner
       * relation size times its bucketsize fraction.  At each one, we need to
!      * evaluate the hashjoin quals.  (Note: charging the full qual eval cost
!      * at each tuple is pessimistic, since we don't evaluate the quals unless
!      * the hash values match exactly.)
       */
      startup_cost += hash_qual_cost.startup;
      run_cost += hash_qual_cost.per_tuple *
          outer_path_rows * clamp_row_est(inner_path_rows * innerbucketsize) *
!         joininfactor;
  
      /*
       * For each tuple that gets through the hashjoin proper, we charge
--- 1653,1667 ----
       * The number of tuple comparisons needed is the number of outer tuples
       * times the typical number of tuples in a hash bucket, which is the inner
       * relation size times its bucketsize fraction.  At each one, we need to
!      * evaluate the hashjoin quals.  But actually, charging the full qual eval
!      * cost at each tuple is pessimistic, since we don't evaluate the quals
!      * unless the hash values match exactly.  For lack of a better idea, halve
!      * the cost estimate to allow for that.
       */
      startup_cost += hash_qual_cost.startup;
      run_cost += hash_qual_cost.per_tuple *
          outer_path_rows * clamp_row_est(inner_path_rows * innerbucketsize) *
!         joininfactor * 0.5;
  
      /*
       * For each tuple that gets through the hashjoin proper, we charge
***************
*** 1673,1694 ****
      cpu_per_tuple = cpu_tuple_cost + qp_qual_cost.per_tuple;
      run_cost += cpu_per_tuple * hashjointuples * joininfactor;
  
-     /*
-      * Bias against putting larger relation on inside.  We don't want an
-      * absolute prohibition, though, since larger relation might have better
-      * bucketsize --- and we can't trust the size estimates unreservedly,
-      * anyway.  Instead, inflate the run cost by the square root of the size
-      * ratio.  (Why square root?  No real good reason, but it seems
-      * reasonable...)
-      *
-      * Note: before 7.4 we implemented this by inflating startup cost; but if
-      * there's a disable_cost component in the input paths' startup cost, that
-      * unfairly penalizes the hash.  Probably it'd be better to keep track of
-      * disable penalty separately from cost.
-      */
-     if (innerbytes > outerbytes && outerbytes > 0)
-         run_cost *= sqrt(innerbytes / outerbytes);
- 
      path->jpath.path.startup_cost = startup_cost;
      path->jpath.path.total_cost = startup_cost + run_cost;
  }
--- 1673,1678 ----