Discussion: Optimizer questions


Optimizer questions

From:
konstantin knizhnik
Date:
Hi hackers, 

I want to ask two questions about PostgreSQL optimizer.
I have the following query:

SELECT o.id as id,s.id as sid,o.owner,o.creator,o.parent_id as dir_id,s.mime_id,m.c_type,s.p_file,s.mtime,o.ctime,o.name,o.title,o.size,o.deleted,la.otime,la.etime,uo.login as owner_login,uc.login as creator_login,(CASE WHEN f.user_id IS NULL THEN 0 ELSE 1 END) AS flagged,(select 'userid\\:'||string_agg(user_id,' userid\\:') from get_authorized_users(o.id)) as acl FROM objects s JOIN objects o ON s.original_file=o.id LEFT JOIN flags f ON o.id=f.obj_id AND o.owner=f.user_id LEFT JOIN objects_last_activity la ON o.id = la.obj_id AND o.owner = la.user_id, mime m, users uc , users uo WHERE (s.mime_id=904 or s.mime_id=908) AND m.mime_id = o.mime_id AND o.owner = uo.user_id AND o.creator = uc.user_id ORDER BY s.mtime LIMIT 9;

It runs for more than an hour, while the same query with "limit 8" completes in just a few seconds.
get_authorized_users is a quite expensive function performing a recursive query through several tables.

In the first case (limit 9) a sequential scan with a sort is used, while in the second case an index scan on the "mtime" index is performed, so no sort is needed.
In the second case we can extract just 9 rows and calculate get_authorized_users only for them. In the first case we have to calculate this function for ALL (~15M) rows.

I investigated the plans of both queries and found three causes of the problem:

1. Wrong estimate of filter selectivity: 65549 vs. 154347 real.
2. Wrong estimate of join selectivity: 284 vs. 65549 real.
3. Wrong estimate of get_authorized_users cost: 0.25 vs. 57.379 real.

Below is the output of EXPLAIN ANALYZE:

Limit  (cost=1595355.77..1595355.79 rows=9 width=301) (actual time=3823128.752..3823128.755 rows=9 loops=1)
   ->  Sort  (cost=1595355.77..1595356.48 rows=284 width=301) (actual time=3823128.750..3823128.753 rows=9 loops=1)
         Sort Key: s.mtime
         Sort Method: top-N heapsort  Memory: 30kB
         ->  Nested Loop  (cost=1.96..1595349.85 rows=284 width=301) (actual time=75.453..3822829.640 rows=65549 loops=1)
               ->  Nested Loop Left Join  (cost=1.54..1589345.66 rows=284 width=271) (actual time=1.457..59068.107 rows=65549 loops=1)
                     ->  Nested Loop Left Join  (cost=0.98..1586923.67 rows=284 width=255) (actual time=1.430..55661.926 rows=65549 loops=1)
                           Join Filter: (((o.id)::text = (f.obj_id)::text) AND ((o.owner)::text = (f.user_id)::text))
                           Rows Removed by Join Filter: 132670698
                           ->  Nested Loop  (cost=0.98..1576821.09 rows=284 width=236) (actual time=0.163..27721.566 rows=65549 loops=1)
                                 Join Filter: ((o.mime_id)::integer = (m.mime_id)::integer)
                                 Rows Removed by Join Filter: 48768456
                                 ->  Seq Scan on mime m  (cost=0.00..13.45 rows=745 width=30) (actual time=0.008..1.372 rows=745 loops=1)
                                 ->  Materialize  (cost=0.98..1573634.65 rows=284 width=214) (actual time=0.004..22.918 rows=65549 loops=745)
                                       ->  Nested Loop  (cost=0.98..1573633.23 rows=284 width=214) (actual time=0.142..7095.554 rows=65549 loops=1)
                                             ->  Nested Loop  (cost=0.56..1570232.37 rows=406 width=184) (actual time=0.130..6468.376 rows=65549 loops=1)
                                                   ->  Seq Scan on objects s  (cost=0.00..1183948.58 rows=45281 width=63) (actual time=0.098..5346.023 rows=154347 loops=1)
                                                         Filter: (((mime_id)::integer = 904) OR ((mime_id)::integer = 908))
                                                         Rows Removed by Filter: 15191155
                                                   ->  Index Scan using objects_pkey on objects o  (cost=0.56..8.52 rows=1 width=140) (actual time=0.006..0.007 rows=0 loops=154347)
                                                         Index Cond: ((id)::text = (s.original_file)::text)
                                             ->  Index Scan using users_pkey on users uc  (cost=0.42..8.37 rows=1 width=49) (actual time=0.008..0.009 rows=1 loops=65549)
                                                   Index Cond: ((user_id)::text = (o.creator)::text)
                           ->  Materialize  (cost=0.00..48.36 rows=2024 width=38) (actual time=0.001..0.133 rows=2024 loops=65549)
                                 ->  Seq Scan on flags f  (cost=0.00..38.24 rows=2024 width=38) (actual time=0.009..0.462 rows=2024 loops=1)
                     ->  Index Scan using objects_last_activity_unique_index on objects_last_activity la  (cost=0.56..8.52 rows=1 width=54) (actual time=0.044..0.046 rows=1 loops=65549)
                           Index Cond: (((o.id)::text = (obj_id)::text) AND ((o.owner)::text = (user_id)::text))
               ->  Index Scan using users_pkey on users uo  (cost=0.42..8.37 rows=1 width=49) (actual time=0.015..0.019 rows=1 loops=65549)
                     Index Cond: ((user_id)::text = (o.owner)::text)
               SubPlan 1
                 ->  Aggregate  (cost=12.75..12.76 rows=1 width=32) (actual time=57.390..57.391 rows=1 loops=65549)
                       ->  Function Scan on get_authorized_users  (cost=0.25..10.25 rows=1000 width=32) (actual time=57.379..57.379 rows=2 loops=65549)
 Total runtime: 3823133.235 ms
(33 rows)

Then I tried to manually adjust the cost of the get_authorized_users function using an "alter function ... cost" statement.
It has an effect on the plans shown by EXPLAIN: the plan with the index scan now has a much smaller cost (2.39..1911772.85) than the plan with the sort (8507669.11..8507669.13).
But the sort is still used for query execution unless I set "enable_sort=off".
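
For reference, the adjustment could look roughly like this (the argument type of get_authorized_users is an assumption here, based on the plan casting o.id to text; the COST value is arbitrary):

-- Hypothetical signature; 10000 is an arbitrary high per-call cost estimate.
ALTER FUNCTION get_authorized_users(text) COST 10000;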

I tried to investigate how the optimizer works and found that this choice is made in the grouping_planner function:

        /*
         * Forget about the presorted path if it would be cheaper to sort the
         * cheapest-total path.  Here we need consider only the behavior at
         * the tuple_fraction point.  Also, limit_tuples is only relevant if
         * not grouping/aggregating, so use root->limit_tuples in the
         * cost_sort call.
         */
        if (sorted_path)
        {
            Path        sort_path;        /* dummy for result of cost_sort */

            if (root->query_pathkeys == NIL ||
                pathkeys_contained_in(root->query_pathkeys,
                                      cheapest_path->pathkeys))
            {
                /* No sort needed for cheapest path */
                sort_path.startup_cost = cheapest_path->startup_cost;
                sort_path.total_cost = cheapest_path->total_cost;
            }
            else
            {
                /* Figure cost for sorting */
                cost_sort(&sort_path, root, root->query_pathkeys,
                          cheapest_path->total_cost,
                          path_rows, path_width,
                          0.0, work_mem, root->limit_tuples);
            }

            if (compare_fractional_path_costs(sorted_path, &sort_path,
                                              tuple_fraction) > 0)
            {
                /* Presorted path is a loser */
                sorted_path = NULL;
            }
        }


In this case the cost of the sorted path is higher than that of the seq scan path, so the sorted path is rejected.
So now two questions:

1. The cost compared in grouping_planner doesn't take into account the cost of get_authorized_users - it does not change when I alter the function cost. Is this correct behavior?

2. Actually there is no need to calculate the get_authorized_users function for ALL records before the sort. This field is not involved in any calculations. So we can perform the sort, extract 9 rows and then calculate get_authorized_users only for them. I wonder why the PostgreSQL optimizer does not try to split the target list into columns which are needed immediately and columns which can be calculated later (lazy evaluation of columns). It seems to be very efficient in the presence of filtering and limiting operators, but for some reason it does not happen in this query. Certainly I can rewrite the query using a nested subselect, doing this optimization myself (a simplified sketch of such a rewrite is shown below), but I find this optimization to be quite useful...
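
The manual rewrite might look roughly like the following (a simplified sketch only: most of the joins and output columns of the real query are omitted; the point is the pattern - sort and limit inside the subselect, the expensive function evaluated in the outer query for at most 9 rows):

SELECT q.*,
       (SELECT 'userid\\:' || string_agg(user_id, ' userid\\:')
          FROM get_authorized_users(q.id)) AS acl
FROM (SELECT o.id, s.id AS sid, o.owner, o.creator, s.mtime
      FROM objects s
      JOIN objects o ON s.original_file = o.id
      WHERE s.mime_id IN (904, 908)
      ORDER BY s.mtime
      LIMIT 9) q
ORDER BY q.mtime;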

I can send an example reproducing the problem with the latest devel branch. It uses a slightly different database schema and dummy data, and produces different estimates and timings.
But the problems persist: the optimizer chooses the plan with the worse cost and doesn't perform lazy evaluation of the target list.

Thanks in advance,
Konstantin

 


Re: Optimizer questions

From:
Tom Lane
Date:
konstantin knizhnik <k.knizhnik@postgrespro.ru> writes:
> 1. The cost compared in grouping_planner doesn't take in account price of get_authorized_users - it is not changed
> when I am altering function cost. Is it correct behavior?
 

The general problem of accounting for tlist eval cost is not handled very
well now, but especially not with respect to the idea that different paths
might have different tlist costs.  I'm working on an upper-planner rewrite
which should make this better, or at least make it practical to make it
better.

In the meantime you can probably force things to work the way you want
by doing the sort/limit in a sub-select and evaluating the expensive
function only in the outer select.
        regards, tom lane



Re: Optimizer questions

From:
Alexander Korotkov
Date:
On Wed, Jan 6, 2016 at 12:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
konstantin knizhnik <k.knizhnik@postgrespro.ru> writes:
> 1. The cost compared in grouping_planner doesn't take in account price of get_authorized_users - it is not changed when I am altering function cost. Is it correct behavior?

The general problem of accounting for tlist eval cost is not handled very
well now, but especially not with respect to the idea that different paths
might have different tlist costs.  I'm working on an upper-planner rewrite
which should make this better, or at least make it practical to make it
better.

Hmm... Besides costing, it would be nice to postpone calculation of expensive tlist functions until after LIMIT.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Optimizer questions

From:
David Rowley
Date:
On 6 January 2016 at 13:13, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Wed, Jan 6, 2016 at 12:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
konstantin knizhnik <k.knizhnik@postgrespro.ru> writes:
> 1. The cost compared in grouping_planner doesn't take in account price of get_authorized_users - it is not changed when I am altering function cost. Is it correct behavior?

The general problem of accounting for tlist eval cost is not handled very
well now, but especially not with respect to the idea that different paths
might have different tlist costs.  I'm working on an upper-planner rewrite
which should make this better, or at least make it practical to make it
better.

Hmm... Besides costing, it would be nice to postpone calculation of expensive tlist functions until after LIMIT.

I'd agree that it would be more than the costings that would need to be improved here.

The most simple demonstration of the problem I can think of is, if I apply the following:

diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 29d92a7..2ec9822 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -641,6 +641,8 @@ int4pl(PG_FUNCTION_ARGS)

        result = arg1 + arg2;

+       elog(NOTICE, "int4pl(%d, %d)", arg1,arg2);
+
        /*
         * Overflow check.  If the inputs are of different signs then their sum
         * cannot overflow.  If the inputs are of the same sign, their sum had

Then do:

create table a (b int);
insert into a select generate_series(1,10);
select b+b as bb from a order by b limit 1;
NOTICE:  int4pl(1, 1)
NOTICE:  int4pl(2, 2)
NOTICE:  int4pl(3, 3)
NOTICE:  int4pl(4, 4)
NOTICE:  int4pl(5, 5)
NOTICE:  int4pl(6, 6)
NOTICE:  int4pl(7, 7)
NOTICE:  int4pl(8, 8)
NOTICE:  int4pl(9, 9)
NOTICE:  int4pl(10, 10)
 bb
----
  2
(1 row)

We can see that int4pl() is needlessly called 9 times. Although I think this only applies to queries with LIMIT, I agree that it does seem like an interesting route for optimisation.

It seems worthwhile to investigate how we might go about improving this so that the evaluation of the target list happens after LIMIT, at least for the columns which are not required before LIMIT.
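
A manual workaround for this example (a sketch, reusing the same table) is to apply the sort and LIMIT in a subquery so that the addition is projected afterwards; with the patched elog() above, int4pl() should then fire only once:

select b+b as bb from (select b from a order by b limit 1) sub;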

Konstantin, are you thinking of looking into this more, with plans to implement code to improve this?

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Optimizer questions

From:
Konstantin Knizhnik
Date:
On 01/06/2016 12:03 PM, David Rowley wrote:
Konstantin, are you thinking of looking into this more, with plans to implement code to improve this?


I am not familiar with the PostgreSQL optimizer, but I am now looking into its code and trying to find a way to fix this problem.
But if somebody who is familiar with the optimizer already knows a solution, please let me know so that I do not spend time on this.

Actually I expect that most of the logic is already present in the optimizer: there are several places where it is decided where and how to evaluate the tlist.
For example, there are the functions use_physical_tlist/disuse_physical_tlist, which decide whether to use the whole relation targetlist or only those Vars actually needed by the query. I am not sure they are related to this particular case...

So most likely it will be enough to insert just one function call in the proper place, but it is necessary to find the function and the place ;)



Re: Optimizer questions

From:
Simon Riggs
Date:
On 6 January 2016 at 09:03, David Rowley <david.rowley@2ndquadrant.com> wrote:
 
It seems worthwhile to investigate how we might go about improving this so that the evaluation of the target list happens after LIMIT, at least for the columns which are not required before LIMIT.

Currently, the FunctionScan executes until it decides to stop. Then later the LIMIT is applied.

To change this, you'd need to pass down the LIMIT value into the FunctionScan node, as is already done between Limit/Sort nodes.

On the FunctionScan,  you can set the max_calls value in the FuncCallContext, but currently that isn't enforced.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Optimizer questions

From:
Konstantin Knizhnik
Date:
I can propose the following patch for LIMIT clause optimization.
With this patch the create_index regression test fails because of different output:

*** /home/knizhnik/postgres_cluster/src/test/regress/expected/create_index.out    2015-12-26 11:28:39.003925449 +0300
--- /home/knizhnik/postgres_cluster/src/test/regress/results/create_index.out    2016-01-07 22:28:10.559625249 +0300
***************
*** 1208,1219 ****
 
  EXPLAIN (COSTS OFF)
  SELECT circle_center(f1), round(radius(f1)) as radius FROM gcircle_tbl ORDER BY f1 <-> '(200,300)'::point LIMIT 10;
!                     QUERY PLAN                    
! ---------------------------------------------------
!  Limit
!    ->  Index Scan using ggcircleind on gcircle_tbl
!          Order By: (f1 <-> '(200,300)'::point)
! (3 rows)
 
  SELECT circle_center(f1), round(radius(f1)) as radius FROM gcircle_tbl ORDER BY f1 <-> '(200,300)'::point LIMIT 10;
   circle_center  | radius
--- 1208,1220 ----
 
  EXPLAIN (COSTS OFF)
  SELECT circle_center(f1), round(radius(f1)) as radius FROM gcircle_tbl ORDER BY f1 <-> '(200,300)'::point LIMIT 10;
!                        QUERY PLAN                       
! ---------------------------------------------------------
!  Result
!    ->  Limit
!          ->  Index Scan using ggcircleind on gcircle_tbl
!                Order By: (f1 <-> '(200,300)'::point)
! (4 rows)
 
  SELECT circle_center(f1), round(radius(f1)) as radius FROM gcircle_tbl ORDER BY f1 <-> '(200,300)'::point LIMIT 10;
   circle_center  | radius

======================================================================


But it is a good example of a query where this optimization can be useful: there is no need to calculate the circle_center function for all rows if we need just 10.





Attachments

Re: Optimizer questions

From:
Konstantin Knizhnik
Date:

Attached please find an improved version of the optimizer patch for the LIMIT clause.
Now I apply this optimization only to non-trivial columns requiring evaluation.
Maybe it would be better to take into account the cost of evaluating these columns, but right now I just detect non-variable columns.






-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments

Re: Optimizer questions

From:
Bruce Momjian
Date:
On Tue, Jan  5, 2016 at 05:55:28PM +0300, konstantin knizhnik wrote:
> Hi hackers, 
> 
> I want to ask two questions about PostgreSQL optimizer.
> I have the following query:
> 
> SELECT o.id as id,s.id as sid,o.owner,o.creator,o.parent_id
> as dir_id,s.mime_id,m.c_type,s.p_file,s.mtime,o.ctime,o.name
> ,o.title,o.size,o.deleted,la.otime,la.etime,uo.login as owner_login,uc.login as
> creator_login,(CASE WHEN f.user_id IS NULL THEN 0 ELSE 1 END) AS flagged,
> (select 'userid\\:'||string_agg(user_id,' userid\\:') from get_authorized_users
> (o.id)) as acl FROM objects s JOIN objects o ON s.original_file=o.id LEFT JOIN
> flags f ON o.id=f.obj_id AND o.owner=f.user_id LEFT JOIN objects_last_activity
> la ON o.id = la.obj_id AND o.owner = la.user_id, mime m, users uc , users uo
> WHERE (s.mime_id=904 or s.mime_id=908) AND m.mime_id = o.mime_id AND o.owner =
> uo.user_id AND o.creator = uc.user_id ORDER BY s.mtime LIMIT 9;

FYI, I could not make any sense out of this query, and I frankly can't
figure out how others can understand it.  :-O   Anyway, I ran it through
pgFormatter (https://github.com/darold/pgFormatter), which I am showing
here because I was impressed with the results:
    SELECT
        o.id AS id,
        s.id AS sid,
        o.owner,
        o.creator,
        o.parent_id AS dir_id,
        s.mime_id,
        m.c_type,
        s.p_file,
        s.mtime,
        o.ctime,
        o.name,
        o.title,
        o.size,
        o.deleted,
        la.otime,
        la.etime,
        uo.login AS owner_login,
        uc.login AS creator_login,
        (
            CASE
                WHEN f.user_id IS NULL THEN 0
                ELSE 1
            END ) AS flagged,
        (
            SELECT
                'userid\\:' || string_agg (
                    user_id,
                    ' userid\\:' )
            FROM
                get_authorized_users (
                    o.id ) ) AS acl
    FROM
        objects s
        JOIN objects o ON s.original_file = o.id
        LEFT JOIN flags f ON o.id = f.obj_id
        AND o.owner = f.user_id
        LEFT JOIN objects_last_activity la ON o.id = la.obj_id
        AND o.owner = la.user_id,
        mime m,
        users uc,
        users uo
    WHERE (
        s.mime_id = 904
        OR s.mime_id = 908 )
        AND m.mime_id = o.mime_id
        AND o.owner = uo.user_id
        AND o.creator = uc.user_id
    ORDER BY
        s.mtime
    LIMIT 9;
 

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +



Re: Optimizer questions

From:
Gilles Darold
Date:
On 18/01/2016 03:47, Bruce Momjian wrote:
> On Tue, Jan  5, 2016 at 05:55:28PM +0300, konstantin knizhnik wrote:
>> Hi hackers, 
>>
>> I want to ask two questions about PostgreSQL optimizer.
>> I have the following query:
>>
>> SELECT o.id as id,s.id as sid,o.owner,o.creator,o.parent_id
>> as dir_id,s.mime_id,m.c_type,s.p_file,s.mtime,o.ctime,o.name
>> ,o.title,o.size,o.deleted,la.otime,la.etime,uo.login as owner_login,uc.login as
>> creator_login,(CASE WHEN f.user_id IS NULL THEN 0 ELSE 1 END) AS flagged,
>> (select 'userid\\:'||string_agg(user_id,' userid\\:') from get_authorized_users
>> (o.id)) as acl FROM objects s JOIN objects o ON s.original_file=o.id LEFT JOIN
>> flags f ON o.id=f.obj_id AND o.owner=f.user_id LEFT JOIN objects_last_activity
>> la ON o.id = la.obj_id AND o.owner = la.user_id, mime m, users uc , users uo
>> WHERE (s.mime_id=904 or s.mime_id=908) AND m.mime_id = o.mime_id AND o.owner =
>> uo.user_id AND o.creator = uc.user_id ORDER BY s.mtime LIMIT 9;
> FYI, I could not make any sense out of this query, and I frankly can't
> figure out how others can understand it.  :-O   Anyway, I ran it through
> pgFormatter (https://github.com/darold/pgFormatter), which I am showing
> here because I was impressed with the results:
>
>     SELECT
>         o.id AS id,
>         s.id AS sid,
>         o.owner,
>         o.creator,
>         o.parent_id AS dir_id,
>         s.mime_id,
>         m.c_type,
>         s.p_file,
>         s.mtime,
>         o.ctime,
>         o.name,
>         o.title,
>         o.size,
>         o.deleted,
>         la.otime,
>         la.etime,
>         uo.login AS owner_login,
>         uc.login AS creator_login,
>         (
>             CASE
>                 WHEN f.user_id IS NULL THEN 0
>                 ELSE 1
>             END ) AS flagged,
>         (
>             SELECT
>                 'userid\\:' || string_agg (
>                     user_id,
>                     ' userid\\:' )
>             FROM
>                 get_authorized_users (
>                     o.id ) ) AS acl
>     FROM
>         objects s
>         JOIN objects o ON s.original_file = o.id
>         LEFT JOIN flags f ON o.id = f.obj_id
>         AND o.owner = f.user_id
>         LEFT JOIN objects_last_activity la ON o.id = la.obj_id
>         AND o.owner = la.user_id,
>         mime m,
>         users uc,
>         users uo
>     WHERE (
>         s.mime_id = 904
>         OR s.mime_id = 908 )
>         AND m.mime_id = o.mime_id
>         AND o.owner = uo.user_id
>         AND o.creator = uc.user_id
>     ORDER BY
>         s.mtime
>     LIMIT 9;
>


Thanks Bruce for the pointer to this tool; even if it is not perfect, I
think it can be useful. There's also an on-line version at
http://sqlformat.darold.net/ that everyone can use without having to
install it, to format queries up to 20Kb with options to control the
output format. It can also anonymize SQL queries.

Actually this is the SQL formatter/beautifier used in pgBadger; it has
been extracted as an independent project so that this part of pgBadger
can be improved without having to run pgBadger itself each time.

-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org




Re: Optimizer questions

From:
Konstantin Knizhnik
Date:
I am sorry for the badly formatted query - I just cut&pasted it from a C++
client program.

I have one more question to the community regarding this patch.
The proposed patch fixes the problem only for the SORT+LIMIT case.
In this case evaluation of expressions which are not used in the sort is
always a waste of time.
But I wonder whether we should delay evaluation of complex expressions even
if there is no LIMIT?
Quite often a client application doesn't fetch all query results even when
there is no LIMIT clause.




-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Optimizer questions

From:
Alexander Korotkov
Date:
On Fri, Jan 8, 2016 at 11:58 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
Attached please find an improved version of the optimizer patch for the LIMIT clause.
Now I apply this optimization only to non-trivial columns requiring evaluation.
Maybe it would be better to take into account the cost of evaluating these columns, but right now I just detect non-variable columns.

We may think about a more general feature: LIMIT pushdown. In Konstantin's patch the planner pushes the LIMIT below the tlist calculation.
But there are other cases where applying the LIMIT first would be beneficial. For instance, we can apply the LIMIT before JOINs. That is possible only for a LEFT JOIN whose right side is not used in filter and ordering clauses. See the example below.

# create table t1 as (select i, random() v from generate_series(1,1000000) i);
SELECT 1000000

# create table t2 as (select i, random() v from generate_series(1,1000000) i);
SELECT 1000000

# explain analyze select * from t1 left join t2 on t1.i = t2.i order by t1.v limit 10;
                                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=87421.64..87421.67 rows=10 width=24) (actual time=1486.276..1486.278 rows=10 loops=1)
   ->  Sort  (cost=87421.64..89921.64 rows=1000000 width=24) (actual time=1486.275..1486.275 rows=10 loops=1)
         Sort Key: t1.v
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Hash Left Join  (cost=27906.00..65812.00 rows=1000000 width=24) (actual time=226.180..1366.238 rows=1000000
               Hash Cond: (t1.i = t2.i)
               ->  Seq Scan on t1  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.010..77.041 rows=1000000 l
               ->  Hash  (cost=15406.00..15406.00 rows=1000000 width=12) (actual time=226.066..226.066 rows=1000000 loop
                     Buckets: 131072  Batches: 1  Memory Usage: 46875kB
                     ->  Seq Scan on t2  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.007..89.002 rows=100
 Planning time: 0.123 ms
 Execution time: 1492.118 ms
(12 rows)

# explain analyze select * from (select * from t1 order by t1.v limit 10) t1 left join t2 on t1.i = t2.i;
                                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=37015.89..56171.99 rows=10 width=24) (actual time=198.478..301.278 rows=10 loops=1)
   Hash Cond: (t2.i = t1.i)
   ->  Seq Scan on t2  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.005..74.303 rows=1000000 loops=1)
   ->  Hash  (cost=37015.77..37015.77 rows=10 width=12) (actual time=153.584..153.584 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Limit  (cost=37015.64..37015.67 rows=10 width=12) (actual time=153.579..153.580 rows=10 loops=1)
               ->  Sort  (cost=37015.64..39515.64 rows=1000000 width=12) (actual time=153.578..153.578 rows=10 loops=1)
                     Sort Key: t1.v
                     Sort Method: top-N heapsort  Memory: 25kB
                     ->  Seq Scan on t1  (cost=0.00..15406.00 rows=1000000 width=12) (actual time=0.012..78.828 rows=100
 Planning time: 0.132 ms
 Execution time: 301.308 ms
(12 rows)

In this example the LIMIT pushdown makes the query 5 times faster. It would be very nice if the optimizer did this automatically.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Optimizer questions

From:
Konstantin Knizhnik
Date:
Unfortunately these two statements are not equivalent: the second one can (in theory, though not for this particular data set) return more than 10 result records.
Such an optimization would be correct only if t2.i were declared as unique.

But the most efficient plan for this query is generated if there is an index on t1.v.
In this case no explicit sort is needed. The LIMIT is still not pushed down, but that is not a problem, because a nested loop join is used, which does not materialize its result
(it produces records on demand); a sketch of the assumed schema follows the plan:

# explain analyze select * from t1 left outer join t2 on t1.k=t2.k order by t1.v limit 10;
                                                          QUERY PLAN                                                         
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.58..4.38 rows=10 width=16) (actual time=0.069..0.157 rows=10 loops=1)
   ->  Nested Loop Left Join  (cost=0.58..37926.63 rows=100001 width=16) (actual time=0.067..0.154 rows=10 loops=1)
         ->  Index Scan using idxv on t1  (cost=0.29..3050.31 rows=100001 width=8) (actual time=0.046..0.053 rows=10 loops=1)
         ->  Index Scan using t2_pkey on t2  (cost=0.29..0.34 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=10)
               Index Cond: (t1.k = k)
 Planning time: 0.537 ms
 Execution time: 0.241 ms
(7 rows)
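
Roughly the setup assumed here (a guess reconstructed from the plan: the column names differ from the earlier example, t2.k is unique via its primary key, and there is an index idxv on t1.v):

create table t1 (k int, v float8);
create table t2 (k int primary key, v float8);
insert into t1 select i, random() from generate_series(1, 100001) i;
insert into t2 select i, random() from generate_series(1, 100001) i;
create index idxv on t1 (v);
analyze t1;
analyze t2;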




-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Optimizer questions

From:
Tom Lane
Date:
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
> Attached please find improved version of the optimizer patch for LIMIT clause.

This patch isn't anywhere close to working after 3fc6e2d7f5b652b4.
(TBH, the reason I was negative about this upthread is that I had that
one in the oven and knew it would conflict spectacularly.)  I encourage
you to think about how an optimization of this sort could be made to
work in a non-kluge fashion in the new code structure.

I've not spent a lot of time on this, but I think maybe what would make
sense is to consider both the case where function calculations are
postponed to after ORDER BY and the case where they aren't, and generate
Paths for both.  Neither approach is a slam-dunk win.  For example,
suppose that one of the tlist columns is md5(wide_column) --- it will
likely not be preferable to pass the wide column data through the sort
step rather than reducing it to a hash first.  This would require some
work in grouping_planner to track two possible pathtargets, and work in
create_ordered_paths to generate paths for both possibilities.  A possible
objection is that this would add planning work even when no real benefit
is possible; so maybe we should only consider the new way if the tlist has
significant eval cost?  Not sure about that.  There is also something
to be said for the idea that we should try to guarantee consistent
semantics when the tlist contains volatile functions.
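
In SQL terms, the md5 case would be something like the following hypothetical query, where postponing tlist evaluation means the Sort has to carry the full wide column rather than its 32-character hash:

-- wide_table(id int, wide_column text) is hypothetical, with large text values.
SELECT md5(wide_column) FROM wide_table ORDER BY id LIMIT 10;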

For now, I've set this commitfest entry to Waiting on Author.  There's
still time to consider a rewrite in this 'fest, if you can get it done
in a week or two.
        regards, tom lane



Re: Optimizer questions

From:
Alvaro Herrera
Date:
Tom Lane wrote:
> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
> > Attached please find improved version of the optimizer patch for LIMIT clause.

> For now, I've set this commitfest entry to Waiting on Author.  There's
> still time to consider a rewrite in this 'fest, if you can get it done
> in a week or two.

Yes.  Given that Konstantin will have to struggle to get this patch
rebased on top of upper-planner pathification which appeared out of the
blue at the last minute, it seems fair to give some additional time
for the required work.

However, we still have a commitfest schedule to adhere to, and
Konstantin has two other patches in the commitfest:
* Support ALTER INDEX ... WHERE ... clause
* eXtensible Transaction Manager API (v2)

and since we also need his contribution as a patch reviewer, it seems
unfair to just let all his patches move forward --- if we did that, he
would have no time at all to review others' patches, which is a
requirement.

Since we're only one week into the commitfest, I think it's his
prerogative to decide what to do.  I think there are two options: he can
either continue with this patch only, and get back from WoA to
Needs-Review in (hopefully) one week; or he can drop this one from the
commitfest right now and concentrate on the two other ones.  Either way,
as I already stated, we need his contribution as a reviewer for other
patches, too.

(If I were in his shoes, I wouldn't have any hope that the XTM patch
would go in for 9.6 at this point; the most I'd hope is to have lots of
feedback in order to have something to propose for early 9.7.  I don't
know the status of the ALTER INDEX one, so I can't comment there.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Optimizer questions

From:
Konstantin Knizhnik
Date:
On 03/08/2016 07:01 AM, Tom Lane wrote:
> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
>> Attached please find improved version of the optimizer patch for LIMIT clause.
> This patch isn't anywhere close to working after 3fc6e2d7f5b652b4.
> (TBH, the reason I was negative about this upthread is that I had that
> one in the oven and knew it would conflict spectacularly.)  I encourage
> you to think about how an optimization of this sort could be made to
> work in a non-kluge fashion in the new code structure.
>
> I've not spent a lot of time on this, but I think maybe what would make
> sense is to consider both the case where function calculations are
> postponed to after ORDER BY and the case where they aren't, and generate
> Paths for both.  Neither approach is a slam-dunk win.  For example,
> suppose that one of the tlist columns is md5(wide_column) --- it will
> likely not be preferable to pass the wide column data through the sort
> step rather than reducing it to a hash first.  This would require some
> work in grouping_planner to track two possible pathtargets, and work in
> create_ordered_paths to generate paths for both possibilities.  A possible
> objection is that this would add planning work even when no real benefit
> is possible; so maybe we should only consider the new way if the tlist has
> significant eval cost?  Not sure about that.  There is also something
> to be said for the idea that we should try to guarantee consistent
> semantics when the tlist contains volatile functions.
>
> For now, I've set this commitfest entry to Waiting on Author.  There's
> still time to consider a rewrite in this 'fest, if you can get it done
> in a week or two.
>
>             regards, tom lane

Attached please find a rebased patch.
Unfortunately 3fc6e2d7f5b652b4 still doesn't fix the problem with "lazy" evaluation of the target list.
This is why my patch is still useful. But frankly speaking I am not sure that it is the best way of fixing this problem,
because it takes into account only one case: sort+limit. Maybe the same optimization could be useful for other queries.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachments

Re: Optimizer questions

From:
Tom Lane
Date:
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
> On 03/08/2016 07:01 AM, Tom Lane wrote:
>> I've not spent a lot of time on this, but I think maybe what would make
>> sense is to consider both the case where function calculations are
>> postponed to after ORDER BY and the case where they aren't, and generate
>> Paths for both.  Neither approach is a slam-dunk win.  For example,
>> suppose that one of the tlist columns is md5(wide_column) --- it will
>> likely not be preferable to pass the wide column data through the sort
>> step rather than reducing it to a hash first.  This would require some
>> work in grouping_planner to track two possible pathtargets, and work in
>> create_ordered_paths to generate paths for both possibilities.  A possible
>> objection is that this would add planning work even when no real benefit
>> is possible; so maybe we should only consider the new way if the tlist has
>> significant eval cost?  Not sure about that.  There is also something
>> to be said for the idea that we should try to guarantee consistent
>> semantics when the tlist contains volatile functions.

> Attached please find rebased patch.

This may be rebased, but it doesn't seem to respond to any of my concerns
above.  In particular, if we're going to change behavior in this area,
I think we need to try to ensure that volatile functions in the tlist will
see consistent behavior no matter which plan shape is chosen.  People may
well be depending on the existing behavior for particular queries.  If
we're going to break their queries, I'd at least like to be able to say
that there's now a predictable, well-defined semantics.  Something like
"volatile functions in the tlist that are not in sort/group columns are
certain to be evaluated after sorting occurs, if there is an ORDER BY".
This should not be dependent on whether there's a LIMIT, nor GROUP BY
nor windowing functions, nor on random other whims of the optimizer.
So I'm not at all happy with a patch that changes things only for the
LIMIT-but-no-grouping-or-windows case.

I'm not sure whether it's worth pursuing cost-based comparisons for
functions that aren't volatile.  In an ideal world we could deal with the
md5() example I gave, but as things stand I suspect we usually have too
little idea whether the result of f(x) is wider or narrower than x itself.
(One thing we do know is that f's output won't be toasted, whereas x might
be; so it might be a bad idea to bet on function results getting
narrower.)  So I'm afraid it's going to be really hard to tell whether
we're making the sort step itself more or less expensive if we postpone
function evals.  But we do know for certain that we're adding a projection
step that wasn't there before when we postpone functions to after the
Sort.  That kind of suggests that there should be some minimum estimated
cost of the functions before we add that extra step (unless they are
volatile of course).  Or only do it if there is a LIMIT or we have a
tuple_fraction suggesting that partial eval of the query is of interest.

BTW, there's some additional refactoring I had had in mind to do in
grouping_planner to make its handling of the targetlist a bit more
organized; in particular, I'd like to see it using PathTarget
representation more consistently throughout the post-scan-join steps.
I had figured that was just neatnik-ism and could be done later,
but we may need to do it now in order to be able to deal with these
considerations in a cleaner fashion.  I really don't like the way
that this patch hacks up the behavior of make_scanjoin_target() without
even a comment explaining its new API.

The rough sketch of what I'd had in mind is that we should convert the
processed, final tlist into PathTarget form immediately after
query_planner, since we know we're going to need that eventually anyway.
Then, if we need to do grouping/aggregation or windowing, derive modified
PathTargets that we want to use for the inputs of those steps.  (This'd
require some infrastructure that doesn't currently exist for manipulating
PathTargets, particularly the ability to copy a PathTarget and/or add a
column to an existing PathTarget.)

The way this optimization would fit into that structure is that the
DISTINCT/ORDER BY steps would now also have a desired input pathtarget
that might be different from the final target.
        regards, tom lane



Re: Optimizer questions

From:
Tom Lane
Date:
I wrote:
> BTW, there's some additional refactoring I had had in mind to do in
> grouping_planner to make its handling of the targetlist a bit more
> organized; in particular, I'd like to see it using PathTarget
> representation more consistently throughout the post-scan-join steps.

See 51c0f63e4d76a86b44e87876a6addcfffb01ec28 --- I think this gets
things to where we could plug in additional levels of targets without
too much complication.
        regards, tom lane



Re: Optimizer questions

From:
Konstantin Knizhnik
Date:

On 09.03.2016 09:15, Tom Lane wrote:
> I wrote:
>> BTW, there's some additional refactoring I had had in mind to do in
>> grouping_planner to make its handling of the targetlist a bit more
>> organized; in particular, I'd like to see it using PathTarget
>> representation more consistently throughout the post-scan-join steps.
> See 51c0f63e4d76a86b44e87876a6addcfffb01ec28 --- I think this gets
> things to where we could plug in additional levels of targets without
> too much complication.
>
>             regards, tom lane

So, if I understand you correctly, there are two major concerns:

1. Volatile functions. I wonder whether it is really required to re-evaluate
a volatile function for each record even if a LIMIT clause is present.
The documentation says:
"A query using a volatile function will re-evaluate the function at
every row where its value is needed."
So if we are using sort with limit and the value of the function is not used
in the sort, then it is correct to say that the value of this function is not
needed, so there is no need to re-evaluate it, isn't it? (See the small
example after point 2 below.)

2. Narrowing functions, like md5.
Here I do not have any good idea how to support this. It looks like the cost
of the SORT should depend on the tuple width. Only in that case can the
optimizer determine whether it is more efficient to evaluate a function
earlier or to postpone its execution.
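
As a small, hypothetical illustration of point 1: how many times does the volatile nextval() below run? If the tlist is evaluated before the sort it runs once per row of t; if evaluation were postponed until after the LIMIT it would run only once, and today the answer can depend on the chosen plan shape:

create sequence seq;
create table t (x int);
insert into t select generate_series(1, 5);
select nextval('seq') as n, x from t order by x limit 1;
select last_value, is_called from seq;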

I think that the best approach is to generate two different paths: the
original one, where projection is always done before the sort, and another
one with postponed projection of the non-trivial columns. Then we compare
the costs of the two paths and choose the best one.
Unfortunately, I do not understand right now how to implement this with the
existing grouping_planner.
Do you think that it is possible?

An alternative approach is to do something like my proposed patch, but
take into account the cost of function execution and check for the presence
of volatile/narrowing functions. This approach provides better flexibility,
because we can choose the subset of columns not used in the sort whose
evaluation should be postponed. But here we once again make a local
decision during construction of the path instead of comparing the costs of
full paths.




-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Optimizer questions

From:
Tom Lane
Date:
Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
> I think that the best approach is to generate two different paths:
> original one, when projection is always done before sort and another one
> with postponed projection of non-trivial columns. Then we compare costs
> of two paths and choose the best one.
> Unfortunately, I do not understand now how to implement it with existed
> grouping_planner.
> Do you think that it is possible?

After fooling with this awhile, I don't think it's actually necessary
to do that.  See attached proof-of-concept patch.

Although this patch gets through our regression tests, that's only because
it's conservative about deciding to postpone evaluation; if it tried to
postpone evaluation in a query with window functions, it would fail
miserably, because pull_var_clause doesn't know about window functions.
I think that that's a clear oversight and we should extend it to offer
the same sorts of behaviors as it does for Aggrefs.  But that would be
slightly invasive, there being a dozen or so callers; so I didn't bother
to do it yet pending comments on this patch.

I think it's probably also broken for SRFs in the tlist; we need to work
out what semantics we want for those.  If we postpone any SRF to after
the Sort, we can no longer assume that a query LIMIT enables use of
bounded sort (because the SRF might repeatedly return zero rows).
I don't have a huge problem with that, but I think now would be a good
time to nail down some semantics.

Some other work that would be useful would be to refactor so that the
cost_qual_eval operations aren't so redundant.  But that's just a small
time savings not a question of functionality.

And we'd have to document that this changes the behavior for volatile
functions.  For the better, I think: this will mean that you get
consistent results no matter whether the query is implemented by
indexscan or seqscan-and-sort, which has never been true before.
But it is a change.

Do people approve of this sort of change in general, or this patch
approach in particular?  Want to bikeshed the specific
when-to-postpone-eval policies implemented here?  Other comments?

            regards, tom lane

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 8937e71..b15fca1 100644
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*************** static RelOptInfo *create_distinct_paths
*** 126,131 ****
--- 126,132 ----
                        RelOptInfo *input_rel);
  static RelOptInfo *create_ordered_paths(PlannerInfo *root,
                       RelOptInfo *input_rel,
+                      PathTarget *target,
                       double limit_tuples);
  static PathTarget *make_group_input_target(PlannerInfo *root, List *tlist);
  static List *postprocess_setop_tlist(List *new_tlist, List *orig_tlist);
*************** static PathTarget *make_window_input_tar
*** 134,139 ****
--- 135,142 ----
                           List *tlist, List *activeWindows);
  static List *make_pathkeys_for_window(PlannerInfo *root, WindowClause *wc,
                           List *tlist);
+ static PathTarget *make_sort_input_target(PlannerInfo *root,
+                        PathTarget *final_target);


  /*****************************************************************************
*************** grouping_planner(PlannerInfo *root, bool
*** 1378,1383 ****
--- 1381,1387 ----
      int64        offset_est = 0;
      int64        count_est = 0;
      double        limit_tuples = -1.0;
+     PathTarget *final_target;
      RelOptInfo *current_rel;
      RelOptInfo *final_rel;
      ListCell   *lc;
*************** grouping_planner(PlannerInfo *root, bool
*** 1437,1442 ****
--- 1441,1449 ----
          /* Save aside the final decorated tlist */
          root->processed_tlist = tlist;

+         /* Also extract the PathTarget form of the setop result tlist */
+         final_target = current_rel->cheapest_total_path->pathtarget;
+
          /*
           * Can't handle FOR [KEY] UPDATE/SHARE here (parser should have
           * checked already, but let's make sure).
*************** grouping_planner(PlannerInfo *root, bool
*** 1461,1467 ****
      else
      {
          /* No set operations, do regular planning */
!         PathTarget *final_target;
          PathTarget *grouping_target;
          PathTarget *scanjoin_target;
          bool        have_grouping;
--- 1468,1474 ----
      else
      {
          /* No set operations, do regular planning */
!         PathTarget *sort_input_target;
          PathTarget *grouping_target;
          PathTarget *scanjoin_target;
          bool        have_grouping;
*************** grouping_planner(PlannerInfo *root, bool
*** 1657,1678 ****
          final_target = create_pathtarget(root, tlist);

          /*
           * If we have window functions to deal with, the output from any
           * grouping step needs to be what the window functions want;
!          * otherwise, it should just be final_target.
           */
          if (activeWindows)
              grouping_target = make_window_input_target(root,
                                                         tlist,
                                                         activeWindows);
          else
!             grouping_target = final_target;

          /*
           * If we have grouping or aggregation to do, the topmost scan/join
!          * plan node must emit what the grouping step wants; otherwise, if
!          * there's window functions, it must emit what the window functions
!          * want; otherwise, it should emit final_target.
           */
          have_grouping = (parse->groupClause || parse->groupingSets ||
                           parse->hasAggs || root->hasHavingQual);
--- 1664,1694 ----
          final_target = create_pathtarget(root, tlist);

          /*
+          * If ORDER BY was given, consider whether we should use a post-sort
+          * projection, and compute the adjusted target for preceding steps if
+          * so.
+          */
+         if (parse->sortClause)
+             sort_input_target = make_sort_input_target(root, final_target);
+         else
+             sort_input_target = final_target;
+
+         /*
           * If we have window functions to deal with, the output from any
           * grouping step needs to be what the window functions want;
!          * otherwise, it should be sort_input_target.
           */
          if (activeWindows)
              grouping_target = make_window_input_target(root,
                                                         tlist,
                                                         activeWindows);
          else
!             grouping_target = sort_input_target;

          /*
           * If we have grouping or aggregation to do, the topmost scan/join
!          * plan node must emit what the grouping step wants; otherwise, it
!          * should emit grouping_target.
           */
          have_grouping = (parse->groupClause || parse->groupingSets ||
                           parse->hasAggs || root->hasHavingQual);
*************** grouping_planner(PlannerInfo *root, bool
*** 1733,1739 ****
              current_rel = create_window_paths(root,
                                                current_rel,
                                                grouping_target,
!                                               final_target,
                                                tlist,
                                                wflists,
                                                activeWindows);
--- 1749,1755 ----
              current_rel = create_window_paths(root,
                                                current_rel,
                                                grouping_target,
!                                               sort_input_target,
                                                tlist,
                                                wflists,
                                                activeWindows);
*************** grouping_planner(PlannerInfo *root, bool
*** 1793,1798 ****
--- 1809,1815 ----
      {
          current_rel = create_ordered_paths(root,
                                             current_rel,
+                                            final_target,
                                             limit_tuples);
      }

*************** create_distinct_paths(PlannerInfo *root,
*** 3704,3715 ****
--- 3721,3734 ----
   * cheapest-total existing path.
   *
   * input_rel: contains the source-data Paths
+  * target: the output tlist the result Paths must emit
   * limit_tuples: estimated bound on the number of output tuples,
   *        or -1 if no LIMIT or couldn't estimate
   */
  static RelOptInfo *
  create_ordered_paths(PlannerInfo *root,
                       RelOptInfo *input_rel,
+                      PathTarget *target,
                       double limit_tuples)
  {
      Path       *cheapest_input_path = input_rel->cheapest_total_path;
*************** create_ordered_paths(PlannerInfo *root,
*** 3737,3742 ****
--- 3756,3767 ----
                                                   root->sort_pathkeys,
                                                   limit_tuples);
              }
+
+             /* Add projection step if needed */
+             if (path->pathtarget != target)
+                 path = apply_projection_to_path(root, ordered_rel,
+                                                 path, target);
+
              add_path(ordered_rel, path);
          }
      }
*************** make_pathkeys_for_window(PlannerInfo *ro
*** 4140,4145 ****
--- 4165,4357 ----
  }

  /*
+  * make_sort_input_target
+  *      Generate appropriate PathTarget for initial input to Sort step.
+  *
+  * If the query has ORDER BY, this function chooses the target list to be
+  * computed by the node just below the Sort (and Distinct, if any) steps.
+  * This might or might not be identical to the query's final output tlist.
+  *
+  * The main argument for keeping the sort-input tlist the same as the final
+  * is that we avoid a separate projection node (which will be needed if
+  * they're different, because Sort can't project).  However, there are also
+  * advantages to postponing tlist evaluation till after the Sort: it ensures
+  * a consistent order of evaluation for any volatile functions in the tlist,
+  * and if there's also a LIMIT, we can stop the query without ever computing
+  * tlist functions for later rows, which is beneficial for both volatile and
+  * expensive functions.
+  *
+  * Our current policy is to postpone volatile expressions till after the sort
+  * unconditionally (assuming that that's possible, ie they are in plain tlist
+  * columns and not ORDER/GROUP BY/DISTINCT columns).  Expensive expressions
+  * are postponed if there is a LIMIT, or if root->tuple_fraction shows that
+  * partial evaluation of the query is possible (if neither is true, we expect
+  * to have to evaluate the expressions for every row anyway), or if there are
+  * any volatile expressions (since once we've put in a projection at all,
+  * it won't cost any more to postpone more stuff).
+  *
+  * Another issue that could potentially be considered here is that
+  * evaluating tlist expressions could result in data that's either wider
+  * or narrower than the input Vars, thus changing the volume of data that
+  * has to go through the Sort.  However, we usually have only a very bad
+  * idea of the output width of any expression more complex than a Var,
+  * so for now it seems too risky to try to optimize on that basis.
+  *
+  * Note that if we do produce a modified sort-input target, and then the
+  * query ends up not using an explicit Sort, no particular harm is done:
+  * we'll initially use the modified target for the preceding path nodes,
+  * but then change them to the final target with apply_projection_to_path.
+  * Moreover, in such a case the guarantees about evaluation order of
+  * volatile functions still hold, since the rows are sorted already.
+  *
+  * This function has some things in common with make_group_input_target and
+  * make_window_input_target, though the detailed rules for what to do are
+  * different.  We never flatten/postpone any grouping or ordering columns;
+  * those are needed before the sort.  If we do flatten a particular
+  * expression, we leave Aggref and WindowFunc nodes alone, since those were
+  * computed earlier.
+  *
+  * 'final_target' is the query's final target list (in PathTarget form)
+  *
+  * The result is the PathTarget to be computed by the plan node immediately
+  * below the Sort step (and the Distinct step, if any).  This will be
+  * exactly final_target if we decide a projection step wouldn't be helpful.
+  */
+ static PathTarget *
+ make_sort_input_target(PlannerInfo *root,
+                        PathTarget *final_target)
+ {
+     Query       *parse = root->parse;
+     PathTarget *input_target;
+     int            ncols;
+     bool       *postpone_col;
+     bool        have_volatile;
+     bool        have_expensive;
+     List       *flattenable_cols;
+     List       *flattenable_vars;
+     int            i;
+     ListCell   *lc;
+
+     /* Shouldn't get here unless query has ORDER BY */
+     Assert(parse->sortClause);
+
+     /* Inspect tlist and collect per-column information */
+     ncols = list_length(final_target->exprs);
+     postpone_col = (bool *) palloc0(ncols * sizeof(bool));
+     have_volatile = have_expensive = false;
+
+     i = 0;
+     foreach(lc, final_target->exprs)
+     {
+         Expr       *expr = (Expr *) lfirst(lc);
+
+         /*
+          * If the column has a sortgroupref, assume it has to be evaluated
+          * before sorting.  Generally such columns would be ORDER BY, GROUP
+          * BY, etc targets.  One exception is columns that were removed from
+          * GROUP BY by remove_useless_groupby_columns() ... but those would
+          * only be Vars anyway.
+          */
+         if (final_target->sortgrouprefs[i] == 0)
+         {
+             /*
+              * If it's volatile, that's an unconditional reason to postpone.
+              */
+             if (contain_volatile_functions((Node *) expr))
+             {
+                 postpone_col[i] = true;
+                 have_volatile = true;
+             }
+             else
+             {
+                 /*
+                  * Else check the cost.  XXX it's annoying to have to do this
+                  * when set_pathtarget_cost_width() just did it.  Refactor to
+                  * allow sharing the work?
+                  */
+                 QualCost    cost;
+
+                 cost_qual_eval_node(&cost, (Node *) expr, root);
+
+                 /*
+                  * We arbitrarily define "expensive" as "more than 10X
+                  * cpu_operator_cost".  Note this will take in any PL function
+                  * with default cost.
+                  */
+                 if (cost.per_tuple >= 10 * cpu_operator_cost)
+                 {
+                     postpone_col[i] = true;
+                     have_expensive = true;
+                 }
+             }
+         }
+         i++;
+     }
+
+     /*
+      * If we don't need a post-sort projection, just return final_target.
+      */
+     if (!(have_volatile ||
+           (have_expensive &&
+            (parse->limitCount || root->tuple_fraction > 0))))
+         return final_target;
+
+     /*
+      * Otherwise, construct the sort-input target, taking all non-postponable
+      * columns and then adding Vars, PlaceHolderVars, Aggrefs, and WindowFuncs
+      * found in the postponable ones.  XXX define a create_empty_pathtarget()
+      */
+     input_target = palloc0(sizeof(PathTarget));
+     flattenable_cols = NIL;
+
+     i = 0;
+     foreach(lc, final_target->exprs)
+     {
+         Expr       *expr = (Expr *) lfirst(lc);
+
+         if (postpone_col[i])
+         {
+             flattenable_cols = lappend(flattenable_cols, expr);
+         }
+         else
+         {
+             add_column_to_pathtarget(input_target, expr,
+                                      final_target->sortgrouprefs[i]);
+         }
+         i++;
+     }
+
+     /*
+      * Pull out all the Vars and Aggrefs mentioned in flattenable columns, and
+      * add them to the result tlist if not already present.  (Some might be
+      * there already because they're used directly as window/group clauses.)
+      *
+      * Note: it's essential to use PVC_INCLUDE_AGGREGATES here, so that the
+      * Aggrefs are placed in the Agg node's tlist and not left to be computed
+      * at higher levels.
+      *
+      * XXX need to extend pull_var_clause to handle windowfuncs specially.
+      */
+     flattenable_vars = pull_var_clause((Node *) flattenable_cols,
+                                        PVC_INCLUDE_AGGREGATES,
+                                        PVC_INCLUDE_PLACEHOLDERS);
+     foreach(lc, flattenable_vars)
+     {
+         Expr       *expr = (Expr *) lfirst(lc);
+
+         if (!list_member(input_target->exprs, expr))
+             add_column_to_pathtarget(input_target, expr, 0);
+     }
+
+     /* clean up cruft */
+     list_free(flattenable_vars);
+     list_free(flattenable_cols);
+
+     /* XXX this represents even more redundant cost calculation ... */
+     return set_pathtarget_cost_width(root, input_target);
+ }
+
+ /*
   * get_cheapest_fractional_path
   *      Find the cheapest path for retrieving a specified fraction of all
   *      the tuples expected to be returned by the given relation.

Re: Optimizer questions

From
konstantin knizhnik
Date:
On Mar 10, 2016, at 1:56 AM, Tom Lane wrote:

> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
>> I think that the best approach is to generate two different paths:
>> original one, when projection is always done before sort and another one
>> with postponed projection of non-trivial columns. Then we compare costs
>> of two paths and choose the best one.
>> Unfortunately, I do not understand now how to implement it with existed
>> grouping_planner.
>> Do you think that it is possible?
>
> After fooling with this awhile, I don't think it's actually necessary
> to do that.  See attached proof-of-concept patch.

Oh, you did all my work :)

But right now the cost estimation rule makes it impossible to apply this optimization to simple expressions like this one:

postgres=# create table a (b integer);
CREATE TABLE
postgres=# insert into a values (generate_series(1, 10));
INSERT 0 10
postgres=# select b+b from a order by b limit 1;
NOTICE:  int4pl
NOTICE:  int4pl
NOTICE:  int4pl
NOTICE:  int4pl
NOTICE:  int4pl
NOTICE:  int4pl
NOTICE:  int4pl
NOTICE:  int4pl
NOTICE:  int4pl
NOTICE:  int4pl
 ?column?
----------
        2
(1 row)
postgres=# create or replace function twice(x integer) returns integer as $$ begin raise notice 'exec function';
return x + x; end $$ language plpgsql;
CREATE FUNCTION
postgres=# select twice(b) from a order by b limit 1;
NOTICE:  exec function
NOTICE:  int4pl
 twice
-------
     2
(1 row)


I wonder whether there are any advantages to earlier evaluation of such simple expressions if they are not needed for the sort?
It seems to be true only for narrowing functions, like md5... but that is quite an exotic case, isn't it?
Maybe it is reasonable to be more optimistic when estimating the cost of postponed expression evaluation?
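
(By the way, EXPLAIN VERBOSE makes it easy to see which way a given expression goes: when evaluation is
postponed, the expression shows up in the Output of a projection step above the Sort instead of below it.
With the table and function above I would compare, for example:

    explain verbose select b + b from a order by b limit 1;     -- stays below the Sort
    explain verbose select twice(b) from a order by b limit 1;  -- postponed above the Sort

and only the second one gets the post-sort projection under the current cost rule.)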


Also, I do not completely understand your concern about window functions.
Is there any example of a query with window or aggregate functions where this optimization (postponing expression
evaluation) can be applied?
It would also be interesting to me to know whether there are other cases (besides SORT+LIMIT) where delaying projection
leads to a more efficient plan.







Re: Optimizer questions

From
Tom Lane
Date:
konstantin knizhnik <k.knizhnik@postgrespro.ru> writes:
> But right now the cost estimation rule makes it impossible to apply this optimization to simple expressions like this one: ...

> I wonder whether there are any advantages to earlier evaluation of such simple expressions if they are not needed for the sort?

Well, as I said, my patch was intentionally written to be conservative
about when to change the semantics from what it's been for the last
twenty years.  I think there's a good argument for changing it for
volatile functions, but I'm less convinced that we should whack around
what happens in cases where there is not a clear benefit.  It's fairly
likely that there are users out there who have (perhaps without even
knowing it) optimized their queries to work well with eval-before-sort
behavior, perhaps by applying data-narrowing functions.  They won't
thank us for changing the behavior "just because".

If we had a more reliable idea of whether we are widening or narrowing
the sort data by postponing eval, I'd be willing to be more aggressive
about doing it.  But without that, I think it's best to be conservative
and only change when there's a pretty clear potential benefit.
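
(The classic narrowing case is something along the lines of

    select md5(wide_text_column) from tab order by id;   -- table and column names are made up

where computing md5() below the Sort means sorting 32-character hashes plus the key, while postponing it
would mean dragging the whole text column through the sort.)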

> Also, I do not completely understand your concern about window functions.

The point is just that we have to not disassemble WindowFunc nodes when
building the sort-input tlist, in the same way that we don't disassemble
Aggref nodes.  If we are sorting the output of a WindowAgg, the WindowFunc
expressions have to appear as expressions in the WindowAgg's output tlist.
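
(For instance, in a hypothetical

    select x, expensive_fn(sum(y) over (partition by g)) from tab order by x limit 10;

we could postpone expensive_fn(), but the sum(y) over (...) inside it still has to be computed by the
WindowAgg and carried intact through the Sort; flattening it down to its argument Vars, which is
effectively what pull_var_clause does now since it knows nothing about WindowFuncs, loses that.)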

>> I think it's probably also broken for SRFs in the tlist; we need to work
>> out what semantics we want for those.  If we postpone any SRF to after
>> the Sort, we can no longer assume that a query LIMIT enables use of
>> bounded sort (because the SRF might repeatedly return zero rows).
>> I don't have a huge problem with that, but I think now would be a good
>> time to nail down some semantics.

As far as that goes, it seems to me after thinking about it that
non-sort-column tlist items containing SRFs should always be postponed,
too.  Performing a SRF before sorting bloats the sort data vertically,
rather than horizontally, but it's still bloat.  (Although as against
that, when you have ORDER BY + LIMIT, postponing SRFs loses the ability
to use a bounded sort.)  The killer point though is that unless the sort
is stable, it might cause surprising changes in the order of the SRF
output values.  Our sorts aren't stable; here's an example in HEAD:

# select q1, generate_series(1,9) from int8_tbl order by q1 limit 7;
 q1  | generate_series
-----+-----------------
 123 |               2
 123 |               3
 123 |               4
 123 |               5
 123 |               6
 123 |               7
 123 |               1
(7 rows)

I think that's pretty surprising, and if we have an opportunity to
provide more intuitive semantics here, we should do it.
        regards, tom lane



Re: Optimizer questions

From
Tom Lane
Date:
I wrote:
> As far as that goes, it seems to me after thinking about it that
> non-sort-column tlist items containing SRFs should always be postponed,
> too.  Performing a SRF before sorting bloats the sort data vertically,
> rather than horizontally, but it's still bloat.  (Although as against
> that, when you have ORDER BY + LIMIT, postponing SRFs loses the ability
> to use a bounded sort.)  The killer point though is that unless the sort
> is stable, it might cause surprising changes in the order of the SRF
> output values.  Our sorts aren't stable; here's an example in HEAD:

> # select q1, generate_series(1,9) from int8_tbl order by q1 limit 7;
>  q1  | generate_series
> -----+-----------------
>  123 |               2
>  123 |               3
>  123 |               4
>  123 |               5
>  123 |               6
>  123 |               7
>  123 |               1
> (7 rows)

> I think that's pretty surprising, and if we have an opportunity to
> provide more intuitive semantics here, we should do it.

Here's an updated patch (requires current HEAD) that takes care of
window functions correctly and also does something reasonable with
set-returning functions:

# explain verbose select q1, generate_series(1,9) from int8_tbl order by q1 limit 7;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Limit  (cost=1.11..1.14 rows=7 width=12)
   Output: q1, (generate_series(1, 9))
   ->  Result  (cost=1.11..26.16 rows=5000 width=12)
         Output: q1, generate_series(1, 9)
         ->  Sort  (cost=1.11..1.12 rows=5 width=8)
               Output: q1
               Sort Key: int8_tbl.q1
               ->  Seq Scan on public.int8_tbl  (cost=0.00..1.05 rows=5 width=8)
                     Output: q1
(9 rows)

# select q1, generate_series(1,9) from int8_tbl order by q1 limit 7;
 q1  | generate_series
-----+-----------------
 123 |               1
 123 |               2
 123 |               3
 123 |               4
 123 |               5
 123 |               6
 123 |               7
(7 rows)

I added some user-facing documentation too.  I think this is committable,
though maybe we should add a regression test case or two.
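
(A minimal test could be little more than the example above, say

    -- sketch of a possible regression test case, not part of the attached patch
    explain (verbose, costs off)
      select q1, generate_series(1,9) from int8_tbl order by q1 limit 7;
    select q1, generate_series(1,9) from int8_tbl order by q1 limit 7;

with the expected output still to be written.)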

            regards, tom lane

diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 44810f4..0520f2c 100644
*** a/doc/src/sgml/ref/select.sgml
--- b/doc/src/sgml/ref/select.sgml
*************** UNBOUNDED FOLLOWING
*** 993,998 ****
--- 993,1028 ----
      cases it is not possible to specify new names with <literal>AS</>;
      the output column names will be the same as the table columns' names.
     </para>
+
+    <para>
+     According to the SQL standard, the expressions in the output list should
+     be computed before applying <literal>DISTINCT</literal>, <literal>ORDER
+     BY</literal>, or <literal>LIMIT</literal>.  This is obviously necessary
+     when using <literal>DISTINCT</literal>, since otherwise it's not clear
+     what values are being made distinct.  However, in many cases it is
+     convenient if output expressions are computed after <literal>ORDER
+     BY</literal> and <literal>LIMIT</literal>; particularly if the output list
+     contains any volatile or expensive functions.  With that behavior, the
+     order of function evaluations is more intuitive and there will not be
+     evaluations corresponding to rows that never appear in the output.
+     <productname>PostgreSQL</> will effectively evaluate output expressions
+     after sorting and limiting, so long as those expressions are not
+     referenced in <literal>DISTINCT</literal>, <literal>ORDER BY</literal>
+     or <literal>GROUP BY</literal>.  (As a counterexample, <literal>SELECT
+     f(x) FROM tab ORDER BY 1</> clearly must evaluate <function>f(x)</>
+     before sorting.)  Output expressions that contain set-returning functions
+     are effectively evaluated after sorting and before limiting, so
+     that <literal>LIMIT</literal> will act to cut off the output from a
+     set-returning function.
+    </para>
+
+    <note>
+     <para>
+      <productname>PostgreSQL</> versions before 9.6 did not provide any
+      guarantees about the timing of evaluation of output expressions versus
+      sorting and limiting; it depended on the form of the chosen query plan.
+     </para>
+    </note>
    </refsect2>

    <refsect2 id="sql-distinct">
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a2cd6de..51f4ee4 100644
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*************** static RelOptInfo *create_distinct_paths
*** 127,132 ****
--- 127,133 ----
                        RelOptInfo *input_rel);
  static RelOptInfo *create_ordered_paths(PlannerInfo *root,
                       RelOptInfo *input_rel,
+                      PathTarget *target,
                       double limit_tuples);
  static PathTarget *make_group_input_target(PlannerInfo *root, List *tlist);
  static List *postprocess_setop_tlist(List *new_tlist, List *orig_tlist);
*************** static PathTarget *make_window_input_tar
*** 135,140 ****
--- 136,144 ----
                           List *tlist, List *activeWindows);
  static List *make_pathkeys_for_window(PlannerInfo *root, WindowClause *wc,
                           List *tlist);
+ static PathTarget *make_sort_input_target(PlannerInfo *root,
+                        PathTarget *final_target,
+                        bool *have_postponed_srfs);


  /*****************************************************************************
*************** grouping_planner(PlannerInfo *root, bool
*** 1379,1384 ****
--- 1383,1391 ----
      int64        offset_est = 0;
      int64        count_est = 0;
      double        limit_tuples = -1.0;
+     bool        have_postponed_srfs = false;
+     double        tlist_rows;
+     PathTarget *final_target;
      RelOptInfo *current_rel;
      RelOptInfo *final_rel;
      ListCell   *lc;
*************** grouping_planner(PlannerInfo *root, bool
*** 1438,1443 ****
--- 1445,1453 ----
          /* Save aside the final decorated tlist */
          root->processed_tlist = tlist;

+         /* Also extract the PathTarget form of the setop result tlist */
+         final_target = current_rel->cheapest_total_path->pathtarget;
+
          /*
           * Can't handle FOR [KEY] UPDATE/SHARE here (parser should have
           * checked already, but let's make sure).
*************** grouping_planner(PlannerInfo *root, bool
*** 1462,1472 ****
      else
      {
          /* No set operations, do regular planning */
!         PathTarget *final_target;
          PathTarget *grouping_target;
          PathTarget *scanjoin_target;
          bool        have_grouping;
-         double        tlist_rows;
          WindowFuncLists *wflists = NULL;
          List       *activeWindows = NIL;
          List       *rollup_lists = NIL;
--- 1472,1481 ----
      else
      {
          /* No set operations, do regular planning */
!         PathTarget *sort_input_target;
          PathTarget *grouping_target;
          PathTarget *scanjoin_target;
          bool        have_grouping;
          WindowFuncLists *wflists = NULL;
          List       *activeWindows = NIL;
          List       *rollup_lists = NIL;
*************** grouping_planner(PlannerInfo *root, bool
*** 1620,1626 ****
           * Figure out whether there's a hard limit on the number of rows that
           * query_planner's result subplan needs to return.  Even if we know a
           * hard limit overall, it doesn't apply if the query has any
!          * grouping/aggregation operations.
           */
          if (parse->groupClause ||
              parse->groupingSets ||
--- 1629,1638 ----
           * Figure out whether there's a hard limit on the number of rows that
           * query_planner's result subplan needs to return.  Even if we know a
           * hard limit overall, it doesn't apply if the query has any
!          * grouping/aggregation operations.  (XXX it also doesn't apply if the
!          * tlist contains any SRFs; but checking for that here seems more
!          * costly than it's worth, since root->limit_tuples is only used for
!          * cost estimates, and only in a small number of cases.)
           */
          if (parse->groupClause ||
              parse->groupingSets ||
*************** grouping_planner(PlannerInfo *root, bool
*** 1658,1679 ****
          final_target = create_pathtarget(root, tlist);

          /*
           * If we have window functions to deal with, the output from any
           * grouping step needs to be what the window functions want;
!          * otherwise, it should just be final_target.
           */
          if (activeWindows)
              grouping_target = make_window_input_target(root,
                                                         tlist,
                                                         activeWindows);
          else
!             grouping_target = final_target;

          /*
           * If we have grouping or aggregation to do, the topmost scan/join
!          * plan node must emit what the grouping step wants; otherwise, if
!          * there's window functions, it must emit what the window functions
!          * want; otherwise, it should emit final_target.
           */
          have_grouping = (parse->groupClause || parse->groupingSets ||
                           parse->hasAggs || root->hasHavingQual);
--- 1670,1701 ----
          final_target = create_pathtarget(root, tlist);

          /*
+          * If ORDER BY was given, consider whether we should use a post-sort
+          * projection, and compute the adjusted target for preceding steps if
+          * so.
+          */
+         if (parse->sortClause)
+             sort_input_target = make_sort_input_target(root, final_target,
+                                                        &have_postponed_srfs);
+         else
+             sort_input_target = final_target;
+
+         /*
           * If we have window functions to deal with, the output from any
           * grouping step needs to be what the window functions want;
!          * otherwise, it should be sort_input_target.
           */
          if (activeWindows)
              grouping_target = make_window_input_target(root,
                                                         tlist,
                                                         activeWindows);
          else
!             grouping_target = sort_input_target;

          /*
           * If we have grouping or aggregation to do, the topmost scan/join
!          * plan node must emit what the grouping step wants; otherwise, it
!          * should emit grouping_target.
           */
          have_grouping = (parse->groupClause || parse->groupingSets ||
                           parse->hasAggs || root->hasHavingQual);
*************** grouping_planner(PlannerInfo *root, bool
*** 1734,1779 ****
              current_rel = create_window_paths(root,
                                                current_rel,
                                                grouping_target,
!                                               final_target,
                                                tlist,
                                                wflists,
                                                activeWindows);
          }

          /*
-          * If there are set-returning functions in the tlist, scale up the
-          * assumed output rowcounts of all surviving Paths to account for
-          * that.  This is a bit of a kluge, but it's not clear how to account
-          * for it in a more principled way.  We definitely don't want to apply
-          * the multiplier more than once, which would happen if we tried to
-          * fold it into PathTarget accounting.  And the expansion does happen
-          * before any explicit DISTINCT or ORDER BY processing is done.
-          */
-         tlist_rows = tlist_returns_set_rows(tlist);
-         if (tlist_rows > 1)
-         {
-             foreach(lc, current_rel->pathlist)
-             {
-                 Path       *path = (Path *) lfirst(lc);
-
-                 /*
-                  * We assume that execution costs of the tlist as such were
-                  * already accounted for.  However, it still seems appropriate
-                  * to charge something more for the executor's general costs
-                  * of processing the added tuples.  The cost is probably less
-                  * than cpu_tuple_cost, though, so we arbitrarily use half of
-                  * that.
-                  */
-                 path->total_cost += path->rows * (tlist_rows - 1) *
-                     cpu_tuple_cost / 2;
-
-                 path->rows *= tlist_rows;
-             }
-
-             /* There seems no need for a fresh set_cheapest comparison. */
-         }
-
-         /*
           * If there is a DISTINCT clause, consider ways to implement that. We
           * build a new upperrel representing the output of this phase.
           */
--- 1756,1768 ----
              current_rel = create_window_paths(root,
                                                current_rel,
                                                grouping_target,
!                                               sort_input_target,
                                                tlist,
                                                wflists,
                                                activeWindows);
          }

          /*
           * If there is a DISTINCT clause, consider ways to implement that. We
           * build a new upperrel representing the output of this phase.
           */
*************** grouping_planner(PlannerInfo *root, bool
*** 1787,1803 ****

      /*
       * If ORDER BY was given, consider ways to implement that, and generate a
!      * new upperrel containing only paths that emit the correct ordering.  We
!      * can apply the original limit_tuples limit in sorting now.
       */
      if (parse->sortClause)
      {
          current_rel = create_ordered_paths(root,
                                             current_rel,
                                             limit_tuples);
      }

      /*
       * Now we are prepared to build the final-output upperrel.  Insert all
       * surviving paths, with LockRows, Limit, and/or ModifyTable steps added
       * if needed.
--- 1776,1826 ----

      /*
       * If ORDER BY was given, consider ways to implement that, and generate a
!      * new upperrel containing only paths that emit the correct ordering and
!      * project the correct final_target.  We can apply the original
!      * limit_tuples limit in sort costing here, but only if there are no
!      * postponed SRFs.
       */
      if (parse->sortClause)
      {
          current_rel = create_ordered_paths(root,
                                             current_rel,
+                                            final_target,
+                                            have_postponed_srfs ? -1.0 :
                                             limit_tuples);
      }

      /*
+      * If there are set-returning functions in the tlist, scale up the output
+      * rowcounts of all surviving Paths to account for that.  Note that if any
+      * SRFs appear in sorting or grouping columns, we'll have underestimated
+      * the numbers of rows passing through earlier steps; but that's such a
+      * weird usage that it doesn't seem worth greatly complicating matters to
+      * account for it.
+      */
+     tlist_rows = tlist_returns_set_rows(tlist);
+     if (tlist_rows > 1)
+     {
+         foreach(lc, current_rel->pathlist)
+         {
+             Path       *path = (Path *) lfirst(lc);
+
+             /*
+              * We assume that execution costs of the tlist as such were
+              * already accounted for.  However, it still seems appropriate to
+              * charge something more for the executor's general costs of
+              * processing the added tuples.  The cost is probably less than
+              * cpu_tuple_cost, though, so we arbitrarily use half of that.
+              */
+             path->total_cost += path->rows * (tlist_rows - 1) *
+                 cpu_tuple_cost / 2;
+
+             path->rows *= tlist_rows;
+         }
+         /* No need to run set_cheapest; we're keeping all paths anyway. */
+     }
+
+     /*
       * Now we are prepared to build the final-output upperrel.  Insert all
       * surviving paths, with LockRows, Limit, and/or ModifyTable steps added
       * if needed.
*************** create_distinct_paths(PlannerInfo *root,
*** 3705,3716 ****
--- 3728,3741 ----
   * cheapest-total existing path.
   *
   * input_rel: contains the source-data Paths
+  * target: the output tlist the result Paths must emit
   * limit_tuples: estimated bound on the number of output tuples,
   *        or -1 if no LIMIT or couldn't estimate
   */
  static RelOptInfo *
  create_ordered_paths(PlannerInfo *root,
                       RelOptInfo *input_rel,
+                      PathTarget *target,
                       double limit_tuples)
  {
      Path       *cheapest_input_path = input_rel->cheapest_total_path;
*************** create_ordered_paths(PlannerInfo *root,
*** 3738,3743 ****
--- 3763,3774 ----
                                                   root->sort_pathkeys,
                                                   limit_tuples);
              }
+
+             /* Add projection step if needed */
+             if (path->pathtarget != target)
+                 path = apply_projection_to_path(root, ordered_rel,
+                                                 path, target);
+
              add_path(ordered_rel, path);
          }
      }
*************** make_pathkeys_for_window(PlannerInfo *ro
*** 4144,4149 ****
--- 4175,4383 ----
  }

  /*
+  * make_sort_input_target
+  *      Generate appropriate PathTarget for initial input to Sort step.
+  *
+  * If the query has ORDER BY, this function chooses the target to be computed
+  * by the node just below the Sort (and DISTINCT, if any, since Unique can't
+  * project) steps.  This might or might not be identical to the query's final
+  * output target.
+  *
+  * The main argument for keeping the sort-input tlist the same as the final
+  * is that we avoid a separate projection node (which will be needed if
+  * they're different, because Sort can't project).  However, there are also
+  * advantages to postponing tlist evaluation till after the Sort: it ensures
+  * a consistent order of evaluation for any volatile functions in the tlist,
+  * and if there's also a LIMIT, we can stop the query without ever computing
+  * tlist functions for later rows, which is beneficial for both volatile and
+  * expensive functions.
+  *
+  * Our current policy is to postpone volatile expressions till after the sort
+  * unconditionally (assuming that that's possible, ie they are in plain tlist
+  * columns and not ORDER BY/GROUP BY/DISTINCT columns).  We also postpone
+  * set-returning expressions unconditionally (if possible), because running
+  * them beforehand would bloat the sort dataset, and because it might cause
+  * unexpected output order if the sort isn't stable.  Expensive expressions
+  * are postponed if there is a LIMIT, or if root->tuple_fraction shows that
+  * partial evaluation of the query is possible (if neither is true, we expect
+  * to have to evaluate the expressions for every row anyway), or if there are
+  * any volatile or set-returning expressions (since once we've put in a
+  * projection at all, it won't cost any more to postpone more stuff).
+  *
+  * Another issue that could potentially be considered here is that
+  * evaluating tlist expressions could result in data that's either wider
+  * or narrower than the input Vars, thus changing the volume of data that
+  * has to go through the Sort.  However, we usually have only a very bad
+  * idea of the output width of any expression more complex than a Var,
+  * so for now it seems too risky to try to optimize on that basis.
+  *
+  * Note that if we do produce a modified sort-input target, and then the
+  * query ends up not using an explicit Sort, no particular harm is done:
+  * we'll initially use the modified target for the preceding path nodes,
+  * but then change them to the final target with apply_projection_to_path.
+  * Moreover, in such a case the guarantees about evaluation order of
+  * volatile functions still hold, since the rows are sorted already.
+  *
+  * This function has some things in common with make_group_input_target and
+  * make_window_input_target, though the detailed rules for what to do are
+  * different.  We never flatten/postpone any grouping or ordering columns;
+  * those are needed before the sort.  If we do flatten a particular
+  * expression, we leave Aggref and WindowFunc nodes alone, since those were
+  * computed earlier.
+  *
+  * 'final_target' is the query's final target list (in PathTarget form)
+  * 'have_postponed_srfs' is an output argument, see below
+  *
+  * The result is the PathTarget to be computed by the plan node immediately
+  * below the Sort step (and the Distinct step, if any).  This will be
+  * exactly final_target if we decide a projection step wouldn't be helpful.
+  *
+  * In addition, *have_postponed_srfs is set to TRUE if we choose to postpone
+  * any set-returning functions to after the Sort.
+  */
+ static PathTarget *
+ make_sort_input_target(PlannerInfo *root,
+                        PathTarget *final_target,
+                        bool *have_postponed_srfs)
+ {
+     Query       *parse = root->parse;
+     PathTarget *input_target;
+     int            ncols;
+     bool       *postpone_col;
+     bool        have_srf;
+     bool        have_volatile;
+     bool        have_expensive;
+     List       *postponable_cols;
+     List       *postponable_vars;
+     int            i;
+     ListCell   *lc;
+
+     /* Shouldn't get here unless query has ORDER BY */
+     Assert(parse->sortClause);
+
+     *have_postponed_srfs = false;        /* default result */
+
+     /* Inspect tlist and collect per-column information */
+     ncols = list_length(final_target->exprs);
+     postpone_col = (bool *) palloc0(ncols * sizeof(bool));
+     have_srf = have_volatile = have_expensive = false;
+
+     i = 0;
+     foreach(lc, final_target->exprs)
+     {
+         Expr       *expr = (Expr *) lfirst(lc);
+
+         /*
+          * If the column has a sortgroupref, assume it has to be evaluated
+          * before sorting.  Generally such columns would be ORDER BY, GROUP
+          * BY, etc targets.  One exception is columns that were removed from
+          * GROUP BY by remove_useless_groupby_columns() ... but those would
+          * only be Vars anyway.  There don't seem to be any cases where it
+          * would be worth the trouble to double-check.
+          */
+         if (final_target->sortgrouprefs[i] == 0)
+         {
+             /*
+              * If it returns a set or is volatile, that's an unconditional
+              * reason to postpone.  Check the SRF case first because we must
+              * know whether we have any postponed SRFs.
+              */
+             if (expression_returns_set((Node *) expr))
+             {
+                 postpone_col[i] = true;
+                 have_srf = true;
+             }
+             else if (contain_volatile_functions((Node *) expr))
+             {
+                 postpone_col[i] = true;
+                 have_volatile = true;
+             }
+             else
+             {
+                 /*
+                  * Else check the cost.  XXX it's annoying to have to do this
+                  * when set_pathtarget_cost_width() just did it.  Refactor to
+                  * allow sharing the work?
+                  */
+                 QualCost    cost;
+
+                 cost_qual_eval_node(&cost, (Node *) expr, root);
+
+                 /*
+                  * We arbitrarily define "expensive" as "more than 10X
+                  * cpu_operator_cost".  Note this will take in any PL function
+                  * with default cost.
+                  */
+                 if (cost.per_tuple > 10 * cpu_operator_cost)
+                 {
+                     postpone_col[i] = true;
+                     have_expensive = true;
+                 }
+             }
+         }
+         i++;
+     }
+
+     /*
+      * If we don't need a post-sort projection, just return final_target.
+      */
+     if (!(have_srf || have_volatile ||
+           (have_expensive &&
+            (parse->limitCount || root->tuple_fraction > 0))))
+         return final_target;
+
+     /*
+      * Report whether the post-sort projection will contain set-returning
+      * functions.  This is important because it affects whether the Sort can
+      * rely on the query's LIMIT (if any) to bound the number of rows it needs
+      * to return.
+      */
+     *have_postponed_srfs = have_srf;
+
+     /*
+      * Construct the sort-input target, taking all non-postponable columns and
+      * then adding Vars, PlaceHolderVars, Aggrefs, and WindowFuncs found in
+      * the postponable ones.
+      */
+     input_target = create_empty_pathtarget();
+     postponable_cols = NIL;
+
+     i = 0;
+     foreach(lc, final_target->exprs)
+     {
+         Expr       *expr = (Expr *) lfirst(lc);
+
+         if (postpone_col[i])
+             postponable_cols = lappend(postponable_cols, expr);
+         else
+             add_column_to_pathtarget(input_target, expr,
+                                      final_target->sortgrouprefs[i]);
+
+         i++;
+     }
+
+     /*
+      * Pull out all the Vars, Aggrefs, and WindowFuncs mentioned in
+      * postponable columns, and add them to the sort-input target if not
+      * already present.  (Some might be there already.)  We mustn't
+      * deconstruct Aggrefs or WindowFuncs here, since the projection node
+      * would be unable to recompute them.
+      */
+     postponable_vars = pull_var_clause((Node *) postponable_cols,
+                                        PVC_INCLUDE_AGGREGATES |
+                                        PVC_INCLUDE_WINDOWFUNCS |
+                                        PVC_INCLUDE_PLACEHOLDERS);
+     add_new_columns_to_pathtarget(input_target, postponable_vars);
+
+     /* clean up cruft */
+     list_free(postponable_vars);
+     list_free(postponable_cols);
+
+     /* XXX this represents even more redundant cost calculation ... */
+     return set_pathtarget_cost_width(root, input_target);
+ }
+
+ /*
   * get_cheapest_fractional_path
   *      Find the cheapest path for retrieving a specified fraction of all
   *      the tuples expected to be returned by the given relation.
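Putting the planner.c changes together, the case that benefits most is the one the header comment of make_sort_input_target describes: an expensive tlist expression combined with ORDER BY and a LIMIT. Below is a minimal sketch with hypothetical names (plpgsql is used so the declared COST, in units of cpu_operator_cost, is what the planner sees for the call; anything above 10 * cpu_operator_cost counts as expensive here):

    CREATE TABLE docs (id int, mtime timestamptz, body text);

    -- Hypothetical stand-in for an expensive function; COST 10000 is far
    -- above the 10 * cpu_operator_cost threshold in make_sort_input_target().
    CREATE FUNCTION slow_label(i int) RETURNS text
        LANGUAGE plpgsql STABLE COST 10000 AS $$
    BEGIN
        RETURN repeat('x', i % 10);
    END $$;

    -- slow_label(id) has no sortgroupref, so it can be postponed past the
    -- Sort; with the LIMIT it should be evaluated only for the returned rows.
    SELECT id, slow_label(id) AS label
    FROM docs
    ORDER BY mtime
    LIMIT 9;

Whether the sort itself ends up as a bounded top-N sort still depends on the usual cost comparisons; the sketch only illustrates where the expensive function calls are meant to land in the plan.
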
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index cbc8c2b..9f85dee 100644
*** a/src/backend/optimizer/util/tlist.c
--- b/src/backend/optimizer/util/tlist.c
*************** copy_pathtarget(PathTarget *src)
*** 624,629 ****
--- 624,640 ----
  }

  /*
+  * create_empty_pathtarget
+  *      Create an empty (zero columns, zero cost) PathTarget.
+  */
+ PathTarget *
+ create_empty_pathtarget(void)
+ {
+     /* This is easy, but we don't want callers to hard-wire this ... */
+     return (PathTarget *) palloc0(sizeof(PathTarget));
+ }
+
+ /*
   * add_column_to_pathtarget
   *        Append a target column to the PathTarget.
   *
*************** add_column_to_pathtarget(PathTarget *tar
*** 656,661 ****
--- 667,707 ----
  }

  /*
+  * add_new_column_to_pathtarget
+  *        Append a target column to the PathTarget, but only if it's not
+  *        equal() to any pre-existing target expression.
+  *
+  * The caller cannot specify a sortgroupref, since it would be unclear how
+  * to merge that with a pre-existing column.
+  *
+  * As with make_pathtarget_from_tlist, we leave it to the caller to update
+  * the cost and width fields.
+  */
+ void
+ add_new_column_to_pathtarget(PathTarget *target, Expr *expr)
+ {
+     if (!list_member(target->exprs, expr))
+         add_column_to_pathtarget(target, expr, 0);
+ }
+
+ /*
+  * add_new_columns_to_pathtarget
+  *        Apply add_new_column_to_pathtarget() for each element of the list.
+  */
+ void
+ add_new_columns_to_pathtarget(PathTarget *target, List *exprs)
+ {
+     ListCell   *lc;
+
+     foreach(lc, exprs)
+     {
+         Expr       *expr = (Expr *) lfirst(lc);
+
+         add_new_column_to_pathtarget(target, expr);
+     }
+ }
+
+ /*
   * apply_pathtarget_labeling_to_tlist
   *        Apply any sortgrouprefs in the PathTarget to matching tlist entries
   *
diff --git a/src/include/optimizer/tlist.h b/src/include/optimizer/tlist.h
index 2c79a80..0d745a0 100644
*** a/src/include/optimizer/tlist.h
--- b/src/include/optimizer/tlist.h
*************** extern bool grouping_is_hashable(List *g
*** 55,62 ****
--- 55,65 ----
  extern PathTarget *make_pathtarget_from_tlist(List *tlist);
  extern List *make_tlist_from_pathtarget(PathTarget *target);
  extern PathTarget *copy_pathtarget(PathTarget *src);
+ extern PathTarget *create_empty_pathtarget(void);
  extern void add_column_to_pathtarget(PathTarget *target,
                           Expr *expr, Index sortgroupref);
+ extern void add_new_column_to_pathtarget(PathTarget *target, Expr *expr);
+ extern void add_new_columns_to_pathtarget(PathTarget *target, List *exprs);
  extern void apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target);

  /* Convenience macro to get a PathTarget with valid cost/width fields */