Обсуждение: Can JoinFilter condition be pushed down into IndexScan?

Поиск
Список
Период
Сортировка

Can JoinFilter condition be pushed down into IndexScan?

От
Bəxtiyar Neyman
Дата:
I define a table user_ranks as such:

CREATE TABLE user_ranks (
  id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  rank INTEGER NOT NULL,
  CONSTRAINT "by (rank, id)" UNIQUE (rank, id)
);

INSERT INTO user_ranks (user_id, rank) SELECT generate_series(1, 10000), generate_series(1, 10000);

Here's a query I'd like to optimize:

explain (analyze,verbose)
SELECT
  t3_0."id" AS "id",
  t3_0."rank" AS "rank"
FROM
  LATERAL (
    SELECT
      t4_0."rank" AS "rank"
    FROM
      user_ranks AS t4_0
    WHERE
      (t4_0."id" = 4732455)
  ) AS t3_1
  INNER JOIN user_ranks AS t3_0 ON true
WHERE
  (
    ((t3_0."rank", t3_0."id") <= (t3_1."rank", 4732455))
    AND true
  )
ORDER BY
  t3_0."rank" DESC,
  t3_0."id" DESC
LIMIT
  10


It compiles to the following plan:

 Limit  (cost=0.56..250.94 rows=10 width=12) (actual time=8.078..8.078 rows=1 loops=1)
   Output: t3_0.id, t3_0.rank
   ->  Nested Loop  (cost=0.56..41763.27 rows=1668 width=12) (actual time=8.075..8.076 rows=1 loops=1)
         Output: t3_0.id, t3_0.rank
         Inner Unique: true
         Join Filter: (ROW(t3_0.rank, t3_0.id) <= ROW(t4_0.rank, 4732455))
         Rows Removed by Join Filter: 5002
         ->  Index Only Scan Backward using "by (rank,id)" on public.user_ranks t3_0  (cost=0.28..163.33 rows=5003 width=12) (actual time=0.023..0.638 rows=5003 loops=1)
               Output: t3_0.rank, t3_0.id
               Heap Fetches: 0
         ->  Index Scan using "by id" on public.user_ranks t4_0  (cost=0.28..8.30 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=5003)
               Output: t4_0.id, t4_0.rating, t4_0.rank
               Index Cond: (t4_0.id = 4732455)

As you can see, there are a lot of rows returned by t3_0, which are then filtered by Join Filter. But it would have been better if instead of the filter, the  t3_0 table would have an Index Cond. Similar to how it happens when a correlated subquery is used (or a CTE)

explain (analyze,verbose)
SELECT
  t3_0."id" AS "id",
  t3_0."rank" AS "rank"
FROM
  user_ranks AS t3_0
WHERE
  (
    ((t3_0."rank", t3_0."id") <= (
    SELECT
      t4_0."rank" AS "rank",
      t4_0."id" AS "id"
    FROM
      user_ranks AS t4_0
    WHERE
      (t4_0."id" = 4732455)
    ))
    AND true
  )
ORDER BY
  t3_0."rank" DESC,
  t3_0."id" DESC
LIMIT
  10

 Limit  (cost=8.58..8.95 rows=10 width=12) (actual time=0.062..0.064 rows=1 loops=1)
   Output: t3_0.id, t3_0.rank
   InitPlan 1 (returns $0,$1)
     ->  Index Scan using "by id" on public.user_ranks t4_0  (cost=0.28..8.30 rows=1 width=12) (actual time=0.024..0.025 rows=1 loops=1)
           Output: t4_0.rank, t4_0.id
           Index Cond: (t4_0.id = 4732455)
   ->  Index Only Scan Backward using "by (rank,id)" on public.user_ranks t3_0  (cost=0.28..61.47 rows=1668 width=12) (actual time=0.061..0.062 rows=1 loops=1)
         Output: t3_0.id, t3_0.rank
         Index Cond: (ROW(t3_0.rank, t3_0.id) <= ROW($0, $1))
         Heap Fetches: 0


I'm an opposite of a PostgreSQL expert, but it was surprising to me to see that a correlated subquery behaves better than a join. Is this normal? Is it something worth fixing/easy to fix?

Sincerely,
Bakhtiyar

Re: Can JoinFilter condition be pushed down into IndexScan?

От
Tomas Vondra
Дата:
On 6/21/23 05:37, Bəxtiyar Neyman wrote:
> I define a table user_ranks as such:
> 
> CREATE TABLE user_ranks (
>   id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
>   rank INTEGER NOT NULL,
>   CONSTRAINT "by (rank, id)" UNIQUE (rank, id)
> );
> 
> INSERT INTO user_ranks (user_id, rank) SELECT generate_series(1, 10000),
> generate_series(1, 10000);
> 

This doesn't work, the INSERT needs to only insert into (rank).

> Here's a query I'd like to optimize:
> 
> explain (analyze,verbose)
> SELECT
>   t3_0."id" AS "id",
>   t3_0."rank" AS "rank"
> FROM
>   LATERAL (
>     SELECT
>       t4_0."rank" AS "rank"
>     FROM
>       user_ranks AS t4_0
>     WHERE
>       (t4_0."id" = 4732455)
>   ) AS t3_1
>   INNER JOIN user_ranks AS t3_0 ON true
> WHERE
>   (
>     ((t3_0."rank", t3_0."id") <= (t3_1."rank", 4732455))
>     AND true
>   )
> ORDER BY
>   t3_0."rank" DESC,
>   t3_0."id" DESC
> LIMIT
>   10
> 

Not sure why you make the query unnecessarily complicated - the LATERAL
is pointless I believe, the "AND true" just make it harder to read.
Let's rewrite it it like this to make discussion easier:

explain (analyze,verbose)
SELECT
  t3_0."id" AS "id",
  t3_0."rank" AS "rank"
FROM
  user_ranks AS t3_1
  INNER JOIN user_ranks AS t3_0
          ON ((t3_0."rank", t3_0."id") <= (t3_1."rank", t3_1."id"))
WHERE
  t3_1."id" = 4732455
ORDER BY
  t3_0."rank" DESC,
  t3_0."id" DESC
LIMIT
  10

Same query, but perhaps easier to read.

> It compiles to the following plan:
> 
>  Limit  (cost=0.56..250.94 rows=10 width=12) (actual time=8.078..8.078
> rows=1 loops=1)
>    Output: t3_0.id <http://t3_0.id>, t3_0.rank
>    ->  Nested Loop  (cost=0.56..41763.27 rows=1668 width=12) (actual
> time=8.075..8.076 rows=1 loops=1)
>          Output: t3_0.id <http://t3_0.id>, t3_0.rank
>          Inner Unique: true
>          Join Filter: (ROW(t3_0.rank, t3_0.id <http://t3_0.id>) <=
> ROW(t4_0.rank, 4732455))
>          Rows Removed by Join Filter: 5002
>          ->  Index Only Scan Backward using "by (rank,id)" on
> public.user_ranks t3_0  (cost=0.28..163.33 rows=5003 width=12) (actual
> time=0.023..0.638 rows=5003 loops=1)
>                Output: t3_0.rank, t3_0.id <http://t3_0.id>
>                Heap Fetches: 0
>          ->  Index Scan using "by id" on public.user_ranks t4_0
>  (cost=0.28..8.30 rows=1 width=8) (actual time=0.001..0.001 rows=1
> loops=5003)
>                Output: t4_0.id <http://t4_0.id>, t4_0.rating, t4_0.rank
>                Index Cond: (t4_0.id <http://t4_0.id> = 4732455)
> 
> As you can see, there are a lot of rows returned by t3_0, which are then
> filtered by Join Filter. But it would have been better if instead of the
> filter, the  t3_0 table would have an Index Cond. Similar to how it
> happens when a correlated subquery is used (or a CTE)
> 
> explain (analyze,verbose)
> SELECT
>   t3_0."id" AS "id",
>   t3_0."rank" AS "rank"
> FROM
>   user_ranks AS t3_0
> WHERE
>   (
>     ((t3_0."rank", t3_0."id") <= (
>     SELECT
>       t4_0."rank" AS "rank",
>       t4_0."id" AS "id"
>     FROM
>       user_ranks AS t4_0
>     WHERE
>       (t4_0."id" = 4732455)
>     ))
>     AND true
>   )
> ORDER BY
>   t3_0."rank" DESC,
>   t3_0."id" DESC
> LIMIT
>   10
> 
>  Limit  (cost=8.58..8.95 rows=10 width=12) (actual time=0.062..0.064
> rows=1 loops=1)
>    Output: t3_0.id <http://t3_0.id>, t3_0.rank
>    InitPlan 1 (returns $0,$1)
>      ->  Index Scan using "by id" on public.user_ranks t4_0
>  (cost=0.28..8.30 rows=1 width=12) (actual time=0.024..0.025 rows=1 loops=1)
>            Output: t4_0.rank, t4_0.id <http://t4_0.id>
>            Index Cond: (t4_0.id <http://t4_0.id> = 4732455)
>    ->  Index Only Scan Backward using "by (rank,id)" on
> public.user_ranks t3_0  (cost=0.28..61.47 rows=1668 width=12) (actual
> time=0.061..0.062 rows=1 loops=1)
>          Output: t3_0.id <http://t3_0.id>, t3_0.rank
>          Index Cond: (ROW(t3_0.rank, t3_0.id <http://t3_0.id>) <=
> ROW($0, $1))
>          Heap Fetches: 0
> 
> 
> I'm an opposite of a PostgreSQL expert, but it was surprising to me to
> see that a correlated subquery behaves better than a join. Is this
> normal? Is it something worth fixing/easy to fix?
> 

Because those queries are not doing the same thing. In the first query
you sort by t3_0 columns, while the "id = 4732455" condition is on the
other table. And so it can't use the index scan for sorting.

While in the second query it can do that, and it doesn't need to do the
explicit sort (which needs to fetch all the rows etc.). If you alter the
first query to do

  ORDER BY
    t3_1."rank" DESC,
    t3_1."id" DESC

it'll use the same plan as the second query. Well, not exactly the same,
but much closer to it.


Nevertheless, these example queries have other estimation issues, which
might result in poor plan choices. There's no row for (id = 4732455),
and the cross-table inequality estimate is just some default estimate
(33%). In reality, this produces no rows.

Secondly, for LIMIT, the cost is assumed to be "proportional" fraction
of the input costs. In other words, we expect the limit to terminate
after only seeing a fraction of rows - if we expect to see 10000 rows
and the query has LIMIT 10, we expect to only do 1/1000 of the work. But
if the subtree does not produce 10000 rows, that goes out of the window
and we may need to do much more work.

I'm not sure why the two queries actually use different plans even after
the ORDER BY change. I would have expected the second query (with
correlated subquery) to be transformed to a join, but perhaps that
transformation would be invalid, or maybe the planner does that based on
cost (and then it's not surprising due to the estimation issues).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Can JoinFilter condition be pushed down into IndexScan?

От
Bəxtiyar Neyman
Дата:
Thanks Tomas for the lengthy write-up!

Pardon the noise in the queries (LATERAL, AND true etc): they were autogenerated by the library we wrote.

> Because those queries are not doing the same thing. In the first query
> you sort by t3_0 columns, while the "id = 4732455" condition is on the
> other table. And so it can't use the index scan for sorting.
>
> While in the second query it can do that, and it doesn't need to do the
> explicit sort (which needs to fetch all the rows etc.).

Let me try to explain what both of my queries do:
1) Get the rank of the user using its id (id = 4732455 in this example, but it could have been one that exists, e.g. id = 500). This is LATERAL t3_1 in the first query and subquery in the WHERE clause of the second query.
2) Using that rank, get the next 10 users by rank. This is t3_0.

Thus I can't just change the first query to "ORDER BY t3_1."rank" DESC, t3_1."id" DESC" as you suggest, because then the order of returned rows will not be guaranteed. In fact, such a clause will have no effect because there is going to be at most one row supplied by t3_1 anyway.

My question thus still stands. The planner knows that t3_1 has at most one row, and it knows that t3_0 can produce up to 5000 rows. Yet, it doesn't figure out that it could have lowered the Join Filter condition from the first plan as an Index Cond of the Index Scan of t3_1. Is there a fundamental reason for this, or is this something worth improving in the planner?

Sincerely,
Bakhtiyar

On Wed, Jun 21, 2023 at 6:28 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

On 6/21/23 05:37, Bəxtiyar Neyman wrote:
> I define a table user_ranks as such:
>
> CREATE TABLE user_ranks (
>   id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
>   rank INTEGER NOT NULL,
>   CONSTRAINT "by (rank, id)" UNIQUE (rank, id)
> );
>
> INSERT INTO user_ranks (user_id, rank) SELECT generate_series(1, 10000),
> generate_series(1, 10000);
>

This doesn't work, the INSERT needs to only insert into (rank).

> Here's a query I'd like to optimize:
>
> explain (analyze,verbose)
> SELECT
>   t3_0."id" AS "id",
>   t3_0."rank" AS "rank"
> FROM
>   LATERAL (
>     SELECT
>       t4_0."rank" AS "rank"
>     FROM
>       user_ranks AS t4_0
>     WHERE
>       (t4_0."id" = 4732455)
>   ) AS t3_1
>   INNER JOIN user_ranks AS t3_0 ON true
> WHERE
>   (
>     ((t3_0."rank", t3_0."id") <= (t3_1."rank", 4732455))
>     AND true
>   )
> ORDER BY
>   t3_0."rank" DESC,
>   t3_0."id" DESC
> LIMIT
>   10
>

Not sure why you make the query unnecessarily complicated - the LATERAL
is pointless I believe, the "AND true" just make it harder to read.
Let's rewrite it it like this to make discussion easier:

explain (analyze,verbose)
SELECT
  t3_0."id" AS "id",
  t3_0."rank" AS "rank"
FROM
  user_ranks AS t3_1
  INNER JOIN user_ranks AS t3_0
          ON ((t3_0."rank", t3_0."id") <= (t3_1."rank", t3_1."id"))
WHERE
  t3_1."id" = 4732455
ORDER BY
  t3_0."rank" DESC,
  t3_0."id" DESC
LIMIT
  10

Same query, but perhaps easier to read.

> It compiles to the following plan:
>
>  Limit  (cost=0.56..250.94 rows=10 width=12) (actual time=8.078..8.078
> rows=1 loops=1)
>    Output: t3_0.id <http://t3_0.id>, t3_0.rank
>    ->  Nested Loop  (cost=0.56..41763.27 rows=1668 width=12) (actual
> time=8.075..8.076 rows=1 loops=1)
>          Output: t3_0.id <http://t3_0.id>, t3_0.rank
>          Inner Unique: true
>          Join Filter: (ROW(t3_0.rank, t3_0.id <http://t3_0.id>) <=
> ROW(t4_0.rank, 4732455))
>          Rows Removed by Join Filter: 5002
>          ->  Index Only Scan Backward using "by (rank,id)" on
> public.user_ranks t3_0  (cost=0.28..163.33 rows=5003 width=12) (actual
> time=0.023..0.638 rows=5003 loops=1)
>                Output: t3_0.rank, t3_0.id <http://t3_0.id>
>                Heap Fetches: 0
>          ->  Index Scan using "by id" on public.user_ranks t4_0
>  (cost=0.28..8.30 rows=1 width=8) (actual time=0.001..0.001 rows=1
> loops=5003)
>                Output: t4_0.id <http://t4_0.id>, t4_0.rating, t4_0.rank
>                Index Cond: (t4_0.id <http://t4_0.id> = 4732455)
>
> As you can see, there are a lot of rows returned by t3_0, which are then
> filtered by Join Filter. But it would have been better if instead of the
> filter, the  t3_0 table would have an Index Cond. Similar to how it
> happens when a correlated subquery is used (or a CTE)
>
> explain (analyze,verbose)
> SELECT
>   t3_0."id" AS "id",
>   t3_0."rank" AS "rank"
> FROM
>   user_ranks AS t3_0
> WHERE
>   (
>     ((t3_0."rank", t3_0."id") <= (
>     SELECT
>       t4_0."rank" AS "rank",
>       t4_0."id" AS "id"
>     FROM
>       user_ranks AS t4_0
>     WHERE
>       (t4_0."id" = 4732455)
>     ))
>     AND true
>   )
> ORDER BY
>   t3_0."rank" DESC,
>   t3_0."id" DESC
> LIMIT
>   10
>
>  Limit  (cost=8.58..8.95 rows=10 width=12) (actual time=0.062..0.064
> rows=1 loops=1)
>    Output: t3_0.id <http://t3_0.id>, t3_0.rank
>    InitPlan 1 (returns $0,$1)
>      ->  Index Scan using "by id" on public.user_ranks t4_0
>  (cost=0.28..8.30 rows=1 width=12) (actual time=0.024..0.025 rows=1 loops=1)
>            Output: t4_0.rank, t4_0.id <http://t4_0.id>
>            Index Cond: (t4_0.id <http://t4_0.id> = 4732455)
>    ->  Index Only Scan Backward using "by (rank,id)" on
> public.user_ranks t3_0  (cost=0.28..61.47 rows=1668 width=12) (actual
> time=0.061..0.062 rows=1 loops=1)
>          Output: t3_0.id <http://t3_0.id>, t3_0.rank
>          Index Cond: (ROW(t3_0.rank, t3_0.id <http://t3_0.id>) <=
> ROW($0, $1))
>          Heap Fetches: 0
>
>
> I'm an opposite of a PostgreSQL expert, but it was surprising to me to
> see that a correlated subquery behaves better than a join. Is this
> normal? Is it something worth fixing/easy to fix?
>

Because those queries are not doing the same thing. In the first query
you sort by t3_0 columns, while the "id = 4732455" condition is on the
other table. And so it can't use the index scan for sorting.

While in the second query it can do that, and it doesn't need to do the
explicit sort (which needs to fetch all the rows etc.). If you alter the
first query to do

  ORDER BY
    t3_1."rank" DESC,
    t3_1."id" DESC

it'll use the same plan as the second query. Well, not exactly the same,
but much closer to it.


Nevertheless, these example queries have other estimation issues, which
might result in poor plan choices. There's no row for (id = 4732455),
and the cross-table inequality estimate is just some default estimate
(33%). In reality, this produces no rows.

Secondly, for LIMIT, the cost is assumed to be "proportional" fraction
of the input costs. In other words, we expect the limit to terminate
after only seeing a fraction of rows - if we expect to see 10000 rows
and the query has LIMIT 10, we expect to only do 1/1000 of the work. But
if the subtree does not produce 10000 rows, that goes out of the window
and we may need to do much more work.

I'm not sure why the two queries actually use different plans even after
the ORDER BY change. I would have expected the second query (with
correlated subquery) to be transformed to a join, but perhaps that
transformation would be invalid, or maybe the planner does that based on
cost (and then it's not surprising due to the estimation issues).


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Can JoinFilter condition be pushed down into IndexScan?

От
Tomas Vondra
Дата:
On 6/21/23 20:37, Bəxtiyar Neyman wrote:
> Thanks Tomas for the lengthy write-up!
> 
> Pardon the noise in the queries (LATERAL, AND true etc): they were
> autogenerated by the library we wrote.
> 

I know, but it makes them harder to read for people. If you want people
to respond it's generally a good idea to make it easy to understand the
question. Don't make them waste their time - they'll just skip the
message entirely.

>> Because those queries are not doing the same thing. In the first query
>> you sort by t3_0 columns, while the "id = 4732455" condition is on the
>> other table. And so it can't use the index scan for sorting.
>>
>> While in the second query it can do that, and it doesn't need to do the
>> explicit sort (which needs to fetch all the rows etc.).
> 
> Let me try to explain what both of my queries do:
> 1) Get the rank of the user using its id (id = 4732455 in this example,
> but it could have been one that exists, e.g. id = 500). This is LATERAL
> t3_1 in the first query and subquery in the WHERE clause of the second
> query.
> 2) Using that rank, get the next 10 users by rank. This is t3_0.
> 
> Thus I can't just change the first query to "ORDER BY t3_1."rank" DESC,
> t3_1."id" DESC" as you suggest, because then the order of returned rows
> will not be guaranteed. In fact, such a clause will have no effect
> because there is going to be at most one row supplied by t3_1 anyway.
> 

Ah, OK. I got this wrong.

> My question thus still stands. The planner knows that t3_1 has at most
> one row, and it knows that t3_0 can produce up to 5000 rows. Yet, it
> doesn't figure out that it could have lowered the Join Filter condition
> from the first plan as an Index Cond of the Index Scan of t3_1. Is there
> a fundamental reason for this, or is this something worth improving in
> the planner?
> 

As I tried to explain before, I don't think the problem is in the
planner not being able to do this transformation, but more likely in not
being able to cost it correctly.

Consider this (with 1M rows in the user_ranks table):

1) subquery case
=================

     Limit  (cost=8.87..9.15 rows=10 width=8) (actual time=0.032..0.037
rows=10 loops=1)
   Output: t3_0.id, t3_0.rank
   InitPlan 1 (returns $0,$1)
     ->  Index Scan using user_ranks_pkey on public.user_ranks t4_0
(cost=0.42..8.44 rows=1 width=8) (actual time=0.017..0.019 rows=1 loops=1)
           Output: t4_0.rank, t4_0.id
           Index Cond: (t4_0.id = 333333)
   ->  Index Only Scan Backward using "by (rank, id)" on
public.user_ranks t3_0  (cost=0.42..9493.75 rows=333333 width=8) (actual
time=0.031..0.033 rows=10 loops=1)
         Output: t3_0.id, t3_0.rank
         Index Cond: (ROW(t3_0.rank, t3_0.id) <= ROW($0, $1))
         Heap Fetches: 0
 Planning Time: 0.072 ms
 Execution Time: 0.055 ms
(12 rows)


2) join
=======

 Limit  (cost=0.85..2.15 rows=10 width=8) (actual time=464.662..464.672
rows=10 loops=1)
   Output: t3_0.id, t3_0.rank
   ->  Nested Loop  (cost=0.85..43488.87 rows=333333 width=8) (actual
time=464.660..464.667 rows=10 loops=1)
         Output: t3_0.id, t3_0.rank
         Inner Unique: true
         Join Filter: (ROW(t3_0.rank, t3_0.id) <= ROW(t4_0.rank, t4_0.id))
         Rows Removed by Join Filter: 666667
         ->  Index Only Scan Backward using "by (rank, id)" on
public.user_ranks t3_0  (cost=0.42..25980.42 rows=1000000 width=8)
(actual time=0.015..93.703 rows=666677 loops=1)
               Output: t3_0.rank, t3_0.id
               Heap Fetches: 0
         ->  Materialize  (cost=0.42..8.45 rows=1 width=8) (actual
time=0.000..0.000 rows=1 loops=666677)
               Output: t4_0.rank, t4_0.id
               ->  Index Scan using user_ranks_pkey on public.user_ranks
t4_0  (cost=0.42..8.44 rows=1 width=8) (actual time=0.010..0.011 rows=1
loops=1)
                     Output: t4_0.rank, t4_0.id
                     Index Cond: (t4_0.id = 333333)
 Planning Time: 0.092 ms
 Execution Time: 464.696 ms
(17 rows)


3) join (with LEFT JOIN)
========================

 Limit  (cost=20038.73..20038.76 rows=10 width=8) (actual
time=180.714..180.720 rows=10 loops=1)
   Output: t3_0.id, t3_0.rank
   ->  Sort  (cost=20038.73..20872.06 rows=333333 width=8) (actual
time=180.712..180.715 rows=10 loops=1)
         Output: t3_0.id, t3_0.rank
         Sort Key: t3_0.rank DESC, t3_0.id DESC
         Sort Method: top-N heapsort  Memory: 26kB
         ->  Nested Loop Left Join  (cost=0.85..12835.52 rows=333333
width=8) (actual time=0.033..122.000 rows=333333 loops=1)
               Output: t3_0.id, t3_0.rank
               ->  Index Scan using user_ranks_pkey on public.user_ranks
t4_0  (cost=0.42..8.44 rows=1 width=8) (actual time=0.018..0.020 rows=1
loops=1)
                     Output: t4_0.id, t4_0.rank
                     Index Cond: (t4_0.id = 333333)
               ->  Index Only Scan using "by (rank, id)" on
public.user_ranks t3_0  (cost=0.42..9493.75 rows=333333 width=8) (actual
time=0.013..49.759 rows=333333 loops=1)
                     Output: t3_0.rank, t3_0.id
                     Index Cond: (ROW(t3_0.rank, t3_0.id) <=
ROW(t4_0.rank, t4_0.id))
                     Heap Fetches: 0
 Planning Time: 0.087 ms
 Execution Time: 180.744 ms
(17 rows)


So, the optimizer clearly believes the subquery case has cost 9.15,
while the inner join case costs 2.15. So it believes the plan is
"cheaper" than the subquery. So even if it knew how to do the
transformation / build the other plan (which I'm not sure it can), it
probably wouldn't do it.

OTOH if you rewrite it to a left join, it costs 20038.76 - way more than
the inner join, but it's actually 2x faster.


AFAICS there's no chance to make this bit smarter until the estimates
get much better to reality.


-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Can JoinFilter condition be pushed down into IndexScan?

От
Bəxtiyar Neyman
Дата:
Thanks, Tomas!

> I know, but it makes them harder to read for people. If you want people
> to respond it's generally a good idea to make it easy to understand the
> question. Don't make them waste their time - they'll just skip the
> message entirely.

Fair point.


> So, the optimizer clearly believes the subquery case has cost 9.15,
> while the inner join case costs 2.15. So it believes the plan is
> "cheaper" than the subquery. So even if it knew how to do the
> transformation / build the other plan (which I'm not sure it can), it
> probably wouldn't do it.

> AFAICS there's no chance to make this bit smarter until the estimates
> get much better to reality.

Got it. Thanks. I guess we'll have to emit correlated subqueries/CTEs.

Sincerely,
Bakhtiyar

On Wed, Jun 21, 2023 at 12:58 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
On 6/21/23 20:37, Bəxtiyar Neyman wrote:
> Thanks Tomas for the lengthy write-up!
>
> Pardon the noise in the queries (LATERAL, AND true etc): they were
> autogenerated by the library we wrote.
>

I know, but it makes them harder to read for people. If you want people
to respond it's generally a good idea to make it easy to understand the
question. Don't make them waste their time - they'll just skip the
message entirely.

>> Because those queries are not doing the same thing. In the first query
>> you sort by t3_0 columns, while the "id = 4732455" condition is on the
>> other table. And so it can't use the index scan for sorting.
>>
>> While in the second query it can do that, and it doesn't need to do the
>> explicit sort (which needs to fetch all the rows etc.).
>
> Let me try to explain what both of my queries do:
> 1) Get the rank of the user using its id (id = 4732455 in this example,
> but it could have been one that exists, e.g. id = 500). This is LATERAL
> t3_1 in the first query and subquery in the WHERE clause of the second
> query.
> 2) Using that rank, get the next 10 users by rank. This is t3_0.
>
> Thus I can't just change the first query to "ORDER BY t3_1."rank" DESC,
> t3_1."id" DESC" as you suggest, because then the order of returned rows
> will not be guaranteed. In fact, such a clause will have no effect
> because there is going to be at most one row supplied by t3_1 anyway.
>

Ah, OK. I got this wrong.

> My question thus still stands. The planner knows that t3_1 has at most
> one row, and it knows that t3_0 can produce up to 5000 rows. Yet, it
> doesn't figure out that it could have lowered the Join Filter condition
> from the first plan as an Index Cond of the Index Scan of t3_1. Is there
> a fundamental reason for this, or is this something worth improving in
> the planner?
>

As I tried to explain before, I don't think the problem is in the
planner not being able to do this transformation, but more likely in not
being able to cost it correctly.

Consider this (with 1M rows in the user_ranks table):

1) subquery case
=================

     Limit  (cost=8.87..9.15 rows=10 width=8) (actual time=0.032..0.037
rows=10 loops=1)
   Output: t3_0.id, t3_0.rank
   InitPlan 1 (returns $0,$1)
     ->  Index Scan using user_ranks_pkey on public.user_ranks t4_0
(cost=0.42..8.44 rows=1 width=8) (actual time=0.017..0.019 rows=1 loops=1)
           Output: t4_0.rank, t4_0.id
           Index Cond: (t4_0.id = 333333)
   ->  Index Only Scan Backward using "by (rank, id)" on
public.user_ranks t3_0  (cost=0.42..9493.75 rows=333333 width=8) (actual
time=0.031..0.033 rows=10 loops=1)
         Output: t3_0.id, t3_0.rank
         Index Cond: (ROW(t3_0.rank, t3_0.id) <= ROW($0, $1))
         Heap Fetches: 0
 Planning Time: 0.072 ms
 Execution Time: 0.055 ms
(12 rows)


2) join
=======

 Limit  (cost=0.85..2.15 rows=10 width=8) (actual time=464.662..464.672
rows=10 loops=1)
   Output: t3_0.id, t3_0.rank
   ->  Nested Loop  (cost=0.85..43488.87 rows=333333 width=8) (actual
time=464.660..464.667 rows=10 loops=1)
         Output: t3_0.id, t3_0.rank
         Inner Unique: true
         Join Filter: (ROW(t3_0.rank, t3_0.id) <= ROW(t4_0.rank, t4_0.id))
         Rows Removed by Join Filter: 666667
         ->  Index Only Scan Backward using "by (rank, id)" on
public.user_ranks t3_0  (cost=0.42..25980.42 rows=1000000 width=8)
(actual time=0.015..93.703 rows=666677 loops=1)
               Output: t3_0.rank, t3_0.id
               Heap Fetches: 0
         ->  Materialize  (cost=0.42..8.45 rows=1 width=8) (actual
time=0.000..0.000 rows=1 loops=666677)
               Output: t4_0.rank, t4_0.id
               ->  Index Scan using user_ranks_pkey on public.user_ranks
t4_0  (cost=0.42..8.44 rows=1 width=8) (actual time=0.010..0.011 rows=1
loops=1)
                     Output: t4_0.rank, t4_0.id
                     Index Cond: (t4_0.id = 333333)
 Planning Time: 0.092 ms
 Execution Time: 464.696 ms
(17 rows)


3) join (with LEFT JOIN)
========================

 Limit  (cost=20038.73..20038.76 rows=10 width=8) (actual
time=180.714..180.720 rows=10 loops=1)
   Output: t3_0.id, t3_0.rank
   ->  Sort  (cost=20038.73..20872.06 rows=333333 width=8) (actual
time=180.712..180.715 rows=10 loops=1)
         Output: t3_0.id, t3_0.rank
         Sort Key: t3_0.rank DESC, t3_0.id DESC
         Sort Method: top-N heapsort  Memory: 26kB
         ->  Nested Loop Left Join  (cost=0.85..12835.52 rows=333333
width=8) (actual time=0.033..122.000 rows=333333 loops=1)
               Output: t3_0.id, t3_0.rank
               ->  Index Scan using user_ranks_pkey on public.user_ranks
t4_0  (cost=0.42..8.44 rows=1 width=8) (actual time=0.018..0.020 rows=1
loops=1)
                     Output: t4_0.id, t4_0.rank
                     Index Cond: (t4_0.id = 333333)
               ->  Index Only Scan using "by (rank, id)" on
public.user_ranks t3_0  (cost=0.42..9493.75 rows=333333 width=8) (actual
time=0.013..49.759 rows=333333 loops=1)
                     Output: t3_0.rank, t3_0.id
                     Index Cond: (ROW(t3_0.rank, t3_0.id) <=
ROW(t4_0.rank, t4_0.id))
                     Heap Fetches: 0
 Planning Time: 0.087 ms
 Execution Time: 180.744 ms
(17 rows)


So, the optimizer clearly believes the subquery case has cost 9.15,
while the inner join case costs 2.15. So it believes the plan is
"cheaper" than the subquery. So even if it knew how to do the
transformation / build the other plan (which I'm not sure it can), it
probably wouldn't do it.

OTOH if you rewrite it to a left join, it costs 20038.76 - way more than
the inner join, but it's actually 2x faster.


AFAICS there's no chance to make this bit smarter until the estimates
get much better to reality.


--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company