Discussion: Planner makes sub-optimal execution plan

Planner makes sub-optimal execution plan

From:
Алексей Борщёв
Date:
Hi, all!

The execution plan broke during an upgrade from PG13 to PG14.
We've managed to emulate the table and reproduce the issue on PG17:

# cat /etc/*release
PRETTY_NAME="Debian GNU/Linux trixie/sid"

select version();
PostgreSQL 17.4 (Debian 17.4-1.pgdg110+2) on x86_64-pc-linux-gnu,
compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit

-- DROP TABLE IF EXISTS docum CASCADE;
CREATE TABLE docum (
    id BIGSERIAL PRIMARY KEY,
    dt TIMESTAMP,
    dt_real TIMESTAMP NOT NULL
);

-- Fill the table with 12M rows (approx. 30 sec):
DO $$
DECLARE
    batch_size INT := 50000;
    total_rows INT := 12000000;  -- 12 M records
    current_row INT := 0;
    base_date TIMESTAMP := '2024-01-01 00:00:00'::timestamp;
    cutoff_date TIMESTAMP := '2025-08-14 09:44:09.033592'::timestamp;
    time_increment INTERVAL;
BEGIN
    RAISE NOTICE 'Start. Total rows %', total_rows;
    ALTER TABLE docum SET (autovacuum_enabled = false);
    -- Calculate time increment for even distribution
    -- Data for ~590 days before cutoff_date
    time_increment := (cutoff_date - base_date) / (total_rows * 0.995);
    -- Insert sorted data for high correlation
    WHILE current_row < total_rows LOOP
        INSERT INTO docum (dt, dt_real)
        SELECT
            -- dt: almost in sync with dt_real, but with some noise and 5% NULLs
            CASE
                WHEN random() < 0.05 THEN NULL
                ELSE base_date + (time_increment * (current_row + i)) +
                     (random() * INTERVAL '1 hour' - INTERVAL '30 minutes')  -- noise
            END,
            -- dt_real: monotonically increasing, correlation 0.999
            CASE
                -- 99.5% records before cutoff_date
                WHEN current_row < total_rows * 0.995 THEN
                    base_date + (time_increment * (current_row + i))
                -- 0.5% records after cutoff_date (~60K records)
                ELSE
                    cutoff_date + ((current_row + i - total_rows * 0.995) * INTERVAL '10 seconds')
            END
        FROM generate_series(0, LEAST(batch_size - 1, total_rows - current_row - 1)) AS i;
        current_row := current_row + batch_size;
        -- Show progress:
        IF current_row % 500000 = 0 THEN
            -- RAISE supports only plain % placeholders (no printf-style %.1f),
            -- so round the percentage instead:
            RAISE NOTICE 'Processed: % rows (% %%)', current_row,
                round(current_row::numeric / total_rows * 100, 1);
        END IF;
    END LOOP;
    RAISE NOTICE 'Completed. Total rows inserted: %', current_row;
END $$;
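
-- Optional sanity check of the generated data (expected: 12,000,000 rows,
-- with min(dt_real) = base_date and max(dt_real) a bit past cutoff_date):
SELECT count(*), min(dt_real), max(dt_real) FROM docum;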

-- Indexes:
CREATE INDEX docum_dt      ON docum(dt);
CREATE INDEX docum_dt_real ON docum(dt_real);

ALTER TABLE docum SET (autovacuum_enabled = true);

-- Gather statistics:
ALTER TABLE docum ALTER COLUMN dt SET STATISTICS 400;
ALTER TABLE docum ALTER COLUMN dt_real SET STATISTICS 400;
-- default_statistics_target = 400 will also work.
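-- Or, equivalently, session-wide instead of per-column
-- (shown commented out; RESET it when done):
-- SET default_statistics_target = 400;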

VACUUM ANALYZE docum;

-- The problematic query:
EXPLAIN (ANALYZE, VERBOSE, BUFFERS, SUMMARY, SETTINGS, TIMING)
SELECT
    MIN(docum.dt) AS "dt__min",
    MAX(docum.dt_real) AS "dt_real__max"
FROM docum
WHERE docum.dt_real >= '2025-08-14T09:44:09.033592'::timestamp;
-- The plan I've got is:
Result  (cost=8.38..8.39 rows=1 width=16) (actual time=2660.034..2660.036 rows=1 loops=1)
  Output: (InitPlan 1).col1, (InitPlan 2).col1
  Buffers: shared hit=9358751 read=30994 written=1
  InitPlan 1
    ->  Limit  (cost=0.43..7.91 rows=1 width=8) (actual time=2660.006..2660.007 rows=1 loops=1)
          Output: docum.dt
          Buffers: shared hit=9358747 read=30994 written=1
          ->  Index Scan using docum_dt_7ee1d676 on public.docum  (cost=0.43..420487.43 rows=56222 width=8) (actual time=2660.004..2660.005 rows=1 loops=1)
                Output: docum.dt
                Index Cond: (docum.dt IS NOT NULL)
                Filter: (docum.dt_real >= '2025-08-14 09:44:09.033592'::timestamp without time zone)
                Rows Removed by Filter: 11342966
                Buffers: shared hit=9358747 read=30994 written=1
  InitPlan 2
    ->  Limit  (cost=0.43..0.46 rows=1 width=8) (actual time=0.022..0.022 rows=1 loops=1)
          Output: docum_1.dt_real
          Buffers: shared hit=4
          ->  Index Only Scan Backward using docum_dt_real_2b81c58c on public.docum docum_1  (cost=0.43..1689.22 rows=59245 width=8) (actual time=0.021..0.021 rows=1 loops=1)
                Output: docum_1.dt_real
                Index Cond: (docum_1.dt_real >= '2025-08-14 09:44:09.033592'::timestamp without time zone)
                Heap Fetches: 0
                Buffers: shared hit=4
Settings: work_mem = '16MB', search_path = 'public, public, "$user"'
Planning:
  Buffers: shared hit=12
Planning Time: 0.148 ms
Execution Time: 2660.056 ms

This plan is not good:
Execution time is 2660.056 ms, but the total plan cost is only 8.38..8.39 - that looks far too small!
At this step:
 ->  Index Scan using docum_dt_7ee1d676 on public.docum  (cost=0.43..420487.43 ...
the higher cost of 420487 looks adequate, but it was somehow lost and
not counted into the total query plan cost.
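
-- For reference, InitPlan 1 seems to correspond to this standalone query
-- (roughly what the MIN/MAX optimization rewrites MIN(dt) into); it shows
-- the same expensive index scan:
EXPLAIN
SELECT dt
FROM docum
WHERE dt_real >= '2025-08-14T09:44:09.033592'::timestamp
  AND dt IS NOT NULL
ORDER BY dt ASC
LIMIT 1;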

-- There is a possible workaround for the query: just add a count(*) column to the SELECT:
EXPLAIN (ANALYZE, VERBOSE, BUFFERS, SUMMARY, SETTINGS, TIMING)
SELECT count(*) as cnt,
    MIN(docum.dt) AS "dt__min",
    MAX(docum.dt_real) AS "dt_real__max"
FROM docum
WHERE docum.dt_real >= '2025-08-14T09:44:09.033592'::timestamp;

Aggregate  (cost=2512.56..2512.57 rows=1 width=24) (actual time=10.179..10.180 rows=1 loops=1)
  Output: count(*), min(dt), max(dt_real)
  Buffers: shared hit=383 read=165
  ->  Index Scan using docum_dt_real_2b81c58c on public.docum  (cost=0.43..2068.22 rows=59245 width=16) (actual time=0.013..6.430 rows=59999 loops=1)
        Output: id, dt, dt_real
        Index Cond: (docum.dt_real >= '2025-08-14 09:44:09.033592'::timestamp without time zone)
        Buffers: shared hit=383 read=165
Settings: work_mem = '16MB', search_path = 'public, public, "$user"'
Planning:
  Buffers: shared hit=9 read=1
Planning Time: 0.927 ms
Execution Time: 10.199 ms

-- This plan's cost (2512) is much higher, but the execution time is far better!

Did you manage to reproduce this plan?
Can I help you with more details?



Re: Planner makes sub-optimal execution plan

From:
Alena Rybakina
Date:
Hi! Thank you for sharing this interesting case!

On 01.09.2025 12:07, Алексей Борщёв wrote:
> EXPLAIN (ANALYZE, VERBOSE, BUFFERS, SUMMARY, SETTINGS, TIMING)
> SELECT
>      MIN(docum.dt) AS "dt__min",
>      MAX(docum.dt_real) AS "dt_real__max"
> FROM docum
> WHERE docum.dt_real >= '2025-08-14T09:44:09.033592'::timestamp;
> -- The plan I've got is:
> Result  (cost=8.38..8.39 rows=1 width=16) (actual time=2660.034..2660.036 rows=1 loops=1)
> [...]
> Planning Time: 0.148 ms
> Execution Time: 2660.056 ms


After disabling MIN/MAX optimization in the grouping_planner function:
/*
 * Preprocess MIN/MAX aggregates, if any.  Note: be careful about
 * adding logic between here and the query_planner() call.  Anything
 * that is needed in MIN/MAX-optimizable cases will have to be
 * duplicated in planagg.c.
 */
//if (parse->hasAggs)
//    preprocess_minmax_aggregates(root);

I got a better query plan, but I’m still investigating what went wrong.


Also, creating a partial index helped, since it gives the planner a much smaller index to scan:

CREATE INDEX CONCURRENTLY docum_dt_recent
   ON docum (dt)
   WHERE dt_real >= timestamp '2025-08-01 00:00:00';

postgres=# EXPLAIN (ANALYZE, VERBOSE, BUFFERS, SUMMARY, SETTINGS, TIMING)
SELECT
     MIN(docum.dt) AS "dt__min",
     MAX(docum.dt_real) AS "dt_real__max"
FROM docum
WHERE docum.dt_real >= '2025-08-14T09:44:09.033592'::timestamp;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=1.11..1.12 rows=1 width=16) (actual time=206.206..206.207 rows=1.00 loops=1)
   Output: (InitPlan 1).col1, (InitPlan 2).col1
   Buffers: shared hit=212507
   InitPlan 1
     ->  Limit  (cost=0.42..0.64 rows=1 width=8) (actual time=206.154..206.155 rows=1.00 loops=1)
           Output: docum.dt
           Buffers: shared hit=212503
           ->  Index Scan using docum_dt_recent on public.docum  (cost=0.42..12391.88 rows=55710 width=8) (actual time=206.150..206.150 rows=1.00 loops=1)
                 Output: docum.dt
                 Index Cond: (docum.dt IS NOT NULL)
                 Filter: (docum.dt_real >= '2025-08-14 09:44:09.033592'::timestamp without time zone)
                 Rows Removed by Filter: 256799
                 Index Searches: 1
                 Buffers: shared hit=212503
   InitPlan 2
     ->  Limit  (cost=0.43..0.46 rows=1 width=8) (actual time=0.042..0.042 rows=1.00 loops=1)
           Output: docum_1.dt_real
           Buffers: shared hit=4
           ->  Index Only Scan Backward using docum_dt_real on public.docum docum_1  (cost=0.43..1671.62 rows=58696 width=8) (actual time=0.041..0.041 rows=1.00 loops=1)
                 Output: docum_1.dt_real
                 Index Cond: (docum_1.dt_real >= '2025-08-14 09:44:09.033592'::timestamp without time zone)
                 Heap Fetches: 0
                 Index Searches: 1
                 Buffers: shared hit=4
 Planning:
   Buffers: shared hit=30 read=10
 Planning Time: 2.512 ms
 Execution Time: 206.290 ms
(28 rows)

--

Regards,
Alena Rybakina
Postgres Professional




Re: Planner makes sub-optimal execution plan

From:
David Rowley
Date:
On Tue, 2 Sept 2025 at 00:41, Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
> After disabling MIN/MAX optimization in the grouping_planner function:
> [...]
>
> I got a better query plan, but I’m still investigating what went wrong.

This is basically just the standard issue people have with how we cost LIMIT.

If you look at this part:

->  Limit  (cost=0.43..7.91 rows=1 width=8)
      Output: docum.dt
      ->  Index Scan using docum_dt_7ee1d676 on public.docum  (cost=0.43..420487.43 rows=56222 width=8)

you can see that the Limit's total cost is the startup cost (0.43) plus
one row's share of the scan's total cost: 0.43 + 420487.43 / 56222 * 1 ≈ 7.91.

The problem is that the planner assumes there's no correlation between dt
and dt_real. It thinks that, because a decent number of rows have
dt_real >= '2025-08-14 09:44:09.033592', it'll find the LIMIT 1 row fairly
quickly by scanning the docum_dt_7ee1d676 index. Unfortunately, it's not
quick, because the rows matching dt_real >= '2025-08-14 09:44:09.033592'
are nearly at the end of the index.
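
The only correlation statistics the planner does have are each column's
correlation with the table's physical order, nothing about dt vs dt_real;
you can see those in pg_stats, e.g.:

SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'docum' AND attname IN ('dt', 'dt_real');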

In v16+, using something like MIN(docum.dt ORDER BY dt) AS "dt__min"
should force the planner into giving a better plan. ORDER BY /
DISTINCT aggregates are a bit less efficient in versions earlier than
that, so doing that would require a bit of additional work if using
v14. It might not be too bad, though.
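
For the test table above, that would be something like this (a sketch, not
tested here; the added ORDER BY should disqualify the MIN/MAX transform, so
the planner falls back to aggregating over a single scan):

SELECT
    MIN(docum.dt ORDER BY docum.dt) AS "dt__min",
    MAX(docum.dt_real) AS "dt_real__max"
FROM docum
WHERE docum.dt_real >= '2025-08-14T09:44:09.033592'::timestamp;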

David