Thread: Planner makes sub-optimal execution plan
Hi, all!

The execution plan regressed during an upgrade from PG13 to PG14. We've managed to emulate the table and reproduce the problem on PG17:

# cat /etc/*release
PRETTY_NAME="Debian GNU/Linux trixie/sid"

select version();
PostgreSQL 17.4 (Debian 17.4-1.pgdg110+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit

-- DROP TABLE IF EXISTS docum CASCADE;
CREATE TABLE docum (
    id      BIGSERIAL PRIMARY KEY,
    dt      TIMESTAMP,
    dt_real TIMESTAMP NOT NULL
);

-- Fill the table with 12M rows (approx. 30 sec):
DO $$
DECLARE
    batch_size     INT := 50000;
    total_rows     INT := 12000000;  -- 12M records
    current_row    INT := 0;
    base_date      TIMESTAMP := '2024-01-01 00:00:00'::timestamp;
    cutoff_date    TIMESTAMP := '2025-08-14 09:44:09.033592'::timestamp;
    time_increment INTERVAL;
BEGIN
    RAISE NOTICE 'Start. Total rows %', total_rows;
    ALTER TABLE docum SET (autovacuum_enabled = false);

    -- Calculate the time increment for an even distribution:
    -- data for ~590 days before cutoff_date
    time_increment := (cutoff_date - base_date) / (total_rows * 0.995);

    -- Insert sorted data for high correlation
    WHILE current_row < total_rows LOOP
        INSERT INTO docum (dt, dt_real)
        SELECT
            -- dt: almost in sync with dt_real, but with some noise and 5% NULLs
            CASE
                WHEN random() < 0.05 THEN NULL
                ELSE base_date + (time_increment * (current_row + i))
                     + (random() * INTERVAL '1 hour' - INTERVAL '30 minutes')  -- noise
            END,
            -- dt_real: monotonically increasing, correlation 0.999
            CASE
                -- 99.5% of records before cutoff_date
                WHEN current_row < total_rows * 0.995
                    THEN base_date + (time_increment * (current_row + i))
                -- 0.5% of records after cutoff_date (~60K records)
                ELSE cutoff_date
                     + ((current_row + i - total_rows * 0.995) * INTERVAL '10 seconds')
            END
        FROM generate_series(0, LEAST(batch_size - 1, total_rows - current_row - 1)) AS i;

        current_row := current_row + batch_size;

        -- Show progress (note: RAISE only understands plain % placeholders,
        -- so round the percentage beforehand):
        IF current_row % 500000 = 0 THEN
            RAISE NOTICE 'Processed: % rows (% %%)',
                current_row, round(current_row::numeric / total_rows * 100, 1);
        END IF;
    END LOOP;

    RAISE NOTICE 'Completed. Total rows inserted: %', current_row;
END $$;

-- Indexes:
CREATE INDEX docum_dt ON docum(dt);
CREATE INDEX docum_dt_real ON docum(dt_real);
ALTER TABLE docum SET (autovacuum_enabled = true);

-- Gather statistics:
ALTER TABLE docum ALTER COLUMN dt SET STATISTICS 400;
ALTER TABLE docum ALTER COLUMN dt_real SET STATISTICS 400;
-- default_statistics_target = 400 will also work.
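As a quick sanity check on the generated data (a sketch of mine, not part of the original report), pg_stats should confirm, after the VACUUM ANALYZE below, the ~5% NULL fraction in dt and the near-perfect physical ordering of both columns that becomes relevant later in the thread:

    -- Sketch (not from the original report): verify post-ANALYZE statistics
    SELECT attname, null_frac, correlation
    FROM pg_stats
    WHERE tablename = 'docum'
      AND attname IN ('dt', 'dt_real');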
VACUUM ANALYZE docum;

-- The problematic query:
EXPLAIN (ANALYZE, VERBOSE, BUFFERS, SUMMARY, SETTINGS, TIMING)
SELECT
    MIN(docum.dt)      AS "dt__min",
    MAX(docum.dt_real) AS "dt_real__max"
FROM docum
WHERE docum.dt_real >= '2025-08-14T09:44:09.033592'::timestamp;

-- The plan I've got is:
Result  (cost=8.38..8.39 rows=1 width=16) (actual time=2660.034..2660.036 rows=1 loops=1)
  Output: (InitPlan 1).col1, (InitPlan 2).col1
  Buffers: shared hit=9358751 read=30994 written=1
  InitPlan 1
    ->  Limit  (cost=0.43..7.91 rows=1 width=8) (actual time=2660.006..2660.007 rows=1 loops=1)
          Output: docum.dt
          Buffers: shared hit=9358747 read=30994 written=1
          ->  Index Scan using docum_dt_7ee1d676 on public.docum  (cost=0.43..420487.43 rows=56222 width=8) (actual time=2660.004..2660.005 rows=1 loops=1)
                Output: docum.dt
                Index Cond: (docum.dt IS NOT NULL)
                Filter: (docum.dt_real >= '2025-08-14 09:44:09.033592'::timestamp without time zone)
                Rows Removed by Filter: 11342966
                Buffers: shared hit=9358747 read=30994 written=1
  InitPlan 2
    ->  Limit  (cost=0.43..0.46 rows=1 width=8) (actual time=0.022..0.022 rows=1 loops=1)
          Output: docum_1.dt_real
          Buffers: shared hit=4
          ->  Index Only Scan Backward using docum_dt_real_2b81c58c on public.docum docum_1  (cost=0.43..1689.22 rows=59245 width=8) (actual time=0.021..0.021 rows=1 loops=1)
                Output: docum_1.dt_real
                Index Cond: (docum_1.dt_real >= '2025-08-14 09:44:09.033592'::timestamp without time zone)
                Heap Fetches: 0
                Buffers: shared hit=4
Settings: work_mem = '16MB', search_path = 'public, public, "$user"'
Planning:
  Buffers: shared hit=12
Planning Time: 0.148 ms
Execution Time: 2660.056 ms

This plan is bad: the execution time is 2660.056 ms, yet the total cost is only 8.38..8.39, which looks far too small! At this step:

    ->  Index Scan using docum_dt_7ee1d676 on public.docum  (cost=0.43..420487.43 ...

the much higher cost of 420487 looks adequate, but it was somehow lost and never counted into the total query plan cost.

There is a possible fix for the query: just add a count(*) column to the SELECT:

EXPLAIN (ANALYZE, VERBOSE, BUFFERS, SUMMARY, SETTINGS, TIMING)
SELECT
    count(*)           AS cnt,
    MIN(docum.dt)      AS "dt__min",
    MAX(docum.dt_real) AS "dt_real__max"
FROM docum
WHERE docum.dt_real >= '2025-08-14T09:44:09.033592'::timestamp;

Aggregate  (cost=2512.56..2512.57 rows=1 width=24) (actual time=10.179..10.180 rows=1 loops=1)
  Output: count(*), min(dt), max(dt_real)
  Buffers: shared hit=383 read=165
  ->  Index Scan using docum_dt_real_2b81c58c on public.docum  (cost=0.43..2068.22 rows=59245 width=16) (actual time=0.013..6.430 rows=59999 loops=1)
        Output: id, dt, dt_real
        Index Cond: (docum.dt_real >= '2025-08-14 09:44:09.033592'::timestamp without time zone)
        Buffers: shared hit=383 read=165
Settings: work_mem = '16MB', search_path = 'public, public, "$user"'
Planning:
  Buffers: shared hit=9 read=1
Planning Time: 0.927 ms
Execution Time: 10.199 ms

-- This plan's cost of 2512 is much higher, but the execution time is way better!

Did you manage to reproduce this plan? Can I help you with more details?
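To quantify the effect (a sketch of mine, not from the original report): counting how many rows precede the first qualifying row in dt order shows how far InitPlan 1 has to scan before it can stop. The result should be close to the "Rows Removed by Filter: 11342966" figure in the first plan:

    -- Sketch (not from the original report): rows the InitPlan 1 index scan
    -- must step over before reaching the first row passing the filter
    SELECT count(*) AS rows_before_first_match
    FROM docum
    WHERE dt < (SELECT min(dt)
                FROM docum
                WHERE dt_real >= timestamp '2025-08-14 09:44:09.033592');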
Hi! Thank you for sharing this interesting case!

On 01.09.2025 12:07, Алексей Борщёв wrote:
> EXPLAIN (ANALYZE, VERBOSE, BUFFERS, SUMMARY, SETTINGS, TIMING)
> SELECT
>     MIN(docum.dt)      AS "dt__min",
>     MAX(docum.dt_real) AS "dt_real__max"
> FROM docum
> WHERE docum.dt_real >= '2025-08-14T09:44:09.033592'::timestamp;
>
> -- The plan I've got is:
> Result  (cost=8.38..8.39 rows=1 width=16) (actual time=2660.034..2660.036 rows=1 loops=1)
>   Output: (InitPlan 1).col1, (InitPlan 2).col1
>   Buffers: shared hit=9358751 read=30994 written=1
>   InitPlan 1
>     ->  Limit  (cost=0.43..7.91 rows=1 width=8) (actual time=2660.006..2660.007 rows=1 loops=1)
>           Output: docum.dt
>           Buffers: shared hit=9358747 read=30994 written=1
>           ->  Index Scan using docum_dt_7ee1d676 on public.docum  (cost=0.43..420487.43 rows=56222 width=8) (actual time=2660.004..2660.005 rows=1 loops=1)
>                 Output: docum.dt
>                 Index Cond: (docum.dt IS NOT NULL)
>                 Filter: (docum.dt_real >= '2025-08-14 09:44:09.033592'::timestamp without time zone)
>                 Rows Removed by Filter: 11342966
>                 Buffers: shared hit=9358747 read=30994 written=1
>   InitPlan 2
>     ->  Limit  (cost=0.43..0.46 rows=1 width=8) (actual time=0.022..0.022 rows=1 loops=1)
>           Output: docum_1.dt_real
>           Buffers: shared hit=4
>           ->  Index Only Scan Backward using docum_dt_real_2b81c58c on public.docum docum_1  (cost=0.43..1689.22 rows=59245 width=8) (actual time=0.021..0.021 rows=1 loops=1)
>                 Output: docum_1.dt_real
>                 Index Cond: (docum_1.dt_real >= '2025-08-14 09:44:09.033592'::timestamp without time zone)
>                 Heap Fetches: 0
>                 Buffers: shared hit=4
> Settings: work_mem = '16MB', search_path = 'public, public, "$user"'
> Planning:
>   Buffers: shared hit=12
> Planning Time: 0.148 ms
> Execution Time: 2660.056 ms

After disabling the MIN/MAX optimization in the grouping_planner function:

    /*
     * Preprocess MIN/MAX aggregates, if any.  Note: be careful about
     * adding logic between here and the query_planner() call.  Anything
     * that is needed in MIN/MAX-optimizable cases will have to be
     * duplicated in planagg.c.
     */
    //if (parse->hasAggs)
    //    preprocess_minmax_aggregates(root);

I got a better query plan, but I'm still investigating what went wrong.
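For readers following along: preprocess_minmax_aggregates() is the planagg.c entry point that turns each MIN/MAX aggregate into an ordered index probe. Conceptually (a hand-written sketch based on the plan shapes above, not literal planner output), the original query is planned as if it were:

    -- Sketch of the planagg.c transformation, not actual planner output:
    SELECT
        (SELECT dt
         FROM docum
         WHERE dt IS NOT NULL
           AND dt_real >= timestamp '2025-08-14 09:44:09.033592'
         ORDER BY dt ASC
         LIMIT 1)  AS "dt__min",
        (SELECT dt_real
         FROM docum
         WHERE dt_real >= timestamp '2025-08-14 09:44:09.033592'
         ORDER BY dt_real DESC
         LIMIT 1)  AS "dt_real__max";

These two subqueries correspond to InitPlan 1 and InitPlan 2 above; the first one has no index that serves both the dt ordering and the dt_real filter, hence the long filtering scan.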
Also, creating a partial index helped, since InitPlan 1 then only has to wade through the much smaller set of recent rows:

CREATE INDEX CONCURRENTLY docum_dt_recent ON docum (dt)
WHERE dt_real >= timestamp '2025-08-01 00:00:00';

postgres=# EXPLAIN (ANALYZE, VERBOSE, BUFFERS, SUMMARY, SETTINGS, TIMING)
SELECT
    MIN(docum.dt)      AS "dt__min",
    MAX(docum.dt_real) AS "dt_real__max"
FROM docum
WHERE docum.dt_real >= '2025-08-14T09:44:09.033592'::timestamp;

                                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=1.11..1.12 rows=1 width=16) (actual time=206.206..206.207 rows=1.00 loops=1)
   Output: (InitPlan 1).col1, (InitPlan 2).col1
   Buffers: shared hit=212507
   InitPlan 1
     ->  Limit  (cost=0.42..0.64 rows=1 width=8) (actual time=206.154..206.155 rows=1.00 loops=1)
           Output: docum.dt
           Buffers: shared hit=212503
           ->  Index Scan using docum_dt_recent on public.docum  (cost=0.42..12391.88 rows=55710 width=8) (actual time=206.150..206.150 rows=1.00 loops=1)
                 Output: docum.dt
                 Index Cond: (docum.dt IS NOT NULL)
                 Filter: (docum.dt_real >= '2025-08-14 09:44:09.033592'::timestamp without time zone)
                 Rows Removed by Filter: 256799
                 Index Searches: 1
                 Buffers: shared hit=212503
   InitPlan 2
     ->  Limit  (cost=0.43..0.46 rows=1 width=8) (actual time=0.042..0.042 rows=1.00 loops=1)
           Output: docum_1.dt_real
           Buffers: shared hit=4
           ->  Index Only Scan Backward using docum_dt_real on public.docum docum_1  (cost=0.43..1671.62 rows=58696 width=8) (actual time=0.041..0.041 rows=1.00 loops=1)
                 Output: docum_1.dt_real
                 Index Cond: (docum_1.dt_real >= '2025-08-14 09:44:09.033592'::timestamp without time zone)
                 Heap Fetches: 0
                 Index Searches: 1
                 Buffers: shared hit=4
 Planning:
   Buffers: shared hit=30 read=10
 Planning Time: 2.512 ms
 Execution Time: 206.290 ms
(28 rows)

--
Regards,
Alena Rybakina
Postgres Professional
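A possible refinement of that partial-index idea (an untested sketch of mine; docum_dt_recent_inc is a hypothetical name): storing dt_real as an INCLUDE column lets the dt_real filter be checked against the index tuple itself, so InitPlan 1 could run as an index-only scan and, with an up-to-date visibility map, avoid the heap visits behind the 212503 buffer hits above:

    -- Untested sketch; docum_dt_recent_inc is a hypothetical name
    CREATE INDEX CONCURRENTLY docum_dt_recent_inc
        ON docum (dt) INCLUDE (dt_real)
        WHERE dt_real >= timestamp '2025-08-01 00:00:00';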
On Tue, 2 Sept 2025 at 00:41, Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
> After disabling the MIN/MAX optimization in the grouping_planner function:
>
>     /*
>      * Preprocess MIN/MAX aggregates, if any.  Note: be careful about
>      * adding logic between here and the query_planner() call.  Anything
>      * that is needed in MIN/MAX-optimizable cases will have to be
>      * duplicated in planagg.c.
>      */
>     //if (parse->hasAggs)
>     //    preprocess_minmax_aggregates(root);
>
> I got a better query plan, but I'm still investigating what went wrong.

This is basically just the standard issue people have with how we cost LIMIT. If you look at this part:

->  Limit  (cost=0.43..7.91 rows=1 width=8)
      Output: docum.dt
      ->  Index Scan using docum_dt_7ee1d676 on public.docum  (cost=0.43..420487.43 rows=56222 width=8)

you can see that the Limit total cost is startup (0.43) plus 420487.43 / 56222 * 1, i.e. about 0.43 + 7.48 = 7.91.

The problem is that the planner assumes there's no correlation between dt and dt_real. It thinks that because a decent number of rows have dt_real >= '2025-08-14 09:44:09.033592', it'll find the LIMIT 1 row fairly quickly by scanning the docum_dt_7ee1d676 index. Unfortunately, it's not quick, because the rows matching dt_real >= '2025-08-14 09:44:09.033592' are nearly at the end of the index.

In v16+, using something like MIN(docum.dt ORDER BY dt) AS "dt__min" should force the planner into giving a better plan. ORDER BY / DISTINCT aggregates are a bit less efficient in versions earlier than that, so doing that would require a bit of additional work if using v14. It might not be too bad, though.

David
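Spelling out David's suggestion as a full query (a sketch; the resulting plan is not verified here): the ORDER BY inside the aggregate makes the query ineligible for the MIN/MAX rewrite, so on v16+ it should be computed from a single pass over the rows matching the dt_real predicate, much like the count(*) variant earlier in the thread:

    -- Sketch of David's suggested rewrite (v16+); plan not verified here
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT
        MIN(docum.dt ORDER BY dt) AS "dt__min",
        MAX(docum.dt_real)        AS "dt_real__max"
    FROM docum
    WHERE docum.dt_real >= '2025-08-14T09:44:09.033592'::timestamp;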