Thread: BUG #19332: Sudden 330x performance degradation of SELECT amid INSERTs
BUG #19332: Sudden 330x performance degradation of SELECT amid INSERTs
From:
PG Bug reporting form
Date:
The following bug has been logged on the website:
Bug reference: 19332
Logged by: Sergey Naumov
Email address: sknaumov@gmail.com
PostgreSQL version: 18.1
Operating system: Linux (official docker image)
Description:
I'm using Django to write to PostgreSQL and implemented a data generator that
produces fake data to assess query performance.
Everything was fine while the initial data generator issued only
INSERT statements plus SELECT statements against a quite small table.
But then I had to extend it with logic that SELECTs the just-generated
low-level entries (TestRun) in order to create related (1-to-1) entries in another
table:
TestRun.objects.select_related('test').filter(regression__run=run)
The generator starts inserting data at a decent speed, something like 50
high-level entries (Run) per second, in a single thread, with Python taking
~85% of one CPU core and PostgreSQL the remaining 15%. Then suddenly (at
random, independent of the amount of data already in the DB), performance
degrades to 1-3 high-level entries per second, PostgreSQL takes ~96-99% of a
CPU core and Python the remaining 1-4%.
I do not see any change in the queries issued by Django - they are simply issued
in a loop with slightly different random data. If I comment out the SELECT query
against the low-level tables, the issue disappears.
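A minimal sketch of one way to capture the plan of the statement at the moment it slows down, using the auto_explain extension (the 100 ms threshold is an arbitrary assumption):

LOAD 'auto_explain';
-- Log the full plan of any statement that runs longer than 100 ms,
-- including run-time and buffer statistics.
SET auto_explain.log_min_duration = '100ms';
SET auto_explain.log_analyze = on;
SET auto_explain.log_buffers = on;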
20%|██████████████████████████▌| 2010/10000 [00:48<03:06, 42.74it/s]
(0.001) SELECT "tag"."id" FROM "tag" INNER JOIN "report_tags" ON ("tag"."id"
= "report_tags"."tag_id") WHERE "report_tags"."report_id" =
3402823669209384634651384521740769275...
(0.001) INSERT INTO "report_tags" ("report_id", "tag_id") VALUES
(340282366920938463465138452174076927509, 1),
(340282366920938463465138452174076927509, 4), (3402823669209384...
(0.000) INSERT INTO "run" ("report_id", "wall_time", "rtl_repo",
"rtl_commit", "tests_repo", "tests_commit", "scenario", "simulator") VALUES
(34028236692093846346513845217407...
(0.001) INSERT INTO "regression" ("run_id", "config_name") VALUES
(340282366920938463465138452174076927509, NULL) RETURNING "regression"."id";
args=(Decimal('3402823669209384...
(0.030) INSERT INTO "test_run" ("test_id", "regression_id", "status",
"wall_time", "seed") VALUES (16, 2011, 'Pass', 6677, 1), (113, 2011, 'Pass',
9545, 1), (195, 2011, 'Pass...
(0.001) SELECT "test_run"."id", "test_run"."test_id",
"test_run"."regression_id", "test_run"."status", "test_run"."wall_time",
"test_run"."seed", "test"."id", "test"."name" F...
(0.002) INSERT INTO "bench_run" ("test_run_id", "metrics") VALUES (247828,
'{"score": 6.952412837213508}'::jsonb), (247829, '{"execution_time_sec":
3440020.606986488}'::jsonb...
(0.000) INSERT INTO "report" ("id", "ts", "report_path", "table_name",
"type", "source") VALUES (340282366920938463465138452174130059518,
'2025-09-04 08:33:16+00:00'::timesta...
...
(0.000) SELECT "tag"."id" FROM "tag" INNER JOIN "report_tags" ON ("tag"."id"
= "report_tags"."tag_id") WHERE "report_tags"."report_id" =
3402823669209384634651384521741636876...
(0.000) INSERT INTO "report_tags" ("report_id", "tag_id") VALUES
(340282366920938463465138452174163687660, 1),
(340282366920938463465138452174163687660, 5), (3402823669209384...
(0.001) INSERT INTO "run" ("report_id", "wall_time", "rtl_repo",
"rtl_commit", "tests_repo", "tests_commit", "scenario", "simulator") VALUES
(34028236692093846346513845217416...
(0.000) INSERT INTO "regression" ("run_id", "config_name") VALUES
(340282366920938463465138452174163687660, NULL) RETURNING "regression"."id";
args=(Decimal('3402823669209384...
(0.002) INSERT INTO "test_run" ("test_id", "regression_id", "status",
"wall_time", "seed") VALUES (105, 2014, 'Pass', 667, 1), (85, 2014, 'Pass',
2196, 1), (112, 2014, 'Pass'...
(0.001) SELECT "test_run"."id", "test_run"."test_id",
"test_run"."regression_id", "test_run"."status", "test_run"."wall_time",
"test_run"."seed", "test"."id", "test"."name" F...
(0.001) INSERT INTO "bench_run" ("test_run_id", "metrics") VALUES (248094,
'{"score": 1.5836472962691261}'::jsonb), (248095, '{"score":
2.820649074490209}'::jsonb), (248096, ...
(0.000) INSERT INTO "report" ("id", "ts", "report_path", "table_name",
"type", "source") VALUES (340282366920938463465138452174182236095,
'2025-11-14 08:13:19+00:00'::timesta...
(0.000) SELECT "tag"."id" FROM "tag" INNER JOIN "report_tags" ON ("tag"."id"
= "report_tags"."tag_id") WHERE "report_tags"."report_id" =
3402823669209384634651384521741822360...
(0.000) INSERT INTO "report_tags" ("report_id", "tag_id") VALUES
(340282366920938463465138452174182236095, 11),
(340282366920938463465138452174182236095, 2), (340282366920938...
(0.000) INSERT INTO "run" ("report_id", "wall_time", "rtl_repo",
"rtl_commit", "tests_repo", "tests_commit", "scenario", "simulator") VALUES
(34028236692093846346513845217418...
(0.000) INSERT INTO "regression" ("run_id", "config_name") VALUES
(340282366920938463465138452174182236095, NULL) RETURNING "regression"."id";
args=(Decimal('3402823669209384...
(0.002) INSERT INTO "test_run" ("test_id", "regression_id", "status",
"wall_time", "seed") VALUES (217, 2015, 'Pass', 1886, 1), (11, 2015, 'Pass',
4893, 1), (173, 2015, 'Pass...
(0.336) SELECT "test_run"."id", "test_run"."test_id",
"test_run"."regression_id", "test_run"."status", "test_run"."wall_time",
"test_run"."seed", "test"."id", "test"."name" F...
(0.001) INSERT INTO "bench_run" ("test_run_id", "metrics") VALUES (248190,
'{"score": 1.186662265710201}'::jsonb), (248211, '{"score":
6.662192055037965}'::jsonb), (248199, '...
20%|██████████████████████████▌| 2015/10000 [00:49<05:44, 23.18it/s]
...
(0.000) SELECT "tag"."id" FROM "tag" INNER JOIN "report_tags" ON ("tag"."id"
= "report_tags"."tag_id") WHERE "report_tags"."report_id" =
3402823669209384634651384521766038035...
(0.000) INSERT INTO "report_tags" ("report_id", "tag_id") VALUES
(340282366920938463465138452176603803511, 2),
(340282366920938463465138452176603803511, 10), (340282366920938...
(0.000) INSERT INTO "run" ("report_id", "wall_time", "rtl_repo",
"rtl_commit", "tests_repo", "tests_commit", "scenario", "simulator") VALUES
(34028236692093846346513845217660...
(0.001) INSERT INTO "regression" ("run_id", "config_name") VALUES
(340282366920938463465138452176603803511, NULL) RETURNING "regression"."id";
args=(Decimal('3402823669209384...
(0.003) INSERT INTO "test_run" ("test_id", "regression_id", "status",
"wall_time", "seed") VALUES (187, 2022, 'Pass', 5664, 1), (242, 2022,
'Pass', 8810, 1), (205, 2022, 'Pas...
(0.329) SELECT "test_run"."id", "test_run"."test_id",
"test_run"."regression_id", "test_run"."status", "test_run"."wall_time",
"test_run"."seed", "test"."id", "test"."name" F...
(0.002) INSERT INTO "bench_run" ("test_run_id", "metrics") VALUES (249104,
'{"score": 13.311529074285398}'::jsonb), (249020, '{"execution_time_sec":
2229116.6002634834}'::jso...
20%|██████████████████████████▋| 2022/10000 [00:51<22:14, 5.98it/s]
Note that the runtime of the SELECT query suddenly increased ~330 times. As I said,
it happens at random, even almost at the very start of data generation on an
empty DB - in the case above it happens right between the generation of Run
entries 2014 and 2015, but in general it could happen even at the 100th entry.
This behavior is visible on both PostgreSQL versions I've tried - 16.10 and
18.1.
It looks like the problem is in autovacuum - first I realized that I can't reproduce the problem if I do not delete the previous data before generating new data.
Then I turned off autovacuum and failed to reproduce the problem even with the data cleanup.
But I think the report is still relevant - a 330x performance degradation caused by a background maintenance task that takes minutes (it looks like the inserts/selects affect autovacuum performance too, so it takes a long time to complete) is not what a user expects from a DB.
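A quick sketch of how the autovacuum/autoanalyze activity on the hot tables can be checked, and how autovacuum can be switched off per table rather than globally (the table names are taken from the logged queries and may not match the real schema exactly):

-- When did autovacuum/autoanalyze last touch the tables used by the slow
-- SELECT, and how many live/dead tuples do they currently report?
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('test_run', 'regression', 'run', 'test');

-- Disable autovacuum for one table only, for the duration of the experiment.
ALTER TABLE test_run SET (autovacuum_enabled = off);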
On Mon, 24 Nov 2025 at 00:39, Sergey Naumov <sknaumov@gmail.com> wrote:
> But I think the report is still relevant - a 330x performance degradation caused by a background maintenance task that takes minutes (it looks like the inserts/selects affect autovacuum performance too, so it takes a long time to complete) is not what a user expects from a DB.

The planner not picking the most optimal plan all of the time doesn't constitute a bug. We provide many ways, including extended statistics, n_distinct estimate overrides, increasing stats targets and various GUCs, to allow users to influence the planner when it doesn't choose the best plan.

For your case, it sounds like autovacuum might be running an ANALYZE and causing the planner to change plans, of which the newly chosen plan appears to execute more slowly than the previous plan. To fix this, something from the above list may just need to be addressed. I suggest you look at the EXPLAIN ANALYZE output for the slow query from before and after the slowdown.

If you need help with figuring out why the planner picked a bad plan and ways to resolve that, then the performance mailing list might be a better place. If you happen to have discovered any sort of bug, then it can be reported here.

David
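A sketch of the knobs mentioned above; the table and column names are guessed from the logged queries and may not match the real schema:

-- Raise the per-column statistics target so the next ANALYZE samples the
-- column more thoroughly.
ALTER TABLE test_run ALTER COLUMN regression_id SET STATISTICS 1000;

-- Override the n_distinct estimate (negative values mean a fraction of rows).
ALTER TABLE test_run ALTER COLUMN regression_id SET (n_distinct = -0.1);

-- Build extended statistics across columns the planner treats as independent.
CREATE STATISTICS test_run_ext (ndistinct, dependencies)
    ON test_id, regression_id FROM test_run;

ANALYZE test_run;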
Hello again.
I've collected the slow and fast query plans. It looks like, once the data is cleaned up, PostgreSQL no longer knows which tables are big and which are small; and since the data generation runs in one big transaction, the data from this uncommitted transaction already affects the SELECT queries, yet VACUUM doesn't see this uncommitted data and can't adjust the stats => the query planner can come up with a suboptimal plan.
But still, the query itself contains a hint as to which table has to be filtered first - there is a WHERE clause that keeps just one row from that table. Yet the query planner decides to join another (very big) table first => performance degrades by orders of magnitude.
To me it looks like a flaw in the query planner logic: having no data about the tables' contents, it ignores the WHERE clause hinting which table should be processed first => I'm not sure whether this should be treated as a performance issue or a bug.
Query plans are attached as PEV2 standalone HTML pages.
Thanks,
Sergey.
Attachments
On Thu, 4 Dec 2025 at 23:20, Sergey Naumov <sknaumov@gmail.com> wrote:
> I've collected the slow and fast query plans. It looks like, once the data is cleaned up, PostgreSQL no longer knows which tables are big and which are small; and since the data generation runs in one big transaction, the data from this uncommitted transaction already affects the SELECT queries, yet VACUUM doesn't see this uncommitted data and can't adjust the stats => the query planner can come up with a suboptimal plan.
>
> But still, the query itself contains a hint as to which table has to be filtered first - there is a WHERE clause that keeps just one row from that table. Yet the query planner decides to join another (very big) table first => performance degrades by orders of magnitude.
>
> To me it looks like a flaw in the query planner logic: having no data about the tables' contents, it ignores the WHERE clause hinting which table should be processed first => I'm not sure whether this should be treated as a performance issue or a bug.

I've studied this for a bit and pretty much feel like I've got one hand tied behind my back due to the format you've submitted the EXPLAIN in. I don't really understand the visual format as details I'm used to seeing are not there, and the JSON "raw" format isn't much better for humans to read, as it doesn't fit on a page.

From what I see, it looks like the slow version opts to build the hash table for the hash join from an Index Scan on an index on test_run, but that index is bloated. For the fast plan, the bloated index isn't scanned because that same index is on the inner side of a nested loop and the outer side didn't find any rows, resulting in the inner side never being executed.

From looking at get_relation_info(), I see we do fetch the actual size of the index with "info->pages = RelationGetNumberOfBlocks(indexRelation);", and from looking at genericcostestimate(), we should estimate the number of pages with "numIndexPages = ceil(numIndexTuples * index->pages / index->tuples);", so I think if the index did contain the 74962 pages when the query was planned, then the costs should have known about it.

Are you using prepared statements here? (I suspect not, since the UUID is in the EXPLAIN output, but you could have hardcoded that.)

> Query plans are attached as PEV2 standalone HTML pages.

I imagine some people might like this format, but I can barely read it. You'll probably have more luck around here with EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT).

And for me, I've not really seen enough evidence that there's any bug here. If you think there is, then you might need to work a bit harder and provide a script that we can reproduce this with ourselves.

David
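A sketch of how to look at what the catalogs record for the test_run indexes, and of the plan format requested above (the count(*) query is only a stand-in for the real slow SELECT):

-- relpages/reltuples recorded for every index on test_run, plus the current
-- on-disk size; heavy bloat shows up as pages far in excess of the tuples.
SELECT c.relname, c.relpages, c.reltuples,
       pg_size_pretty(pg_relation_size(c.oid)) AS on_disk
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
WHERE i.indrelid = 'test_run'::regclass;

-- Text-format plan with buffer counts; substitute the actual slow SELECT.
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT count(*) FROM test_run;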
> so I think if the index did contain the 74962 pages when the query was planned, then the costs should have known about it.
Yes, here we are facing a corner case: the data is generated in a long transaction on a just-cleaned-up DB => when autovacuum kicks in, autoanalyze wipes the old stats and has no data in the tables to properly assess the relations' cardinality.
> And for me, I've not really seen enough evidence that there's any bug here.
So the question is whether the query planner should take into account the WHERE clause that hints that it is enough to join just a single row instead of the whole table.
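One possible mitigation for that corner case - a sketch only, assuming the generator can issue raw SQL - is to ANALYZE the hot tables from inside the loading transaction, since ANALYZE can run in a transaction block and should see the rows inserted earlier in it:

BEGIN;
-- ... the generator's bulk INSERTs go here ...

-- Refresh planner statistics for the tables the slow SELECT touches before
-- the next round of SELECTs runs in the same transaction.
ANALYZE test_run;
ANALYZE regression;

-- ... further INSERTs/SELECTs ...
COMMIT;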
On Wed, 10 Dec 2025 at 23:33, Sergey Naumov <sknaumov@gmail.com> wrote:
> > so I think if the index did contain the 74962 pages when the query was planned, then the costs should have known about it.
> Yes, here we are facing a corner case: the data is generated in a long transaction on a just-cleaned-up DB => when autovacuum kicks in, autoanalyze wipes the old stats and has no data in the tables to properly assess the relations' cardinality.
>
> > And for me, I've not really seen enough evidence that there's any bug here.
> So the question is whether the query planner should take into account the WHERE clause that hints that it is enough to join just a single row instead of the whole table.

Just for the record here, the planner has no preference with join order in regard to the existence of a WHERE clause. The primary driver of this is the estimated number of rows. From that, the exact costs for the specific join method are based on a few other things too.

It sounds like you're claiming that it should be better to assume the scan with the WHERE clause is better somehow. To me, this sounds like something a rules-based optimiser might do. We have a cost-based optimiser which uses table statistics as inputs to the cost calculations.

I did try and recreate the issue you've reported, but I'm unable to. I find it a bit suspicious that your planner opted to Hash Join when the hash table was estimated to contain a single row. I'd expect the planner would normally Nested Loop unless there's a FULL JOIN, which there is not, in this case. See attached.

I expect you'll have more luck with bug reports if you work with the person who's trying to help you and try and gather the information they've requested rather than ignoring that and reiterating what you think the problem is.

David
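For reference, a skeleton of the kind of self-contained script asked for above; the schema and data volumes are guesses based on the logged queries, and this skeleton by itself is not known to reproduce the reported slowdown:

-- Minimal stand-ins for the tables the slow SELECT joins.
CREATE TABLE test (id serial PRIMARY KEY, name text);
CREATE TABLE regression (id serial PRIMARY KEY, run_id bigint);
CREATE TABLE test_run (
    id serial PRIMARY KEY,
    test_id int REFERENCES test (id),
    regression_id int REFERENCES regression (id),
    status text,
    wall_time int,
    seed int
);

INSERT INTO test (name) SELECT 'test_' || g FROM generate_series(1, 250) g;
INSERT INTO regression (run_id) SELECT g FROM generate_series(1, 2000) g;
INSERT INTO test_run (test_id, regression_id, status, wall_time, seed)
SELECT (g % 250) + 1, (g % 2000) + 1, 'Pass', g, 1
FROM generate_series(1, 250000) g;

-- Mimic the generator's pattern: delete the previous data, then reload
-- inside one transaction and run the SELECT against the freshly inserted rows.
DELETE FROM test_run;
BEGIN;
INSERT INTO test_run (test_id, regression_id, status, wall_time, seed)
SELECT (g % 250) + 1, (g % 2000) + 1, 'Pass', g, 1
FROM generate_series(1, 250000) g;

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT tr.*, t.name
FROM test_run tr
JOIN test t ON t.id = tr.test_id
WHERE tr.regression_id = 2000;
COMMIT;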