Обсуждение: BUG #19345: MemoryContextSizeFailure after upgrade 14.11 to 17.7 in stored procedure
BUG #19345: MemoryContextSizeFailure after upgrade 14.11 to 17.7 in stored procedure
От
PG Bug reporting form
Дата:
The following bug has been logged on the website:
Bug reference: 19345
Logged by: weijie JL
Email address: weijie1006jl@gmail.com
PostgreSQL version: 17.7
Operating system: RockyLinux8
Description:
Hello PostgreSQL Team,
After upgrading from PostgreSQL 14.11 to PostgreSQL 17.7 on our production
environment (RockyLinux8, RPM package), we encountered a crash when
executing a specific stored procedure.
This procedure ran successfully in 14.11 (approximately 8 hours execution
time), but on 17.7 it runs for about 6 hours before failing with:
LOCATION: MemoryContextSizeFailure, mcxt.c:1169
STATEMENT: CALL PR_MD()
The stack trace indicates calls related to hash join and temporary file
creation during query execution inside PL/pgSQL function pr_md_item_org()
and pr_md().
error log:
left join o_scm_dictionary_item Di ON Di.Dictionary_Type_Code =
'INV_0200304' AND Di.Is_Deleted = '0' AND Di.Item_Value = Lc.Option_Value
AND Msib.Lot_Control_Code = 2"
PL/pgSQL function pr_md_item_org() line 20 at SQL statement
SQL statement "CALL PR_MD_ITEM_ORG()"
PL/pgSQL function pr_md() line 72 at CALL
2025-11-28 06:02:22.121 CST 10.21.18.87(53748) pgdb_md md [3717604]LOCATION:
MemoryContextSizeFailure, mcxt.c:1169
2025-11-28 06:02:22.121 CST 10.21.18.87(53748) pgdb_md md
[3717604]BACKTRACE:
postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL()
[0x50d5e0]
postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL()
[0x9c96dc]
postgres: postgres003: md pgdb_md 10.21.18.87(53748)
CALL(MemoryContextAllocZero+0x14) [0x9cfb54]
postgres: postgres003: md pgdb_md 10.21.18.87(53748)
CALL(ResourceOwnerEnlarge+0x9f) [0x9d218f]
postgres: postgres003: md pgdb_md 10.21.18.87(53748)
CALL(OpenTemporaryFile+0x64) [0x83dfc4]
postgres: postgres003: md pgdb_md 10.21.18.87(53748)
CALL(BufFileCreateTemp+0x18) [0x83a288]
postgres: postgres003: md pgdb_md 10.21.18.87(53748)
CALL(ExecHashJoinSaveTuple+0x68) [0x6e1ce8]
postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL()
[0x6e251f]
postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL()
[0x6c910c]
postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL()
[0x6f0d57]
postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL()
[0x6c910c]
postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL()
[0x6f0d57]
postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL()
[0x6c910c]
postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL()
[0x6e2487]
postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL()
[0x6c910c]
postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL()
[0x6eefa6]
postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL()
[0x6c910c]
postgres: postgres003: md pgdb_md 10.21.18.87(53748)
CALL(standard_ExecutorRun+0x17c) [0x6c27ec]
/usr/pgsql-17/lib/pg_stat_statements.so(+0x3ec5) [0x7f4dac51dec5]
/usr/pgsql-17/lib/auto_explain.so(+0x1928) [0x7f4daa9e8928]
postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL()
[0x6feb6e]
postgres: postgres003: md pgdb_md 10.21.18.87(53748)
CALL(SPI_execute_plan_with_paramlist+0xa6) [0x6ff326]
/usr/pgsql-17/lib/plpgsql.so(+0x12267) [0x7f1f6c910267]
/usr/pgsql-17/lib/plpgsql.so(+0x144ab) [0x7f1f6c9124ab]
/usr/pgsql-17/lib/plpgsql.so(+0x16583) [0x7f1f6c914583]
/usr/pgsql-17/lib/plpgsql.so(+0x1665b) [0x7f1f6c91465b]
/usr/pgsql-17/lib/plpgsql.so(+0x16e45) [0x7f1f6c914e45]
/usr/pgsql-17/lib/plpgsql.so(plpgsql_call_handler+0x1f3)
[0x7f1f6c91e473]
postgres: postgres003: md pgdb_md 10.21.18.87(53748)
CALL(ExecuteCallStmt+0x222) [0x669202]
postgres: postgres003: md pgdb_md 10.21.18.87(53748)
CALL(standard_ProcessUtility+0x715) [0x86ff45]
/usr/pgsql-17/lib/pg_stat_statements.so(+0x5dbd) [0x7f4dac51fdbd]
postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL()
[0x6fe7fe]
postgres: postgres003: md pgdb_md 10.21.18.87(53748)
CALL(SPI_execute_plan_extended+0x5d) [0x6ff25d]
/usr/pgsql-17/lib/plpgsql.so(+0x13d10) [0x7f1f6c911d10]
/usr/pgsql-17/lib/plpgsql.so(+0x16583) [0x7f1f6c914583]
/usr/pgsql-17/lib/plpgsql.so(+0x1665b) [0x7f1f6c91465b]
/usr/pgsql-17/lib/plpgsql.so(+0x16e45) [0x7f1f6c914e45]
/usr/pgsql-17/lib/plpgsql.so(plpgsql_call_handler+0x1f3)
[0x7f1f6c91e473]
postgres: postgres003: md pgdb_md 10.21.18.87(53748)
CALL(ExecuteCallStmt+0x222) [0x669202]
postgres: postgres003: md pgdb_md 10.21.18.87(53748)
CALL(standard_ProcessUtility+0x715) [0x86ff45]
/usr/pgsql-17/lib/pg_stat_statements.so(+0x5dbd) [0x7f4dac51fdbd]
postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL()
[0x86e4cf]
postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL()
[0x86e603]
postgres: postgres003: md pgdb_md 10.21.18.87(53748)
CALL(PortalRun+0x27b) [0x86eb5b]
postgres: postgres003: md pgdb_md 10.21.18.87(53748)
CALL(PostgresMain+0x16ec) [0x86c7bc]
postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL()
[0x8672b5]
postgres: postgres003: md pgdb_md 10.21.18.87(53748)
CALL(postmaster_child_launch+0x8f) [0x7d850f]
postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL()
[0x7dbe5c]
postgres: postgres003: md pgdb_md 10.21.18.87(53748)
CALL(PostmasterMain+0xd18) [0x7dda78]
postgres: postgres003: md pgdb_md 10.21.18.87(53748)
CALL(main+0x223) [0x50f843]
/lib64/libc.so.6(__libc_start_main+0xe5) [0x7f4dbe4fe7e5]
postgres: postgres003: md pgdb_md 10.21.18.87(53748)
CALL(_start+0x2e) [0x50fe1e]
2025-11-28 06:02:22.121 CST 10.21.18.87(53748) pgdb_md md
[3717604]STATEMENT: CALL PR_MD()
Observations / Workarounds Tested:
In 14.11: procedure runs successfully (~8h). In 17.7: fails after ~6h.
SET enable_hashjoin = off; → procedure runs successfully.
SET hash_mem_multiplier = 1; → procedure runs successfully.
Best regards,
Jeslie
Re: BUG #19345: MemoryContextSizeFailure after upgrade 14.11 to 17.7 in stored procedure
От
Kirill Reshke
Дата:
On Fri, 5 Dec 2025 at 15:38, PG Bug reporting form <noreply@postgresql.org> wrote: > SET hash_mem_multiplier = 1; → procedure runs successfully. I think this is because of https://git.postgresql.org/cgit/postgresql.git/commit/?id=8f388f6f554b113f25a53fe3237238d2c58ed1eb We have this is release notes https://postgr.es/c/8f388f6f5 So, after upgrading to 17 (and thus, having this change), defaults changed, some queries now need adjustment... I am not sure we have fixable problem in HEAD here -- Best regards, Kirill Reshke
PG Bug reporting form <noreply@postgresql.org> writes:
> After upgrading from PostgreSQL 14.11 to PostgreSQL 17.7 on our production
> environment (RockyLinux8, RPM package), we encountered a crash when
> executing a specific stored procedure.
> This procedure ran successfully in 14.11 (approximately 8 hours execution
> time), but on 17.7 it runs for about 6 hours before failing with:
> LOCATION: MemoryContextSizeFailure, mcxt.c:1169
> [3717604]BACKTRACE:
> postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL()
> [0x50d5e0]
> postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL()
> [0x9c96dc]
> postgres: postgres003: md pgdb_md 10.21.18.87(53748)
> CALL(MemoryContextAllocZero+0x14) [0x9cfb54]
> postgres: postgres003: md pgdb_md 10.21.18.87(53748)
> CALL(ResourceOwnerEnlarge+0x9f) [0x9d218f]
> postgres: postgres003: md pgdb_md 10.21.18.87(53748)
> CALL(OpenTemporaryFile+0x64) [0x83dfc4]
> postgres: postgres003: md pgdb_md 10.21.18.87(53748)
> CALL(BufFileCreateTemp+0x18) [0x83a288]
> postgres: postgres003: md pgdb_md 10.21.18.87(53748)
> CALL(ExecHashJoinSaveTuple+0x68) [0x6e1ce8]
Hmm. Apparently, your hash join tried to use so many temporary files
that it needed a more-than-1GB array just to track them all. One could
guess that it had been in swap hell for some hours before reaching
this point, because that'd imply about a terabyte worth of I/O
buffers, never mind the actual hashtable data.
Too-many-batches (resulting in too-many-temp-files) is a known failure
mechanism for our hash join code. We've tried to improve that in v18
(cf commits a1b4f289b, aa151022e), but I don't think there's any
appetite for back-patching that work into older branches.
> Observations / Workarounds Tested:
> SET enable_hashjoin = off; → procedure runs successfully.
> SET hash_mem_multiplier = 1; → procedure runs successfully.
I'm curious whether the change of hash_mem_multiplier causes it
to not use a hash join, or that is just successful at limiting
the hash table growth to a point short of failure.
regards, tom lane
I found this information in the log: 2025-12-09 09:01:43.384 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892120",size 1073741824 2025-12-09 09:01:43.384 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 09:01:43.451 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892121",size 1073741824 2025-12-09 09:01:43.451 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 09:01:43.518 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892122",size 1073741824 2025-12-09 09:01:43.518 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 09:01:43.604 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892123",size 1073741824 2025-12-09 09:01:43.604 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 09:01:43.672 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892124",size 1073741824 2025-12-09 09:01:43.672 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 09:01:43.737 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892125",size 1073741824 2025-12-09 09:01:43.737 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 09:01:43.806 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892126",size 1073741824 2025-12-09 09:01:43.806 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 09:01:43.874 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892127",size 1073741824 2025-12-09 09:01:43.874 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 09:01:43.940 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892128",size 1073741824 2025-12-09 09:01:43.940 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 09:01:44.007 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892129",size 1073741824 2025-12-09 09:01:44.007 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 09:01:44.073 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892130",size 1073741824 2025-12-09 09:01:44.073 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 09:01:44.140 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892131",size 1073741824 2025-12-09 09:01:44.140 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 09:01:44.207 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892132",size 1073741824 2025-12-09 09:01:44.207 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 09:01:44.272 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892133",size 1073741824 2025-12-09 09:01:44.272 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 09:01:44.340 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892134",size 1073741824 2025-12-09 09:01:44.340 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 09:01:44.406 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892135",size 1069645824 2025-12-09 09:01:44.406 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 09:01:46.030 CST 10.10.17.192(44780) pgdb_md md [467056]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp467056.0",size 356745216 2025-12-09 09:12:24.606 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 09:12:24.674 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892137",size 1073741824 2025-12-09 09:12:24.674 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 09:12:24.747 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892138",size 1073741824 2025-12-09 09:12:24.747 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 09:12:24.814 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892139",size 1073741824 2025-12-09 09:12:24.814 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 09:12:27.941 CST 10.10.17.192(46184) pgdb_md md [479218]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp479218.4",size 356745216 2025-12-09 09:41:05.279 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 09:41:24.385 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 09:41:32.003 CST 10.10.17.192(51364) pgdb_md md [518956]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp518956.10",size 356761600 2025-12-09 10:23:18.997 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 10:23:19.043 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892166",size 392716288 2025-12-09 10:23:19.043 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 10:23:19.089 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892146",size 571425269 2025-12-09 10:23:19.089 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 10:23:19.159 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892149",size 1073741824 2025-12-09 10:23:19.159 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 10:23:19.222 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892148",size 965312512 2025-12-09 10:23:19.222 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 10:23:19.308 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892145",size 1073741824 2025-12-09 10:23:19.308 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 10:23:19.350 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp269407.25892150",size 369754112 2025-12-09 10:23:19.350 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 10:23:21.524 CST 10.10.17.192(58298) pgdb_md md [571449]LOG: 00000: temporary file: path "base/pgsql_tmp/pgsql_tmp571449.9",size 356769792 2025-12-09 10:25:02.481 CST 10.21.18.87(44346) pgdb_md md [269407]STATEMENT: CALL PR_MD() 2025-12-09 10:47:41.771 CST 10.21.18.87(44346) pgdb_md md [269407]LOG: 00000: duration: 35252734.115 ms execute <unnamed>:CALL PR_MD() 2025-12-09 10:47:41.771 CST 10.21.18.87(44346) pgdb_md md [269407]LOCATION: exec_execute_message, postgres.c:2348 > 2025年12月7日 05:04,Tom Lane <tgl@sss.pgh.pa.us> 写道: > > PG Bug reporting form <noreply@postgresql.org> writes: >> After upgrading from PostgreSQL 14.11 to PostgreSQL 17.7 on our production >> environment (RockyLinux8, RPM package), we encountered a crash when >> executing a specific stored procedure. > >> This procedure ran successfully in 14.11 (approximately 8 hours execution >> time), but on 17.7 it runs for about 6 hours before failing with: >> LOCATION: MemoryContextSizeFailure, mcxt.c:1169 >> [3717604]BACKTRACE: >> postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL() >> [0x50d5e0] >> postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL() >> [0x9c96dc] >> postgres: postgres003: md pgdb_md 10.21.18.87(53748) >> CALL(MemoryContextAllocZero+0x14) [0x9cfb54] >> postgres: postgres003: md pgdb_md 10.21.18.87(53748) >> CALL(ResourceOwnerEnlarge+0x9f) [0x9d218f] >> postgres: postgres003: md pgdb_md 10.21.18.87(53748) >> CALL(OpenTemporaryFile+0x64) [0x83dfc4] >> postgres: postgres003: md pgdb_md 10.21.18.87(53748) >> CALL(BufFileCreateTemp+0x18) [0x83a288] >> postgres: postgres003: md pgdb_md 10.21.18.87(53748) >> CALL(ExecHashJoinSaveTuple+0x68) [0x6e1ce8] > > Hmm. Apparently, your hash join tried to use so many temporary files > that it needed a more-than-1GB array just to track them all. One could > guess that it had been in swap hell for some hours before reaching > this point, because that'd imply about a terabyte worth of I/O > buffers, never mind the actual hashtable data. > > Too-many-batches (resulting in too-many-temp-files) is a known failure > mechanism for our hash join code. We've tried to improve that in v18 > (cf commits a1b4f289b, aa151022e), but I don't think there's any > appetite for back-patching that work into older branches. > >> Observations / Workarounds Tested: >> SET enable_hashjoin = off; → procedure runs successfully. >> SET hash_mem_multiplier = 1; → procedure runs successfully. > > I'm curious whether the change of hash_mem_multiplier causes it > to not use a hash join, or that is just successful at limiting > the hash table growth to a point short of failure. > > regards, tom lane