Thread
-
Re: BUG #19345: MemoryContextSizeFailure after upgrade 14.11 to 17.7 in stored procedure
Tom Lane <tgl@sss.pgh.pa.us> — 2025-12-06T21:04:48Z
PG Bug reporting form <noreply@postgresql.org> writes: > After upgrading from PostgreSQL 14.11 to PostgreSQL 17.7 on our production > environment (RockyLinux8, RPM package), we encountered a crash when > executing a specific stored procedure. > This procedure ran successfully in 14.11 (approximately 8 hours execution > time), but on 17.7 it runs for about 6 hours before failing with: > LOCATION: MemoryContextSizeFailure, mcxt.c:1169 > [3717604]BACKTRACE: > postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL() > [0x50d5e0] > postgres: postgres003: md pgdb_md 10.21.18.87(53748) CALL() > [0x9c96dc] > postgres: postgres003: md pgdb_md 10.21.18.87(53748) > CALL(MemoryContextAllocZero+0x14) [0x9cfb54] > postgres: postgres003: md pgdb_md 10.21.18.87(53748) > CALL(ResourceOwnerEnlarge+0x9f) [0x9d218f] > postgres: postgres003: md pgdb_md 10.21.18.87(53748) > CALL(OpenTemporaryFile+0x64) [0x83dfc4] > postgres: postgres003: md pgdb_md 10.21.18.87(53748) > CALL(BufFileCreateTemp+0x18) [0x83a288] > postgres: postgres003: md pgdb_md 10.21.18.87(53748) > CALL(ExecHashJoinSaveTuple+0x68) [0x6e1ce8] Hmm. Apparently, your hash join tried to use so many temporary files that it needed a more-than-1GB array just to track them all. One could guess that it had been in swap hell for some hours before reaching this point, because that'd imply about a terabyte worth of I/O buffers, never mind the actual hashtable data. Too-many-batches (resulting in too-many-temp-files) is a known failure mechanism for our hash join code. We've tried to improve that in v18 (cf commits a1b4f289b, aa151022e), but I don't think there's any appetite for back-patching that work into older branches. > Observations / Workarounds Tested: > SET enable_hashjoin = off; → procedure runs successfully. > SET hash_mem_multiplier = 1; → procedure runs successfully. I'm curious whether the change of hash_mem_multiplier causes it to not use a hash join, or that is just successful at limiting the hash table growth to a point short of failure. regards, tom lane