Re: BUG #19040: Memory leak in hashed subplan node due to missing hashtempcxt reset

Tom Lane <tgl@sss.pgh.pa.us>

From: Tom Lane <tgl@sss.pgh.pa.us>
To: feichanghong <feichanghong@qq.com>
Cc: ocean_li_996 <ocean_li_996@163.com>, "mohen.lhy@alibaba-inc.com" <mohen.lhy@alibaba-inc.com>, "pgsql-bugs@lists.postgresql.org" <pgsql-bugs@lists.postgresql.org>, jdavis@postgresql.org
Date: 2025-09-03T15:35:26Z
Lists: pgsql-bugs

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Eliminate duplicative hashtempcxt in nodeSubplan.c.

  2. Fix memory leakage in nodeSubplan.c.

  3. Do execGrouping.c via expression eval machinery, take two.

  4. Fix potential failure when hashing the output of a subplan that produces

feichanghong <feichanghong@qq.com> writes:
> It seems this issue has been around for many years. I took a quick
> look at the patch for fixing it. Why don't we reset the temp context
> in the LookupTupleHashEntry, TupleHashTableHash, LookupTupleHashEntryHash,
> and FindTupleHashEntry functions? This seems more robust.

No, I disagree with that.  MemoryContextReset is not free.  The
existing code seems to expect that the hash tempcxt will be a
per-tuple context or similar, which will be reset once per executor
cycle by existing mechanisms.  I wonder why ExecInitSubPlan is
making a separate context for this at all, rather than using some
surrounding short-lived context.

If we do keep a separate context, I agree with the idea of one
MemoryContextReset in the exit of ExecHashSubPlan, but the proposed
patch seems like a mess.  I think what we ought to do is refactor
ExecHashSubPlan so that there's exactly one "ExecClearTuple(slot)"
down at the bottom, and then we can add a MemoryContextReset after it.

> Furthermore,
> the added test case doesn't seem to detect whether there's a memory leak.

Yeah, test cases for memory leaks are problematic.  The only way they
can really "detect" one is if they run so long as to be pretty much
guaranteed to hit OOM, which is (a) impossible to quantify across
a range of platforms and (b) not something we'd care to commit anyway.

It's good if we have an example that one can watch to confirm
it-leaks-or-not by monitoring the process's memory consumption,
but I don't foresee committing it.

			regards, tom lane