Thread

Re: Add the ability to limit the amount of memory that can be allocated to backends.

Tomas Vondra <tomas@vondra.me> — 2025-12-27T14:52:21Z
Hi,

Let me bump this dormant thread after about a year. I don't have any
patch to share or anything like that, but I happened to read two quite
interesting papers relevant to the topic of setting a per-query memory
limit, and distributing the memory between operators in a query plan:

1) Exploiting pipeline interruptions for efficient memory allocation
   Authors: Josep Aguilar-Saborit, Mohammad Jalali, Dave Sharpe, Victor
   Muntés MuleroAuthors Info & Claims
   Published: 2008
   PDF: https://dl.acm.org/doi/epdf/10.1145/1458082.1458169

2) Saving Private Hash Join
   Authors: Laurens Kuiper, Paul Groß, Peter Boncz, Hannes Mühleisen
   Published: 2024-2025
   PDF: https://www.vldb.org/pvldb/vol18/p2748-kuiper.pdf

I read the (2) paper first, expecting to learn some new stuff about hash
joins, but to my surprise it's focusing on how to distribute memory
between multiple hash joins in a single query. The hash joins serve
mostly as an example of an actual/common operator in a query.

The (1) paper establishes the theoretical framework and algorithms, and
(2) presents a more precise/complete model for hash joins.

I think both papers are worth reading, and the framework seems quite
helpful. Even if there some parts may need tweaks, as Postgres does
certain things differently for whatever reason.

I'm not going to explain all the details (the papers are not that long
anyway), but the proposed approach combines a couple basic parts:


1) leverage "pipeline interruption" operations

Some operators materialize the intermediate results. This splits the
plan into parts that don't need memory at the same time. It's enough to
enforce the limit for these parts, not for the whole plan. Which means
the optimization problems are smaller, and the operators can get more
memory than when assuming the whole query runs at the same time.

Of course, the reality is more complicated. Some operators are only
partial pipeline interruptions (e.g. hashjoin breaks for the inner
subplan, not the outer).

And Postgres currently delays the Hash build until the first outer
tuple, unlike DuckDB used in the (2) paper.


2) cost model for memory distribution

The memory is distributed between operators based on a simple cost
model, instead of using a simple scheme where operators get 1/N of
memory. The (2) paper presents a hash join cost model combining
"materialization cost" and "throughput".

But it's a pretty generic approach, and should not be too difficult to
do for other operators. Most operators don't even need to allocate that
much memory, so the cost model can ignore those, I think. Only nodes
that "break the pipeline" would matter.

The important part is that this is a second optimization phase. The
optimizer picks a query plan (assuming some default memory sizes), and
then the cost model determines how to distribute memory within that
particular plan.

The paper does that repeatedly during execution, but I suspect these
adjustments are easier in DuckDB. In Postgres we'd probably do that only
once right after planning? Or maybe not, not sure.


The (1) paper does things a bit differently, so that it works on DB2 (so
less dynamic), but it also focuses on plans with hash joins etc. The
general framework seems to be mostly the same.


Anyway, that's all I have. I don't expect to work on this anytime soon,
but I found those papers interesting.


regards

-- 
Tomas Vondra