Thread

  1. Re: Limit memory usage by postgres_fdw batches

    Tomas Vondra <tomas@vondra.me> — 2025-12-27T13:18:56Z

    Hi Alexander,
    
    On 12/26/25 11:54, Alexander Pyhalov wrote:
    > Hi.
    > 
    > We had some real cases when client set rather big batch_size on server
    > level, but for some foreign table, containing large documents, it was
    > inadequate and lead to OOM killer intervention. You can argue that
    > batch_size can be set on foreign table level, but it can still be not
    > flexible enough, when tuple size varies.
    
    I agree this can be an issue with large tuples. It'd be good to consider
    the size, not just the number of tuples.
    
    I think you modified the right places, but I think there are two or
    three issues we should improve:
    
    1) It calls estimate_batch_length() for every ExecInsert() call, i.e.
    for every tuples. And it walks all tuples up to that point, which makes
    it O(N^2). I haven't measured how significant it is, but AFAICS we could
    track the current size of the batch fairly easily, and use that.
    
    2) work_mem is in kilobytes, while batch_len is in bytes, so the
    comparison (batch_len > work_mem) is not quite right. I'll probably fire
    every time, preventing any batching.
    
    3) Isn't this consider the size of the new tuple in batch_len? Imagine
    the tuples are 99% of the work_mem limit. We add the first one. When
    adding the next one we check the current batch is below work_mem, and so
    we proceed to add the second tuple. Now the batch is 1.98% of the limit.
    
    I think it should work like this:
    
    1) batch_len + tup_len < work_mem => add tuple to batch
    2) tup_len < work_mem => flush batch, add tuple to batch
    3) tup_len => work_mem => flush batch, insert tuple directly
    
    What bothers me a little bit is that this is per relation. AFAICS when
    inserting into a partitioned table with multiple foreign partitions,
    each partition will have a separate limit. I wonder if we could do
    better and have some sort of "global" limit for the whole insert.
    
    But that's not the fault of this patch, of course.
    
    > I suppose this case is also
    > takes place for fetch_size. Issue here is that we can't somehow limit
    > size of data (versus number of rows) while fetching from cursor. But we
    > can use tuple store to preserve fetched results, so that they spill out
    > to the disk.
    > 
    
    Perhaps. Seems like a separate issue. I haven't looked very closely, but
    do we want to use the tuplestore always, or just when the tuples get too
    large? It might even be on batch-by-batch, I guess. With fetch_size=1
    it's hardly useful, right? Is the tuplestore management measurable?
    
    > I'm attaching two patches which try to fix issues with possible huge
    > memory usage by postgres_fdw batches.
    > With fetched tuples we still can't use only tuplestore, as ctids are not
    > preserved, and so have to store them separately.
    > 
    > The reproducer for insert is simple.
    > 
    > create extension postgres_fdw ;
    > create server loopback foreign data wrapper postgres_fdw options (dbname
    > 'postgres', port '5432', batch_size '100', fetch_size '100');
    > create table base_table(i int, s bytea);
    > create foreign table foreign_table (i int, s bytea) server loopback
    > options(table_name 'base_table');
    > create user mapping for public server loopback ;
    > 
    > insert into foreign_table select i, pg_read_binary_file('/some/big/
    > file') from generate_series(1,1000) i;
    > 
    > will easily grow backend RSS to several gigabytes.
    > The first patch fixes this problem.
    > 
    > The second patch alleviates the second issue - SELECT * queries also can
    > grow backend memory to several GBs. Still memory usage can peak (on my
    > toy examples) up to 3-4 GB, but at least it seams 1-2 GB less than non-
    > patched version.
    > 
    
    How large are the tuples? How much higher was this RSS than the
    theoretical minimum?
    
    
    regards
    
    -- 
    Tomas Vondra