Thread

  1. Re: Function scan FDW pushdown

    Alexander Korotkov <aekorotkov@gmail.com> — 2026-05-11T12:02:24Z

    On Mon, May 11, 2026 at 9:01 AM solaimurugan vellaipandiyan <
    drsolaimurugan.v@gmail.com> wrote:
    > Thanks for the detailed explanation and the example queries. That
    > helped me better understand the costing behavior behind the pushdown
    > decisions.
    >
    > I tested the new v4 patch on current master using a postgres_fdw
    > loopback setup with local table t1 and foreign table ft1.
    >
    > Here’s what I observed:
    > - Before ANALYZE, the planner chose a local Hash Join with separate
    > Foreign Scan and Function  Scan nodes.
    > - After running ANALYZE on the empty table, it still preferred local
    > execution, which makes sense based on the costing explanation.
    > - I then inserted 1000 rows into t1, ran ANALYZE again, and repeated the
    test.
    >
    > Even after that, I still got a local Hash Join plan like:
    > Hash Join
    >  - Foreign Scan on ft1
    >  - Function Scan on generate_series
    > I wasn’t able to observe the pushed-down Foreign Scan plan shown in
    > the example from the thread.
    >
    > The patch itself applied and built successfully on my side, so this
    > may just be due to planner cost differences or environment-specific
    > behavior on current master.
    
    This also comes from the cost model.  Check this example.
    
    # CREATE TABLE t1 (id int);
    # INSERT INTO t1 SELECT g FROM generate_series(1, 1000) g;
    # CREATE FOREIGN TABLE ft1 (id int) SERVER loopback OPTIONS (table_name
    't1');
    # ANALYZE t1;
    # ANALYZE ft1;
    
    By default the local join is selected.
    
    # EXPLAIN (VERBOSE, COSTS ON)
    SELECT * FROM ft1, generate_series(1, 100) AS g(id)
    WHERE ft1.id = g.id;
                                              QUERY PLAN
    
    -----------------------------------------------------------------------------------------------
     Hash Join  (cost=102.25..332.00 rows=100 width=8)
       Output: ft1.id, g.id
       Hash Cond: (ft1.id = g.id)
       ->  Foreign Scan on public.ft1  (cost=100.00..325.00 rows=1000 width=4)
             Output: ft1.id
             Remote SQL: SELECT id FROM public.t1
       ->  Hash  (cost=1.00..1.00 rows=100 width=4)
             Output: g.id
             ->  Function Scan on pg_catalog.generate_series g
     (cost=0.00..1.00 rows=100 width=4)
                   Output: g.id
                   Function Call: generate_series(1, 100)
    (11 rows)
    
    However, we can force remote join using enable_* options. You can see it
    has higher cost.  This is because estimate_path_cost_size() expects join
    operator to be applied to the whole cross-product.
    
    # SET enable_hashjoin = off;
    # SET enable_mergejoin = off;
    # SET enable_nestloop = off;
    # EXPLAIN (VERBOSE, COSTS ON)
    SELECT * FROM ft1, generate_series(1, 100) AS g(id)
    WHERE ft1.id = g.id;
                                              QUERY PLAN
    
    -----------------------------------------------------------------------------------------------
     Hash Join  (cost=102.25..332.00 rows=100 width=8)
       Output: ft1.id, g.id
       Hash Cond: (ft1.id = g.id)
       ->  Foreign Scan on public.ft1  (cost=100.00..325.00 rows=1000 width=4)
             Output: ft1.id
             Remote SQL: SELECT id FROM public.t1
       ->  Hash  (cost=1.00..1.00 rows=100 width=4)
             Output: g.id
             ->  Function Scan on pg_catalog.generate_series g
     (cost=0.00..1.00 rows=100 width=4)
                   Output: g.id
                   Function Call: generate_series(1, 100)
    (11 rows)
    # RESET enable_hashjoin;
    # RESET enable_mergejoin;
    # RESET enable_nestloop;
    
    Also, this can be fixed using remote estimate.  You also can check that v3
    regression tests by Pyhalov use this approach as well.
    
    # ALTER FOREIGN TABLE ft1 OPTIONS (ADD use_remote_estimate 'true');
    # EXPLAIN (VERBOSE, COSTS ON)
    SELECT * FROM ft1, generate_series(1, 100) AS g(id)
    WHERE ft1.id = g.id;
                                                           QUERY PLAN
    
    ------------------------------------------------------------------------------------------------------------------------
     Foreign Scan  (cost=102.25..143.00 rows=100 width=8)
       Output: ft1.id, g.id
       Relations: (public.ft1) INNER JOIN (Function g)
       Remote SQL: SELECT r1.id, f2.c1 FROM (public.t1 r1 INNER JOIN
    generate_series(1, 100) f2(c1) ON (((r1.id = f2.c1))))
    (4 rows)
    
    Thus, I don't see it to be a problem of this specific patch.  I think this
    is general inaccuracy of postgres_fdw cost model.
    
    ------
    Regards,
    Alexander Korotkov
    Supabase