Thread

  1. Re: Fix race condition in pg_get_publication_tables with concurrent DROP TABLE

    shveta malik <shveta.malik@gmail.com> — 2026-05-05T04:02:00Z

    On Tue, May 5, 2026 at 8:40 AM Chao Li <li.evan.chao@gmail.com> wrote:
    >
    >
    >
    > > On May 4, 2026, at 13:08, shveta malik <shveta.malik@gmail.com> wrote:
    > >
    > > On Fri, May 1, 2026 at 7:00 AM Bharath Rupireddy
    > > <bharath.rupireddyforpostgres@gmail.com> wrote:
    > >>
    > >> Hi,
    > >>
    > >> On Wed, Apr 29, 2026 at 8:41 PM Chao Li <li.evan.chao@gmail.com> wrote:
    > >>>
    > >>>> <v5-0001-PG16-Fix-pg_get_publication_tables-race-with-conc.txt><v5-0001-Fix-pg_get_publication_tables-race-with-concurren.patch><v5-0001-PG18-Fix-pg_get_publication_tables-race-with-conc.txt><v5-0001-PG17-Fix-pg_get_publication_tables-race-with-conc.txt>
    > >>>
    > >>> I am afraid this is only a partial fix.
    > >>
    > >> Thanks for reviewing it. Please find my responses below.
    > >>
    > >>> ```
    > >>> @@ -1599,12 +1621,18 @@ pg_get_publication_tables(FunctionCallInfo fcinfo, ArrayType *pubnames,
    > >>>                /* Show all columns when the column list is not specified. */
    > >>>                if (nulls[2])
    > >>>                {
    > >>> -                       Relation        rel = table_open(relid, AccessShareLock);
    > >>> +                       Relation        rel = try_table_open(relid, AccessShareLock);
    > >>>                        int                     nattnums = 0;
    > >>>                        int16      *attnums;
    > >>> -                       TupleDesc       desc = RelationGetDescr(rel);
    > >>> +                       TupleDesc       desc;
    > >>>                        int                     i;
    > >>>
    > >>> +                       /* Skip if the relation has been concurrently dropped. */
    > >>> +                       if (rel == NULL)
    > >>> +                               continue;
    > >>> ```
    > >>>
    > >>> This change uses try_table_open() to detect whether a table has been dropped, but try_table_open() is only called when the column list is not specified. If a table is included in the publication with an explicit column list, then even if it is dropped concurrently, it may still be returned by pg_get_publication_tables().
    > >>
    > >> Right. The try_table_open() is only needed there because that's the
    > >> only code path that actually opens the relation (to enumerate its
    > >> columns). The column list path reads from the pg_publication_rel
    > >> catalog - it never calls table_open(), so it cannot hit the ERROR.
    > >>
    > >>> So this patch removes the “could not open relation with OID” error, but it does not fully ensure the accuracy of the returned table list.
    > >>
    > >> IMO, no function returning table OIDs can guarantee they remain valid
    > >> - a drop can happen right after we return the OID, and accuracy is in
    > >> the caller's hands. All the callers of pg_get_publication_tables()
    > >> already handle this by JOINing with pg_class.
    > >>
    > >
    > > I agree with Bharath. Also I would like to add one more point. We do have this:
    > >
    > > + /* Skip if the relation has been concurrently dropped. */
    > > + if (!OidIsValid(schemaid))
    > > + continue;
    > >
    >
    > Actually, this is the other comment I have. Why the comment says “if the relation has been dropped”, but the actual check is on schema id?
    >
    
    Okay, I see your point. Shall we tweak it to the below to make it more
    understandable?
    
    
    Oid schemaid;
    
    /*
    * get_rel_namespace() returns InvalidOid if the relation no longer exists
    * (e.g., dropped concurrently). Skip such entries.
    */
    if (!OidIsValid(schemaid = get_rel_namespace(relid)))
    continue;
    
    thanks
    Shveta