Thread

  1. pg_stash_advice: dump file with overlong stash name crashes worker in a restart loop

    Ayush Tiwari <ayushtiwari.slg01@gmail.com> — 2026-05-17T06:24:52Z

    Hi,
    
    While testing pg_stash_advice, I came across a
    crash that's reachable through the persistence dump file:
    
    If $PGDATA/pg_stash_advice.tsv contains a stash name of 64 bytes or more
    (i.e. >= NAMEDATALEN), the worker hits an assertion in dshash_strhash()
    on startup. The postmaster treats the SIGABRT as a crash, performs full
    crash recovery, restarts the worker, and the worker hits the same
    assertion again. The cluster never reaches a stable state until the
    file is removed by hand.
    
    SQL-level callers cannot reach this because pgsa_check_stash_name()
    rejects long names. The persistence parser, however, accepts the name
    straight from the file and feeds it into dshash_find_or_insert(), which
    in turn calls dshash_strhash() configured with key_size = NAMEDATALEN.
    That function asserts strlen(v) < size. Strictly speaking the file
    parser was missing a length check that the SQL path already performs.
    
    Minimal reproduction
    ====================
    
        $ pg_ctl -D $PGDATA stop -m fast
        $ python3 -c "print('stash\t' + 'a'*64)" > $PGDATA/pg_stash_advice.tsv
        $ pg_ctl -D $PGDATA -l logfile start
    
    Server log (excerpt, every ~1-2 seconds):
    
        DEBUG:  pg_stash_advice worker started
        DEBUG:  restoring advice stash "aaaa...aaaa"
        TRAP: failed Assert("strlen((const char *) v) < size"),
              File: "dshash.c", Line: 634
        postgres: pg_stash_advice worker (ExceptionalCondition+0xbb)
        postgres: pg_stash_advice worker (dshash_strhash+0x48)
        postgres: pg_stash_advice worker (dshash_find_or_insert_extended+0x2e)
        pg_stash_advice.so (pgsa_create_stash)
        pg_stash_advice.so (pgsa_restore_stashes)
        pg_stash_advice.so (pgsa_read_from_disk)
        pg_stash_advice.so (pg_stash_advice_worker_main+0x1aa)
        ...
        LOG:  background worker "pg_stash_advice worker" was terminated by
              signal 6: Aborted
        LOG:  terminating any other active server processes
        LOG:  all server processes terminated; reinitializing
        [worker comes back up, asserts again, ad infinitum]
    
    In my testing this produced ~18 crashes in ~12 seconds before I removed
    the file. Threshold is exact: 63 bytes is fine, 64 bytes crashes.
    
    In a production (non-assertion) build the name would instead be silently
    truncated to NAMEDATALEN-1 bytes by the dshash key-copy path, allowing
    distinct long names that share a 63-byte prefix to collide in the stash
    table.
    
    The attached patch validates the parsed stash-name length in the same
    place where the parser already raises ERRCODE_DATA_CORRUPTED on other
    forms of syntax error, mirroring the existing length check in
    pgsa_check_stash_name(). Not sure if TAP test too is needed for this, but
    I've included one in the patch.
    
    Regards,
    Ayush