Thread

  1. Re: relfilenode statistics

    Bertrand Drouvot <bertranddrouvot.pg@gmail.com> — 2026-05-18T16:28:26Z

    Hi,
    
    On Tue, Mar 31, 2026 at 10:45:50AM +0000, Bertrand Drouvot wrote:
    > Hi,
    > 
    > On Wed, Mar 25, 2026 at 03:25:07AM +0000, Bertrand Drouvot wrote:
    > > Hi,
    > > 
    > > On Wed, Mar 18, 2026 at 03:57:48AM +0000, Bertrand Drouvot wrote:
    > > > Hi,
    > > > 
    > > > PFA, new rebase due to fba4233c832.
    > > 
    > > Another rebase, due to 2102ebb1953 this time.
    > 
    > It's more than probably too late for v19 but it needs another rebase due to
    > d7965d65fc5b this time.
    
    PFA v16, a rebase due to 775fe51daae, 71ff232a5bc and c0b53ec0630.
    
    While at it, let's sum up the current state:
    
    Regarding Michael's question [1] about whether we should copy stats across
    rewrites: I still believe we should. Not doing so would produce user-visible
    regressions. The complexity is contained in patch 0002 and the approach is
    tested (including 2PC, subtransaction abort, and rewrite chains).
    
    Regarding Michael's suggestion [2] to split PgStat_StatTabEntry into three kinds
    (table/index/relfilenode) from the start: I think this patch is the right
    incremental step that doesn't preclude a future split. Here's my reasoning:
    
    As Andres pointed out [3], we'd want to populate more than just
    dead_tuples/ins_since_vacuum/mod_since_analyze during recovery. The right
    boundary for a split won't be clear until we actually implement
    WAL-replay-based stat population.
    
    I think that splitting now would be a much larger change, and it would carry
    the risk of drawing the boundaries in the wrong place.
    
    The current approach (key PGSTAT_KIND_RELATION by locator, keep the unified structure)
    is a contained change that unblocks future work. Once we have WAL replay populating
    stats, we'll have a much better understanding of what a split should look like,
    if one is still needed.
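    
    To make "key by locator" concrete, here is a minimal standalone sketch. It is
    not code from the patch: SketchLocator, SketchStatsKey, SKETCH_KIND_RELATION
    and both helper functions are hypothetical stand-ins, and packing the
    tablespace OID and the relfilenumber into one 64-bit object id is an
    assumption made only for illustration. It shows why a rewrite, which assigns
    a new relfilenumber, yields a different stats key.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Hypothetical, simplified stand-in for a relation's physical locator. */
typedef struct SketchLocator
{
    Oid spcOid;     /* tablespace */
    Oid dbOid;      /* database */
    Oid relNumber;  /* relfilenumber; changes when the relation is rewritten */
} SketchLocator;

/* Hypothetical, simplified stand-in for a stats hash key (no new field). */
typedef struct SketchStatsKey
{
    int      kind;
    Oid      dboid;
    uint64_t objid;
} SketchStatsKey;

#define SKETCH_KIND_RELATION 1

/*
 * Build a key from a locator. ASSUMPTION: tablespace OID goes in the high
 * 32 bits of objid and the relfilenumber in the low 32 bits. The actual
 * patch may encode this differently.
 */
static SketchStatsKey
sketch_key_from_locator(SketchLocator loc)
{
    SketchStatsKey key;

    key.kind = SKETCH_KIND_RELATION;
    key.dboid = loc.dbOid;
    key.objid = ((uint64_t) loc.spcOid << 32) | (uint64_t) loc.relNumber;
    return key;
}

/* Field-by-field comparison, so struct padding never matters. */
static bool
sketch_key_equal(SketchStatsKey a, SketchStatsKey b)
{
    return a.kind == b.kind && a.dboid == b.dboid && a.objid == b.objid;
}
```

    With two locators that differ only in relNumber, as is the case before and
    after a rewrite, sketch_key_equal() returns false. An entry keyed this way
    is not found under the new locator unless its counters are copied over.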
    
    I think we should do this incremental step first, then split later if/when the
    need becomes clearer.
    
    I believe we have consensus on the core approach ("use the relfilenumber instead
    of the relation OID, without changing the user experience"). The implementation
    addresses all the technical concerns raised so far: no new hash key field,
    PSEUDO_PARTITION_TABLE_SPCOID for partitioned tables, and
    pgstat_fetch_stat_tabentry_by_locator() to avoid extra syscache lookups in
    do_autovacuum().
    
    Andres, would you be willing to drive this toward commit once we've iterated
    on any remaining review feedback?
    
    Michael, I understand this isn't the design you'd prefer. Would you be open to
    reviewing the implementation nonetheless, or do you have a hard objection that
    would block this path?
    
    I'm happy to address any further concerns.
    
    [1]: https://postgr.es/m/aRGoGcOdutTHQfpn%40paquier.xyz
    [2]: https://postgr.es/m/aUELPdhdcyzTM_8K%40paquier.xyz
    [3]: https://postgr.es/m/zferux2jlbhqymubzhpubfrkjzhzxzguq4eprtycojtif5vbqh%402t7cu2teyqmi
    
    Regards,
    
    -- 
    Bertrand Drouvot
    PostgreSQL Contributors Team
    RDS Open Source Databases
    Amazon Web Services: https://aws.amazon.com