Thread

  1. PGConf.dev CSN unconference session: notes and follow-up discussion takeaways

    Matthias van de Meent <boekewurm+postgres@gmail.com> — 2026-05-27T08:56:57Z

    Hi,
    
    First, I want to inform you that I've added the notes I took in the CSN
    unconference session to the PGConf.dev 2026 unconference wiki page [0].
    They're in rough shape, as I was unable to both take notes and
    participate at the same time, so some parts of the conversation are
    missing.  I invite anyone with more notes (or better memory than mine)
    to add any missing parts.
    
    
    Second, I'd like to share a few takeaway items from the CSN session and
    subsequent hallway track discussions, as a possible starting point
    for further discussion:
    
    
    1. The primary source of complications in CSN (and snapshotting in
    general) is generally agreed upon to be *visibility semantics* vs
    *durability semantics*, primarily seen through the synchronous_commit
    setting.
    
    
    2. Visibility of s_c=off commits ("async commits") is immediate, but
    some users with s_c=on ("sync commits") or s_c=remote_{write|apply}
    ("remote commits") may not want to see not-yet-durable async commits.
    
    2a. It was suggested to allow sync-commit sessions to wait for such
    async commits' commit LSN to become sufficiently durable if they need
    to read those async commits' data.
    
    2b. It was also suggested to make async commits wait for durability
    of sync commits' CSN [^1].  A counterpoint to this would be that it'd be
    a heavy penalty for async commits that need to read data that has
    recently been modified by non-async commits.
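
    To make 2a a bit more concrete, here is a toy model of a reader
    that waits for durability.  This is only a sketch under my own
    assumptions: DurabilityTracker, advance_flush, wait_for_flush and
    read_tuple are hypothetical names, LSNs are plain integers, and
    nothing here corresponds to existing PostgreSQL code.

```python
import threading


class DurabilityTracker:
    """Toy stand-in for 'how far has WAL been made durable' (hypothetical)."""

    def __init__(self):
        self._flushed_lsn = 0
        self._cond = threading.Condition()

    def advance_flush(self, lsn):
        # Called when WAL up to `lsn` has reached the required durability.
        with self._cond:
            if lsn > self._flushed_lsn:
                self._flushed_lsn = lsn
                self._cond.notify_all()

    def wait_for_flush(self, lsn, timeout=None):
        # Block until `lsn` is durable; returns False if the timeout expires.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._flushed_lsn >= lsn, timeout
            )


def read_tuple(tracker, tuple_commit_lsn, reader_needs_durable, timeout=None):
    """Return True once the tuple may be shown to this reader.

    Async-commit sessions see the data immediately; sync-commit sessions
    wait until the writer's commit LSN is durable (assumption: that is
    what "sufficiently durable" would mean in 2a).
    """
    if not reader_needs_durable:
        return True
    return tracker.wait_for_flush(tuple_commit_lsn, timeout)
```

    In this model only readers that care about durability pay the wait;
    async sessions are never blocked, which is the opposite trade-off
    from 2b.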
    
    
    3. There was no clearly articulated consensus that it is necessary for
    the CSN work to fix our Long Fork [2] issue [3] (different visibility
    order between primary and replica).  See also points 5 and 6.
    
    
    4. The primary consensus from the session seems to be that commit-record
    LSN would work as a natural CSN on replicas; and that it won't change
    current replica visibility semantics.
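
    As an illustration of point 4, a minimal sketch of using the
    commit-record LSN as CSN on a replica.  ReplicaCsnLog and its methods
    are hypothetical, LSNs are plain integers, and the xid-to-LSN map is
    an assumed data structure, not a proposal for the actual design.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ReplicaCsnLog:
    # xid -> LSN of that transaction's commit record (used as its CSN)
    commit_lsn: Dict[int, int] = field(default_factory=dict)
    replayed_lsn: int = 0

    def replay_commit(self, xid: int, lsn: int) -> None:
        # Replaying a commit record both assigns the CSN and advances replay.
        self.commit_lsn[xid] = lsn
        self.replayed_lsn = max(self.replayed_lsn, lsn)

    def take_snapshot(self) -> int:
        # A snapshot is just "everything replayed up to this LSN".
        return self.replayed_lsn

    def is_visible(self, xid: int, snapshot_lsn: int) -> bool:
        lsn = self.commit_lsn.get(xid)
        return lsn is not None and lsn <= snapshot_lsn
```

    Because replay applies commit records in WAL order, the snapshot is a
    single number and visibility is a single comparison.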
    
    
    5. Not everyone agreed that the LSN of commit records as such is
    sufficient as CSN for primaries:
    
    5a. Visibility order of sync commits vs async commits is the primary
    issue here; a session with only async commits is able to handle any
    number of transactions whilst another session with s_c=remote_apply
    ("remote commit") may take forever to get confirmed and become visible.
    
    5b. A suggested solution to visibility ordering issues was to log a
    'commit visible' record for transactions whose COMMIT record has reached
    its durability requirement, and use that record as CSN.  This record
    could be shared by multiple commits, in a way that's similar to how
    commit_delay/commit_siblings combine WAL fsyncs, to limit the net new
    WAL generation per commit, and would be optional (or implied) when the
    primary is lost before the visibility record is logged.  This new
    'commit visible' record would be comparable to 2PC, with the main
    differences being that it would not allow rollbacks, and that every
    committed not-yet-visible transaction would automatically become visible
    once recovery ends/when a standby promotes.
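
    The idea in 5b could be modelled roughly as follows.  This is a toy
    sketch: ToyWal, the record kinds, and the choice to represent implied
    visibility at end of recovery by one covering record are all my own
    assumptions for illustration.

```python
class ToyWal:
    """Toy WAL where visibility is granted by a shared 'commit visible' record."""

    def __init__(self):
        self.records = []   # list of (lsn, kind, payload)
        self.next_lsn = 1
        self.pending = {}   # xid -> commit LSN; committed but not yet visible
        self.csn = {}       # xid -> CSN (LSN of the covering visibility record)

    def _append(self, kind, payload):
        lsn = self.next_lsn
        self.next_lsn += 1
        self.records.append((lsn, kind, payload))
        return lsn

    def log_commit(self, xid):
        # The COMMIT record alone does not make the transaction visible.
        lsn = self._append("COMMIT", xid)
        self.pending[xid] = lsn
        return lsn

    def log_commit_visible(self, durable_xids):
        # One record covers every commit that has met its durability
        # requirement, so the extra WAL is amortized over the batch.
        xids = sorted(x for x in durable_xids if x in self.pending)
        if not xids:
            return None
        lsn = self._append("COMMIT_VISIBLE", tuple(xids))
        for x in xids:
            del self.pending[x]
            self.csn[x] = lsn
        return lsn

    def end_of_recovery(self):
        # Assumption: everything committed but not yet visible becomes
        # visible at promotion / end of recovery; no rollback is possible.
        return self.log_commit_visible(list(self.pending))
```

    Note that transactions sharing a visibility record get the same CSN
    here, so they become visible atomically with respect to snapshots.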
    
    
    6. It was noted that it is not even strictly necessary to use LSNs as
    CSN on the primary:
    
    6a. A local in-memory counter could be used to generate the (unlogged)
    CSNs only when visibility is achieved.  This would allow us to implement
    visibility semantics on the primary that are equivalent to its
    current behaviour.  Whilst this wouldn't solve the Long Fork issue, it
    would enable the benefits of CSN snapshots on the primary.
    
    6b. It was mentioned that this approach could take more effort than
    just using LSN-based CSNs.
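
    For 6a, a minimal sketch of an in-memory CSN counter that is advanced
    only when a transaction becomes visible.  InMemoryCsn and its methods
    are hypothetical names, and a single lock stands in for whatever
    synchronization a real implementation would need.

```python
import threading


class InMemoryCsn:
    """Unlogged CSNs handed out in visibility order, not WAL order."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last_csn = 0
        self._csn_by_xid = {}

    def make_visible(self, xid):
        # Called once the commit has met its own durability requirement.
        with self._lock:
            self._last_csn += 1
            self._csn_by_xid[xid] = self._last_csn
            return self._last_csn

    def take_snapshot(self):
        # A snapshot is the latest CSN handed out so far.
        with self._lock:
            return self._last_csn

    def is_visible(self, xid, snapshot_csn):
        with self._lock:
            csn = self._csn_by_xid.get(xid)
        return csn is not None and csn <= snapshot_csn
```

    With this, an async commit (say xid 8) that becomes visible before an
    earlier-logged sync commit (xid 7) simply gets the lower CSN, which
    matches today's behaviour on the primary but, as noted, does nothing
    for replicas, which would still order by commit-record LSN.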
    
    
    I hope this has been informative and can help move discussions about
    this feature forward.
    
    
    Kind regards,
    
    Matthias van de Meent
    Databricks (https://www.databricks.com)
    
    
    [0]: https://wiki.postgresql.org/wiki/PGConf.dev_2026_Developer_Unconference#Commit_Sequence_Numbers
    [^1]: In a world where the current WAL insert pointer is used to
    construct a snapshot, and every commit only needs to log a single record
    to become visible.
    [2]: https://jepsen.io/consistency/phenomena/long-fork
    [3]: https://jepsen.io/analyses/amazon-rds-for-postgresql-17.4