Thread

  1. RE: Assertion failure in SnapBuildInitialSnapshot()

    Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> — 2025-11-26T04:25:55Z

    On Wednesday, November 26, 2025 2:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
    > 
    > On Tue, Nov 25, 2025 at 4:02 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
    > >
    > > On Tuesday, November 25, 2025 3:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
    > > >
    > > >
    > > > Given that the computation of xmin and catalog_xmin among all slots
    > > > could be executed concurrently, could the following scenario happen,
    > > > where procArray->replication_slot_xmin and
    > > > procArray->replication_slot_catalog_xmin retreat to a non-invalid
    > > > XID?
    > > >
    > > > 1. Suppose the initial value of
    > > > procArray->replication_slot_catalog_xmin is 50.
    > > > 2. Process-A updates its owned slot's catalog_xmin to 100, and
    > > > computes the new catalog_xmin as 100 while holding
    > > > ReplicationSlotControlLock in shared mode in
    > > > ReplicationSlotsComputeRequiredLSN(). But it doesn't update the
    > > > procArray's catalog_xmin value yet.
    > > > 3. Process-B updates its owned slot's catalog_xmin to 150, and
    > > > computes the new catalog_xmin as 150.
    > > > 4. Process-B updates procArray->replication_slot_catalog_xmin to
    > > > 150.
    > > > 5. Process-A updates procArray->replication_slot_catalog_xmin,
    > > > which was 150, to 100.
    > >
    > > After further investigation, I think that steps 3 and 4 cannot occur
    > > because Process-B must have already encountered the catalog_xmin
    > > maintained by Process-A, either 50 or 100. Consequently, Process-B
    > > will refrain from updating the catalog_xmin to a more recent value,
    > > such as 150.
    > 
    > Right. But the following scenario seems to happen:
    > 
    > 1. Both processes have a slot with effective_catalog_xmin = 100.
    > 2. Process-A updates effective_catalog_xmin to 150, and computes the new
    > catalog_xmin as 100 because Process-B's slot still has
    > effective_catalog_xmin = 100.
    > 3. Process-B updates effective_catalog_xmin to 150, and computes the new
    > catalog_xmin as 150.
    > 4. Process-B updates procArray->replication_slot_catalog_xmin to 150.
    > 5. Process-A updates procArray->replication_slot_catalog_xmin to 100.
    
    I think this scenario can occur, but it is not harmful: the catalog rows
    removed prior to XID 150 would no longer be used, because both slots have
    advanced their catalog_xmin and flushed the value to disk. Therefore, even if
    replication_slot_catalog_xmin regresses, it should be OK.
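    
    To make the interleaving concrete, here is a minimal Python sketch of the
    five steps quoted above. It is only a model, not PostgreSQL code: Slot,
    compute_required_catalog_xmin() and simulate() are hypothetical names,
    shared_catalog_xmin stands in for procArray->replication_slot_catalog_xmin,
    and the flush to disk is assumed to happen together with each slot update.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    effective_catalog_xmin: int
    flushed_catalog_xmin: int  # value assumed to be persisted on disk


def compute_required_catalog_xmin(slots):
    """Oldest effective_catalog_xmin across all slots (models the scan
    done under the shared lock; not the real PostgreSQL function)."""
    return min(s.effective_catalog_xmin for s in slots)


def simulate():
    # Step 1: both processes own a slot with effective_catalog_xmin = 100.
    slot_a = Slot(100, 100)
    slot_b = Slot(100, 100)
    slots = [slot_a, slot_b]
    shared_catalog_xmin = 100  # models the procArray value
    history = [shared_catalog_xmin]

    # Step 2: Process-A advances its slot and computes the new minimum,
    # but does not publish it yet. Slot B is still at 100.
    slot_a.effective_catalog_xmin = slot_a.flushed_catalog_xmin = 150
    computed_by_a = compute_required_catalog_xmin(slots)

    # Step 3: Process-B advances its slot and computes the new minimum.
    slot_b.effective_catalog_xmin = slot_b.flushed_catalog_xmin = 150
    computed_by_b = compute_required_catalog_xmin(slots)

    # Step 4: Process-B publishes its result first.
    shared_catalog_xmin = computed_by_b
    history.append(shared_catalog_xmin)

    # Step 5: Process-A publishes its stale result, so the value regresses.
    shared_catalog_xmin = computed_by_a
    history.append(shared_catalog_xmin)

    # Oldest XID any slot could still need after both flushes.
    oldest_needed = min(s.flushed_catalog_xmin for s in slots)
    return history, oldest_needed


if __name__ == "__main__":
    history, oldest_needed = simulate()
    print("shared catalog_xmin over time:", history)
    print("oldest catalog_xmin any slot still needs:", oldest_needed)
```

    In this model the shared value ends up at 100 after having been 150, while
    no slot needs anything older than 150. The regressed value is stale but
    not newer than what any slot requires, which is the sense in which the
    regression looks harmless here.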
    
    Considering all of the above, I think allowing concurrent xmin computation,
    as the patch does, is acceptable. What do you think?
    
    Best Regards,
    Hou zj