Thread

  1. Re: Implement waiting for wal lsn replay: reloaded

    Xuneng Zhou <xunengzhou@gmail.com> — 2025-12-22T07:56:59Z

    Hi,
    
    On Sun, Dec 21, 2025 at 12:37 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
    >
    > Hi Alexander,
    >
    > Thanks for your feedback!
    >
    > > I see that we can't specify WAIT_LSN_TYPE_PRIMARY_FLUSH by setting
    > > mode parameter.  Should we allow this?
    >
    > I think this constraint could be relaxed if needed. I was previously
    > unsure about the use cases.
    
    Flush mode on the primary seems useful when synchronous_commit is set
    to off [1]. In that mode, a transaction in primary may return success
    before its WAL is durably flushed to disk, trading durability for
    lower latency. A “wait for primary flush” operation provides an
    explicit durability barrier for cases where applications or tools
    occasionally need stronger guarantees.
    
    [1] https://postgresqlco.nf/doc/en/param/synchronous_commit/
    
    > > If we allow specifying WAIT_LSN_TYPE_PRIMARY_FLUSH, should it be
    > > separate mode value or the same with WAIT_LSN_TYPE_STANDBY_FLUSH?  In
    > > principle, we could encode both as just 'flush' mode, and detect which
    > > WaitLSNType to pick by checking if recovery is in progress.  However,
    > > how should we then react to unreached flush location after standby
    > > promotion (technically it could be still reached but on the different
    > > timeline)?
    > >
    >
    > Technically, we can use 'flush' mode to specify WAIT FOR behavior in
    > both primary and replica. Currently, wait for commands error out if
    > promotion occurs since: either the requested LSN type does not exist
    > on the primary, or we do not yet have the infrastructure to support
    > continuing the wait. If we allow waiting for flush on the primary as a
    > user-visible command and the wake-up calls for flush in primary are
    > introduced, the question becomes whether we should still abort the
    > wait on promotion, or continue waiting—as you noted—given that the
    > target LSN might still be reached, albeit on a different timeline. The
    > question behind this might be: do users care and should be aware of
    > the state change of the server while waiting? If they do, then we
    > better stop the waiting and report the error. In this case, I am
    > inclined to to break the unified flush mode to something like
    > primary_flush/standby_flush mode and
    > WAIT_LSN_TYPE_PRIMARY_FLUSH/WAIT_LSN_TYPE_STANDBY_FLUSH respectively.
    >
    
    After further consideration, it also seems reasonable to use a single,
    unified flush mode that works on both primary and standby servers,
    provided its semantics are clearly documented to avoid the potential
    confusion on failure. I don’t have a strong preference between these
    two and would be interested in your thoughts.
    
    If a standby is promoted while a session is waiting, the command
    better abort and return an error (or report “not in recovery” when
    using NO_THROW). At that point, the meaning of the LSN being waited
    for may have changed due to the timeline switch and the transition
    from standby to primary. An LSN such as 0/5000000 on TLI 2 can
    represent entirely different WAL content from 0/5000000 on TLI 1.
    Allowing the wait to silently continue across promotion risks giving
    users a false sense of safety—for example, interpreting “wait
    completed” as “the original data is now durable,” which would no
    longer be true.
    
    --
    Best,
    Xuneng