Thread

  1. Re: Implement waiting for wal lsn replay: reloaded

    Xuneng Zhou <xunengzhou@gmail.com> — 2025-12-21T04:37:18Z

    Hi Alexander,
    
    Thanks for your feedback!
    
    > I see that we can't specify WAIT_LSN_TYPE_PRIMARY_FLUSH by setting
    > mode parameter.  Should we allow this?
    
    I think this constraint could be relaxed if needed. I was previously
    unsure about the use cases.
    
    > If we allow specifying WAIT_LSN_TYPE_PRIMARY_FLUSH, should it be
    > separate mode value or the same with WAIT_LSN_TYPE_STANDBY_FLUSH?  In
    > principle, we could encode both as just 'flush' mode, and detect which
    > WaitLSNType to pick by checking if recovery is in progress.  However,
    > how should we then react to unreached flush location after standby
    > promotion (technically it could be still reached but on the different
    > timeline)?
    >
    
    Technically, we can use a single 'flush' mode to specify WAIT FOR
    behavior on both primary and replica. Currently, WAIT FOR commands
    error out if promotion occurs, because either the requested LSN type
    does not exist on the primary, or we do not yet have the
    infrastructure to support continuing the wait. If we allow waiting
    for flush on the primary as a user-visible command and introduce the
    wake-up calls for flush on the primary, the question becomes whether
    we should still abort the wait on promotion or continue waiting, as
    you noted, given that the target LSN might still be reached, albeit
    on a different timeline. The underlying question might be: do users
    care about, and should they be made aware of, a change in server
    state while waiting? If they do, then we had better stop the wait and
    report an error. In that case, I am inclined to split the unified
    flush mode into something like primary_flush/standby_flush modes,
    mapped to WAIT_LSN_TYPE_PRIMARY_FLUSH/WAIT_LSN_TYPE_STANDBY_FLUSH
    respectively.
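
    To make the two options concrete, here is a rough standalone sketch
    of how the mode string could map to a WaitLSNType in each case. It is
    not taken from the patch: the function names, the
    WAIT_LSN_TYPE_INVALID marker, and the enum layout are made up for
    illustration, and rejecting a mode that does not match the server
    state is only one possible policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Illustrative stand-in for the real enum. Only the two *_FLUSH names
 * come from this thread; WAIT_LSN_TYPE_INVALID is a hypothetical
 * "reject this request" marker.
 */
typedef enum
{
    WAIT_LSN_TYPE_INVALID = -1,
    WAIT_LSN_TYPE_STANDBY_FLUSH,
    WAIT_LSN_TYPE_PRIMARY_FLUSH
} WaitLSNType;

/*
 * Option A: a single 'flush' mode. The type is picked from the recovery
 * state at the time the command starts, so a promotion during the wait
 * silently changes which type would be picked.
 */
WaitLSNType
resolve_unified_mode(const char *mode, bool in_recovery)
{
    assert(mode != NULL);

    if (strcmp(mode, "flush") != 0)
        return WAIT_LSN_TYPE_INVALID;

    return in_recovery ? WAIT_LSN_TYPE_STANDBY_FLUSH
                       : WAIT_LSN_TYPE_PRIMARY_FLUSH;
}

/*
 * Option B: separate modes. The user states which role is expected, and
 * a mismatch with the current server state is rejected (assumed policy).
 */
WaitLSNType
resolve_split_mode(const char *mode, bool in_recovery)
{
    assert(mode != NULL);

    if (strcmp(mode, "standby_flush") == 0)
        return in_recovery ? WAIT_LSN_TYPE_STANDBY_FLUSH
                           : WAIT_LSN_TYPE_INVALID;

    if (strcmp(mode, "primary_flush") == 0)
        return in_recovery ? WAIT_LSN_TYPE_INVALID
                           : WAIT_LSN_TYPE_PRIMARY_FLUSH;

    return WAIT_LSN_TYPE_INVALID;
}
```

    With option B, a standby_flush wait re-checked after promotion
    (in_recovery = false) maps to the invalid marker, which corresponds
    to stopping the wait and reporting an error. With option A the same
    re-check would quietly resolve to the primary flush type.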
    
    --
    Best,
    Xuneng