Re: Implement waiting for wal lsn replay: reloaded

Xuneng Zhou <xunengzhou@gmail.com>

From: Xuneng Zhou <xunengzhou@gmail.com>

To: Alexander Korotkov <aekorotkov@gmail.com>

Cc: pgsql-hackers <pgsql-hackers@lists.postgresql.org>, Andres Freund <andres@anarazel.de>, Álvaro Herrera <alvherre@kurilemu.de>, Michael Paquier <michael@paquier.xyz>, jian he <jian.universality@gmail.com>, Tomas Vondra <tomas@vondra.me>, Yura Sokolov <y.sokolov@postgrespro.ru>

Date: 2025-12-22T07:56:59Z

Lists: pgsql-hackers

Attachments

synchronous_commit.png (image/png)

Hi,

On Sun, Dec 21, 2025 at 12:37 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi Alexander,
>
> Thanks for your feedback!
>
> > I see that we can't specify WAIT_LSN_TYPE_PRIMARY_FLUSH by setting
> > mode parameter.  Should we allow this?
>
> I think this constraint could be relaxed if needed. I was previously
> unsure about the use cases.

Flush mode on the primary seems useful when synchronous_commit is set
to off [1]. In that mode, a transaction in primary may return success
before its WAL is durably flushed to disk, trading durability for
lower latency. A “wait for primary flush” operation provides an
explicit durability barrier for cases where applications or tools
occasionally need stronger guarantees.

[1] https://postgresqlco.nf/doc/en/param/synchronous_commit/

> > If we allow specifying WAIT_LSN_TYPE_PRIMARY_FLUSH, should it be
> > separate mode value or the same with WAIT_LSN_TYPE_STANDBY_FLUSH?  In
> > principle, we could encode both as just 'flush' mode, and detect which
> > WaitLSNType to pick by checking if recovery is in progress.  However,
> > how should we then react to unreached flush location after standby
> > promotion (technically it could be still reached but on the different
> > timeline)?
> >
>
> Technically, we can use 'flush' mode to specify WAIT FOR behavior in
> both primary and replica. Currently, wait for commands error out if
> promotion occurs since: either the requested LSN type does not exist
> on the primary, or we do not yet have the infrastructure to support
> continuing the wait. If we allow waiting for flush on the primary as a
> user-visible command and the wake-up calls for flush in primary are
> introduced, the question becomes whether we should still abort the
> wait on promotion, or continue waiting—as you noted—given that the
> target LSN might still be reached, albeit on a different timeline. The
> question behind this might be: do users care and should be aware of
> the state change of the server while waiting? If they do, then we
> better stop the waiting and report the error. In this case, I am
> inclined to to break the unified flush mode to something like
> primary_flush/standby_flush mode and
> WAIT_LSN_TYPE_PRIMARY_FLUSH/WAIT_LSN_TYPE_STANDBY_FLUSH respectively.
>

After further consideration, it also seems reasonable to use a single,
unified flush mode that works on both primary and standby servers,
provided its semantics are clearly documented to avoid the potential
confusion on failure. I don’t have a strong preference between these
two and would be interested in your thoughts.

If a standby is promoted while a session is waiting, the command
better abort and return an error (or report “not in recovery” when
using NO_THROW). At that point, the meaning of the LSN being waited
for may have changed due to the timeline switch and the transition
from standby to primary. An LSN such as 0/5000000 on TLI 2 can
represent entirely different WAL content from 0/5000000 on TLI 1.
Allowing the wait to silently continue across promotion risks giving
users a false sense of safety—for example, interpreting “wait
completed” as “the original data is now durable,” which would no
longer be true.

--
Best,
Xuneng