Thread

  1. Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication

    SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> — 2022-11-29T19:37:35Z

    On Tue, Nov 29, 2022 at 11:20 AM SATYANARAYANA NARLAPURAM <
    satyanarlapuram@gmail.com> wrote:
    
    >
    >
    > On Tue, Nov 29, 2022 at 10:52 AM Andrey Borodin <amborodin86@gmail.com>
    > wrote:
    >
    >> On Tue, Nov 29, 2022 at 8:29 AM Bruce Momjian <bruce@momjian.us> wrote:
    >> >
    >> > On Tue, Nov 29, 2022 at 08:14:10AM -0800, SATYANARAYANA NARLAPURAM
    >> wrote:
    >> > >     2. Process proc die immediately when a backend is waiting for sync
    >> > >     replication acknowledgement, as it does today, however, upon
    >> restart,
    >> > >     don't open up for business (don't accept read-only connections)
    >> > >     unless the sync standbys have caught up.
    >> > >
    >> > >
    >> > > Are you planning to block connections or queries to the database? It
    >> would be
    >> > > good to allow connections and let them query the monitoring views but
    >> block the
    >> > > queries until the sync standbys have caught up. Otherwise, this leaves a
    >> monitoring
    >> > > hole. In cloud, I presume superusers are allowed to connect and
    >> monitor (end
    >> > > customers are not the role members and can't query the data). The
    >> same can't be
    >> > > true for all the installations. Could you please add more details on
    >> your
    >> > > approach?
    >> >
    >> > I think ALTER SYSTEM should be allowed, particularly so you can modify
    >> > synchronous_standby_names, no?
    >>
    >> We don't allow SQL access during crash recovery until it's caught up
    >> to the consistency point. And that's for a reason - the cluster may have
    >> an invalid system catalog.
    >> So no, after a crash without a quorum of standbys you can only change
    >> auto.conf and send SIGHUP. Accessing the system catalog during crash
    >> recovery is another unrelated problem.
    >>
    >
    > In the crash recovery case, the catalog is inconsistent, but in this case
    > the cluster has remote uncommitted changes (consistent). Accepting a
    > superuser connection does no harm. The auth checks performed are still
    > valid after the standbys have fully caught up. I don't see a reason why
    > superuser / pg_monitor connections need to be blocked.
    >
    
    If blocking queries is harder, and superuser is not allowed to connect
    because it can read remote uncommitted data, how about adding a new role
    that can update and reload the server configuration?
    
    >
    >
    >> But I'd propose to treat these two points differently, they possess
    >> drastically different scales of danger. Query Cancels are issued here
    >> and there during failovers/switchovers. Crash amidst network
    >> partitioning is not that common.
    >>
    >
    > Supportability and operability are more important in corner cases to
    > quickly troubleshoot an issue.
    >
    >
    >>
    >> Best regards, Andrey Borodin.
    >>
    >
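
The connection-admission options debated in this thread can be compared with a small model. This is only a sketch under stated assumptions, not PostgreSQL code: the `Policy` names, the `Session` type, the `may_connect` function, and the `config_admin` role are all hypothetical and exist purely for illustration. Only `pg_monitor` and the superuser attribute come from the discussion itself.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    # Hypothetical labels for the three options raised in the thread.
    BLOCK_ALL = "block_all"                # no connections until standbys catch up
    ALLOW_MONITORING = "allow_monitoring"  # superuser / pg_monitor may still connect
    CONFIG_ROLE_ONLY = "config_role_only"  # only a config-only role may connect


@dataclass(frozen=True)
class Session:
    is_superuser: bool = False
    roles: frozenset = frozenset()


def may_connect(session: Session, policy: Policy, standbys_caught_up: bool) -> bool:
    """Decide whether a session is admitted after a restart.

    Once the synchronous standbys have caught up, everyone is admitted.
    Before that, the chosen policy decides who gets in.
    """
    if standbys_caught_up:
        return True
    if policy is Policy.BLOCK_ALL:
        return False
    if policy is Policy.ALLOW_MONITORING:
        return session.is_superuser or "pg_monitor" in session.roles
    if policy is Policy.CONFIG_ROLE_ONLY:
        # "config_admin" is an assumed name for the proposed new role that
        # can only update and reload the server configuration.
        return "config_admin" in session.roles
    raise ValueError(f"unknown policy: {policy!r}")
```

Under this model, `CONFIG_ROLE_ONLY` keeps even superusers out while the standbys lag, which matches the concern that a superuser could read remote uncommitted data, yet it still leaves a way to change `synchronous_standby_names` without editing files on disk.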