Thread

  1. Re: could sent_lsn be lower than write/flush/replay_lsn?

    Jaime Casanova <jcasanov@systemguards.com.ec> — 2025-12-31T01:42:16Z

    On Mon, Dec 29, 2025 at 2:13 AM Ashutosh Bapat
    <ashutosh.bapat.oss@gmail.com> wrote:
    >
    > On Sat, Dec 27, 2025 at 1:18 PM cca5507 <2624345507@qq.com> wrote:
    > >
    > > The sent_lsn is just where the walsender is currently reading, so it could be lower than
    > > write/flush/replay_lsn.
    >
    > +1.
    >
    > I guess the logical replication is restarting in a loop. If that's
    > the case, you will find multiple errors happening in the loop. Another
    > guess is that it's because of the walsender/receiver timeout. Do you see
    > a timeout error from the corresponding background workers? What's
    > downstream?
    >
    
    Thanks to both of you for clarifying this; it was actually a timeout
    error. It seems that, for some reason, all the subscribers got disconnected
    from the provider. Because of a problem we had some years ago (when using
    pglogical at this same customer), wal_sender_timeout was set to 1
    hour, which AFAIU kept the walsender process alive for 1 hour
    while the subscriber tried to reconnect and saw a walsender already
    connected to another PID (the old one, which had already died).
    
    We returned wal_sender_timeout to its original value and everything
    started to flow...
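
    For anyone wanting to check for the same symptom, here is a minimal
    Python sketch (not from the thread) that flags a pg_stat_replication
    row whose sent_lsn is lower than the reported write/flush/replay LSNs.
    The helper names and the sample row are hypothetical; only the column
    names and the 'high/low' hexadecimal LSN text format come from
    PostgreSQL itself.

```python
def parse_lsn(text: str) -> int:
    """Convert a pg_lsn string such as '16/B374D848' to a 64-bit integer.

    The text form is two hexadecimal numbers: the high 32 bits, a slash,
    then the low 32 bits.
    """
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def sent_behind_feedback(row: dict) -> bool:
    """Return True if sent_lsn is lower than any reported feedback LSN.

    `row` is assumed to be one pg_stat_replication row as a dict of
    column name -> LSN text (or None when the column is NULL).
    """
    sent = parse_lsn(row["sent_lsn"])
    reported = [
        row[col]
        for col in ("write_lsn", "flush_lsn", "replay_lsn")
        if row.get(col)
    ]
    return any(parse_lsn(lsn) > sent for lsn in reported)


# Hypothetical sample row, for illustration only.
sample = {
    "sent_lsn": "0/3000000",
    "write_lsn": "0/3000100",
    "flush_lsn": "0/3000100",
    "replay_lsn": None,
}
print(sent_behind_feedback(sample))
```

    A True result only reports the condition discussed above; it does not
    by itself say whether a timeout or a restart loop is the cause.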
    
    
    -- 
    Jaime Casanova
    SYSTEMGUARDS S.A.