Re: Unnecessary delay in streaming replication due to replay lag

From: Anastasia Lubennikova <a.lubennikova@postgrespro.ru>
To: Michael Paquier <michael@paquier.xyz>, "lchch1990@sina.cn" <lchch1990@sina.cn>
Cc: Asim Praveen <pasim@vmware.com>, Masahiko Sawada <masahiko.sawada@2ndquadrant.com>, pgsql-hackers <pgsql-hackers@postgresql.org>, "Hao Wu (Pivotal)" <hawu@pivotal.io>, "ahsan.hadi" <ahsan.hadi@highgo.ca>
Date: 2020-12-01T14:21:51Z
Lists: pgsql-hackers

On 20.11.2020 11:21, Michael Paquier wrote:
> On Tue, Sep 15, 2020 at 05:30:22PM +0800, lchch1990@sina.cn wrote:
>> I read the code and tested the patch; it runs well on my side, but I have several issues with the
>> patch.
> +                   RequestXLogStreaming(ThisTimeLineID,
> +                                        startpoint,
> +                                        PrimaryConnInfo,
> +                                        PrimarySlotName,
> +                                        wal_receiver_create_temp_slot);
>
> This patch thinks that it is fine to request streaming even if
> PrimaryConnInfo is not set, but that's not fine.
>
> Anyway, I don't quite understand what you are trying to achieve here.
> "startpoint" is used to request the beginning of streaming.  It is
> roughly the consistency LSN + some alpha with some checks on WAL
> pages (those WAL page checks are not acceptable as they make
> maintenance harder).  What about the case where consistency is
> reached but there are many segments still ahead that need to be
> replayed?  Your patch would cause streaming to begin too early, and
> a manual copy of segments is not a rare thing as in some environments
> a bulk copy of segments can make the catchup of a standby faster than
> streaming.
>
> It seems to me that what you are looking for here is some kind of
> pre-processing before entering the redo loop to determine the LSN
> that could be reused for the fast streaming start, which should match
> the end of the WAL present locally.  In short, you would need an
> XLogReaderState that begins a scan of WAL from the redo point until it
> cannot find anything more, and use the last LSN found as a base to
> begin requesting streaming.  The question of timeline jumps can also
> be very tricky, but it could also be possible to not allow this option
> if a timeline jump happens while attempting to guess the end of WAL
> ahead of time.  Another thing: could it be useful to have an extra
> mode to begin streaming without waiting for consistency to be reached?
> --
> Michael
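
For anyone picking this up later, the pre-processing idea quoted above can be
illustrated with a small toy model. This is only a sketch under stated
assumptions, not PostgreSQL code: ToyRecord, ToyLsn, toy_find_local_wal_end()
and toy_conninfo_is_set() are hypothetical names invented here, and the record
array stands in for what a real XLogReaderState would read from pg_wal. The
scan walks forward from the redo point, stops at the first gap or invalid
record, and refuses to guess if it sees a timeline change. The conninfo check
assumes that an unset connection string shows up as NULL or an empty string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for XLogRecPtr; not the PostgreSQL type. */
typedef uint64_t ToyLsn;

/* Hypothetical, simplified view of one WAL record. */
typedef struct ToyRecord
{
    ToyLsn      start;      /* LSN at which the record begins */
    uint32_t    len;        /* total record length in bytes */
    uint32_t    tli;        /* timeline the record belongs to */
    bool        valid;      /* false models a torn or corrupt record */
} ToyRecord;

/*
 * Assumption: an unset connection string is NULL or empty.  Streaming
 * should not be requested at all in that case.
 */
static bool
toy_conninfo_is_set(const char *conninfo)
{
    return conninfo != NULL && conninfo[0] != '\0';
}

/*
 * Scan records (sorted by start LSN) forward from redo_lsn.
 *
 * On success, store the end of the last contiguous valid record in
 * *end_lsn and return true; that LSN is the candidate streaming start.
 * Return false if no usable record begins at redo_lsn, or if a record on
 * a different timeline is found (the "do not allow this option on a
 * timeline jump" case).
 */
static bool
toy_find_local_wal_end(const ToyRecord *recs, size_t nrecs,
                       ToyLsn redo_lsn, uint32_t start_tli,
                       ToyLsn *end_lsn)
{
    ToyLsn      next = redo_lsn;
    bool        found = false;

    assert(end_lsn != NULL);

    for (size_t i = 0; i < nrecs; i++)
    {
        const ToyRecord *r = &recs[i];

        if (r->start < redo_lsn)
            continue;           /* record lies before the redo point */
        if (r->start != next || !r->valid)
            break;              /* gap or invalid record: local WAL ends */
        if (r->tli != start_tli)
            return false;       /* timeline jump: give up on guessing */

        next = r->start + r->len;
        found = true;
    }

    if (!found)
        return false;

    *end_lsn = next;
    return true;
}
```

A caller would first check toy_conninfo_is_set() and only then use the LSN
returned by toy_find_local_wal_end() as the point from which to request
streaming, falling back to the existing behaviour whenever the scan returns
false.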


Status update for a commitfest entry.

This entry was "Waiting On Author" during this CF, so I've marked it as 
returned with feedback. Feel free to resubmit an updated version to a 
future commitfest.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company