Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Don't advance origin during apply failure.

  2. Perform apply of large transactions by parallel workers.

  1. [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> — 2024-08-08T05:07:18Z

    Dear hackers,
    
    This thread forks from [1]. Here can be used to discuss second item.
    Below part contains the same statements written in [1], but I did copy-and-paste
    just in case. Attached patch is almost the same but bit modified based on the comment
    from Amit [2] - an unrelated change is removed.
    
    Found issue
    =====
    When the subscriber enables two-phase commit but doesn't set max_prepared_transaction >0
    and a transaction is prepared on the publisher, the apply worker reports an ERROR
    on the subscriber. After that, the prepared transaction is not replayed, which
    means it's lost forever. Attached script can emulate the situation.
    
    --
    ERROR:  prepared transactions are disabled
    HINT:  Set "max_prepared_transactions" to a nonzero value.
    --
    
    The reason is that we advanced the origin progress when aborting the
    transaction as well (RecordTransactionAbort->replorigin_session_advance). So,
    after setting replorigin_session_origin_lsn, if any ERROR happens when preparing
    the transaction, the transaction aborts which incorrectly advances the origin lsn.
    
    An easiest fix is to reset session replication origin before calling the
    RecordTransactionAbort(). I think this can happen when 1) LogicalRepApplyLoop()
    raises an ERROR or 2) apply worker exits. Attached patch can fix the issue on HEAD.
    
    [1]: https://www.postgresql.org/message-id/TYAPR01MB5692FA4926754B91E9D7B5F0F5AA2%40TYAPR01MB5692.jpnprd01.prod.outlook.com
    [2]: https://www.postgresql.org/message-id/CAA4eK1L-r8OKGdBwC6AeXSibrjr9xKsg8LjGpX_PDR5Go-A9TA%40mail.gmail.com
    
    Best regards,
    Hayato Kuroda
    FUJITSU LIMITED
    
    
  2. Re: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    Amit Kapila <amit.kapila16@gmail.com> — 2024-08-08T06:33:30Z

    On Thu, Aug 8, 2024 at 10:37 AM Hayato Kuroda (Fujitsu)
    <kuroda.hayato@fujitsu.com> wrote:
    >
    ...
    >
    > An easiest fix is to reset session replication origin before calling the
    > RecordTransactionAbort(). I think this can happen when 1) LogicalRepApplyLoop()
    > raises an ERROR or 2) apply worker exits. Attached patch can fix the issue on HEAD.
    >
    
    Few comments:
    =============
    *
    @@ -4409,6 +4409,14 @@ start_apply(XLogRecPtr origin_startpos)
      }
      PG_CATCH();
      {
    + /*
    + * Reset the origin data to prevent the advancement of origin progress
    + * if the transaction failed to apply.
    + */
    + replorigin_session_origin = InvalidRepOriginId;
    + replorigin_session_origin_lsn = InvalidXLogRecPtr;
    + replorigin_session_origin_timestamp = 0;
    
    Can't we call replorigin_reset() instead here?
    
    *
    + /*
    + * Register a callback to reset the origin state before aborting the
    + * transaction in ShutdownPostgres(). This is to prevent the advancement
    + * of origin progress if the transaction failed to apply.
    + */
    + before_shmem_exit(replorigin_reset, (Datum) 0);
    
    I think we need this despite resetting the origin-related variables in
    PG_CATCH block to handle FATAL error cases, right? If so, can we use
    PG_ENSURE_ERROR_CLEANUP() instead of PG_CATCH()?
    
    
    -- 
    With Regards,
    Amit Kapila.
    
    
    
    
  3. Re: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    shveta malik <shveta.malik@gmail.com> — 2024-08-08T10:01:19Z

    On Thu, Aug 8, 2024 at 12:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    >
    > On Thu, Aug 8, 2024 at 10:37 AM Hayato Kuroda (Fujitsu)
    > <kuroda.hayato@fujitsu.com> wrote:
    > >
    > ...
    > >
    > > An easiest fix is to reset session replication origin before calling the
    > > RecordTransactionAbort(). I think this can happen when 1) LogicalRepApplyLoop()
    > > raises an ERROR or 2) apply worker exits. Attached patch can fix the issue on HEAD.
    > >
    >
    > Few comments:
    > =============
    > *
    > @@ -4409,6 +4409,14 @@ start_apply(XLogRecPtr origin_startpos)
    >   }
    >   PG_CATCH();
    >   {
    > + /*
    > + * Reset the origin data to prevent the advancement of origin progress
    > + * if the transaction failed to apply.
    > + */
    > + replorigin_session_origin = InvalidRepOriginId;
    > + replorigin_session_origin_lsn = InvalidXLogRecPtr;
    > + replorigin_session_origin_timestamp = 0;
    >
    > Can't we call replorigin_reset() instead here?
    >
    > *
    > + /*
    > + * Register a callback to reset the origin state before aborting the
    > + * transaction in ShutdownPostgres(). This is to prevent the advancement
    > + * of origin progress if the transaction failed to apply.
    > + */
    > + before_shmem_exit(replorigin_reset, (Datum) 0);
    >
    > I think we need this despite resetting the origin-related variables in
    > PG_CATCH block to handle FATAL error cases, right? If so, can we use
    > PG_ENSURE_ERROR_CLEANUP() instead of PG_CATCH()?
    
    +1
    
    Basic tests work fine on this patch. Just thinking out loud,
    SetupApplyOrSyncWorker() is called for table-sync worker as well and
    IIUC tablesync worker does not deal with 2PC txns. So do we even need
    to register replorigin_reset() for tablesync worker as well? If we may
    hit such an issue in general, then perhaps we need it in table-sync
    worker  otherwise not.  It needs some investigation. Thoughts?
    
    thanks
    Shveta
    
    
    
    
  4. RE: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> — 2024-08-08T10:11:11Z

    On Thursday, August 8, 2024 6:01 PM shveta malik <shveta.malik@gmail.com> wrote:
    > 
    > On Thu, Aug 8, 2024 at 12:03 PM Amit Kapila <amit.kapila16@gmail.com>
    > wrote:
    > >
    > > On Thu, Aug 8, 2024 at 10:37 AM Hayato Kuroda (Fujitsu)
    > > <kuroda.hayato@fujitsu.com> wrote:
    > > >
    > > ...
    > > >
    > > > An easiest fix is to reset session replication origin before calling
    > > > the RecordTransactionAbort(). I think this can happen when 1)
    > > > LogicalRepApplyLoop() raises an ERROR or 2) apply worker exits.
    > Attached patch can fix the issue on HEAD.
    > > >
    > >
    > > Few comments:
    > > =============
    > > *
    > > @@ -4409,6 +4409,14 @@ start_apply(XLogRecPtr origin_startpos)
    > >   }
    > >   PG_CATCH();
    > >   {
    > > + /*
    > > + * Reset the origin data to prevent the advancement of origin
    > > + progress
    > > + * if the transaction failed to apply.
    > > + */
    > > + replorigin_session_origin = InvalidRepOriginId;
    > > + replorigin_session_origin_lsn = InvalidXLogRecPtr;
    > > + replorigin_session_origin_timestamp = 0;
    > >
    > > Can't we call replorigin_reset() instead here?
    > >
    > > *
    > > + /*
    > > + * Register a callback to reset the origin state before aborting the
    > > + * transaction in ShutdownPostgres(). This is to prevent the
    > > + advancement
    > > + * of origin progress if the transaction failed to apply.
    > > + */
    > > + before_shmem_exit(replorigin_reset, (Datum) 0);
    > >
    > > I think we need this despite resetting the origin-related variables in
    > > PG_CATCH block to handle FATAL error cases, right? If so, can we use
    > > PG_ENSURE_ERROR_CLEANUP() instead of PG_CATCH()?
    > 
    > +1
    > 
    > Basic tests work fine on this patch. Just thinking out loud,
    > SetupApplyOrSyncWorker() is called for table-sync worker as well and IIUC
    > tablesync worker does not deal with 2PC txns. So do we even need to register
    > replorigin_reset() for tablesync worker as well? If we may hit such an issue in
    > general, then perhaps we need it in table-sync worker  otherwise not.  It
    > needs some investigation. Thoughts?
    
    I think this is a general issue that can occur not only due to 2PC. IIUC, this
    problem should arise if any ERROR arises after setting the
    replorigin_session_origin_lsn but before the CommitTransactionCommand is
    completed. If so, I think we should register it for tablesync worker as well.
    
    Best Regards,
    Hou zj
    
  5. Re: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    Amit Kapila <amit.kapila16@gmail.com> — 2024-08-08T12:38:11Z

    On Thu, Aug 8, 2024 at 3:41 PM Zhijie Hou (Fujitsu)
    <houzj.fnst@fujitsu.com> wrote:
    >
    > On Thursday, August 8, 2024 6:01 PM shveta malik <shveta.malik@gmail.com> wrote:
    > >
    > > On Thu, Aug 8, 2024 at 12:03 PM Amit Kapila <amit.kapila16@gmail.com>
    > > wrote:
    > > >
    > > > On Thu, Aug 8, 2024 at 10:37 AM Hayato Kuroda (Fujitsu)
    > > > <kuroda.hayato@fujitsu.com> wrote:
    > > > >
    > > > ...
    > > > >
    > > > > An easiest fix is to reset session replication origin before calling
    > > > > the RecordTransactionAbort(). I think this can happen when 1)
    > > > > LogicalRepApplyLoop() raises an ERROR or 2) apply worker exits.
    > > Attached patch can fix the issue on HEAD.
    > > > >
    > > >
    > > > Few comments:
    > > > =============
    > > > *
    > > > @@ -4409,6 +4409,14 @@ start_apply(XLogRecPtr origin_startpos)
    > > >   }
    > > >   PG_CATCH();
    > > >   {
    > > > + /*
    > > > + * Reset the origin data to prevent the advancement of origin
    > > > + progress
    > > > + * if the transaction failed to apply.
    > > > + */
    > > > + replorigin_session_origin = InvalidRepOriginId;
    > > > + replorigin_session_origin_lsn = InvalidXLogRecPtr;
    > > > + replorigin_session_origin_timestamp = 0;
    > > >
    > > > Can't we call replorigin_reset() instead here?
    > > >
    > > > *
    > > > + /*
    > > > + * Register a callback to reset the origin state before aborting the
    > > > + * transaction in ShutdownPostgres(). This is to prevent the
    > > > + advancement
    > > > + * of origin progress if the transaction failed to apply.
    > > > + */
    > > > + before_shmem_exit(replorigin_reset, (Datum) 0);
    > > >
    > > > I think we need this despite resetting the origin-related variables in
    > > > PG_CATCH block to handle FATAL error cases, right? If so, can we use
    > > > PG_ENSURE_ERROR_CLEANUP() instead of PG_CATCH()?
    > >
    > > +1
    > >
    > > Basic tests work fine on this patch. Just thinking out loud,
    > > SetupApplyOrSyncWorker() is called for table-sync worker as well and IIUC
    > > tablesync worker does not deal with 2PC txns. So do we even need to register
    > > replorigin_reset() for tablesync worker as well? If we may hit such an issue in
    > > general, then perhaps we need it in table-sync worker  otherwise not.  It
    > > needs some investigation. Thoughts?
    >
    > I think this is a general issue that can occur not only due to 2PC. IIUC, this
    > problem should arise if any ERROR arises after setting the
    > replorigin_session_origin_lsn but before the CommitTransactionCommand is
    > completed. If so, I think we should register it for tablesync worker as well.
    >
    
    As pointed out earlier, won't using PG_ENSURE_ERROR_CLEANUP() instead
    of PG_CATCH() be enough?
    
    -- 
    With Regards,
    Amit Kapila.
    
    
    
    
  6. Re: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    shveta malik <shveta.malik@gmail.com> — 2024-08-09T04:04:54Z

    On Thu, Aug 8, 2024 at 6:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    >
    > On Thu, Aug 8, 2024 at 3:41 PM Zhijie Hou (Fujitsu)
    > <houzj.fnst@fujitsu.com> wrote:
    > >
    > > On Thursday, August 8, 2024 6:01 PM shveta malik <shveta.malik@gmail.com> wrote:
    > > >
    > > > On Thu, Aug 8, 2024 at 12:03 PM Amit Kapila <amit.kapila16@gmail.com>
    > > > wrote:
    > > > >
    > > > > On Thu, Aug 8, 2024 at 10:37 AM Hayato Kuroda (Fujitsu)
    > > > > <kuroda.hayato@fujitsu.com> wrote:
    > > > > >
    > > > > ...
    > > > > >
    > > > > > An easiest fix is to reset session replication origin before calling
    > > > > > the RecordTransactionAbort(). I think this can happen when 1)
    > > > > > LogicalRepApplyLoop() raises an ERROR or 2) apply worker exits.
    > > > Attached patch can fix the issue on HEAD.
    > > > > >
    > > > >
    > > > > Few comments:
    > > > > =============
    > > > > *
    > > > > @@ -4409,6 +4409,14 @@ start_apply(XLogRecPtr origin_startpos)
    > > > >   }
    > > > >   PG_CATCH();
    > > > >   {
    > > > > + /*
    > > > > + * Reset the origin data to prevent the advancement of origin
    > > > > + progress
    > > > > + * if the transaction failed to apply.
    > > > > + */
    > > > > + replorigin_session_origin = InvalidRepOriginId;
    > > > > + replorigin_session_origin_lsn = InvalidXLogRecPtr;
    > > > > + replorigin_session_origin_timestamp = 0;
    > > > >
    > > > > Can't we call replorigin_reset() instead here?
    > > > >
    > > > > *
    > > > > + /*
    > > > > + * Register a callback to reset the origin state before aborting the
    > > > > + * transaction in ShutdownPostgres(). This is to prevent the
    > > > > + advancement
    > > > > + * of origin progress if the transaction failed to apply.
    > > > > + */
    > > > > + before_shmem_exit(replorigin_reset, (Datum) 0);
    > > > >
    > > > > I think we need this despite resetting the origin-related variables in
    > > > > PG_CATCH block to handle FATAL error cases, right? If so, can we use
    > > > > PG_ENSURE_ERROR_CLEANUP() instead of PG_CATCH()?
    > > >
    > > > +1
    > > >
    > > > Basic tests work fine on this patch. Just thinking out loud,
    > > > SetupApplyOrSyncWorker() is called for table-sync worker as well and IIUC
    > > > tablesync worker does not deal with 2PC txns. So do we even need to register
    > > > replorigin_reset() for tablesync worker as well? If we may hit such an issue in
    > > > general, then perhaps we need it in table-sync worker  otherwise not.  It
    > > > needs some investigation. Thoughts?
    > >
    > > I think this is a general issue that can occur not only due to 2PC. IIUC, this
    > > problem should arise if any ERROR arises after setting the
    > > replorigin_session_origin_lsn but before the CommitTransactionCommand is
    > > completed. If so, I think we should register it for tablesync worker as well.
    > >
    >
    > As pointed out earlier, won't using PG_ENSURE_ERROR_CLEANUP() instead
    > of PG_CATCH() be enough?
    
    Yes, I think it should suffice. IIUC, we are going to change
    'replorigin_session_origin_lsn' only in start_apply() and not before
    that, and thus ensuring its reset during any ERROR or FATAL in
    start_apply() is good enough. And I guess we don't want this
    origin-reset to be called during regular shutdown, isn't it? But
    registering it through before_shmem_exit() will make the
    reset-function to be called during normal shutdown as well.
    And to answer my previous question (as Hou-San also  pointed out), we
    do need it in table-sync worker as well. So a change in start_apply
    will make sure the fix is valid for both apply and tablesync worker.
    
    thanks
    Shveta
    
    
    
    
  7. Re: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    Amit Kapila <amit.kapila16@gmail.com> — 2024-08-09T05:49:12Z

    On Fri, Aug 9, 2024 at 9:35 AM shveta malik <shveta.malik@gmail.com> wrote:
    >
    > On Thu, Aug 8, 2024 at 6:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    > >
    > > >
    > > > I think this is a general issue that can occur not only due to 2PC. IIUC, this
    > > > problem should arise if any ERROR arises after setting the
    > > > replorigin_session_origin_lsn but before the CommitTransactionCommand is
    > > > completed. If so, I think we should register it for tablesync worker as well.
    > > >
    > >
    > > As pointed out earlier, won't using PG_ENSURE_ERROR_CLEANUP() instead
    > > of PG_CATCH() be enough?
    >
    > Yes, I think it should suffice. IIUC, we are going to change
    > 'replorigin_session_origin_lsn' only in start_apply() and not before
    > that, and thus ensuring its reset during any ERROR or FATAL in
    > start_apply() is good enough.
    >
    
    Right, I also think so.
    
    > And I guess we don't want this
    > origin-reset to be called during regular shutdown, isn't it?
    >
    
    Agreed. OTOH, there was no harm even if such a reset function is invoked.
    
    > But
    > registering it through before_shmem_exit() will make the
    > reset-function to be called during normal shutdown as well.
    >
    
    True and unless I am missing something we won't need it. I would like
    to hear the opinions of Hou-San and Kuroda-San on the same.
    
    > And to answer my previous question (as Hou-San also  pointed out), we
    > do need it in table-sync worker as well. So a change in start_apply
    > will make sure the fix is valid for both apply and tablesync worker.
    >
    
    As table-sync workers can also apply transactions after the initial
    copy, we need it for table-sync during its apply phase.
    
    -- 
    With Regards,
    Amit Kapila.
    
    
    
    
  8. RE: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> — 2024-08-09T09:09:34Z

    Dear Amit, Shveta, Hou,
    
    Thanks for giving many comments! I've updated the patch.
    
    > @@ -4409,6 +4409,14 @@ start_apply(XLogRecPtr origin_startpos)
    >   }
    >   PG_CATCH();
    >   {
    > + /*
    > + * Reset the origin data to prevent the advancement of origin progress
    > + * if the transaction failed to apply.
    > + */
    > + replorigin_session_origin = InvalidRepOriginId;
    > + replorigin_session_origin_lsn = InvalidXLogRecPtr;
    > + replorigin_session_origin_timestamp = 0;
    > 
    > Can't we call replorigin_reset() instead here?
    
    I didn't use the function because arguments of calling function looked strange,
    but ideally I can. Fixed.
    
    > + /*
    > + * Register a callback to reset the origin state before aborting the
    > + * transaction in ShutdownPostgres(). This is to prevent the advancement
    > + * of origin progress if the transaction failed to apply.
    > + */
    > + before_shmem_exit(replorigin_reset, (Datum) 0);
    > 
    > I think we need this despite resetting the origin-related variables in
    > PG_CATCH block to handle FATAL error cases, right? If so, can we use
    > PG_ENSURE_ERROR_CLEANUP() instead of PG_CATCH()?
    
    There are two reasons to add a shmem-exit callback. One is to support a FATAL,
    another one is to support the case that user does the shutdown request while
    applying changes. In this case, I think ShutdownPostgres() can be called so that
    the session origin may advance.
    
    However, I think we cannot use PG_ENSURE_ERROR_CLEANUP()/PG_END_ENSURE_ERROR_CLEANUP
    macros here. According to codes, it assumes that any before-shmem callbacks are
    not registered within the block because the cleanup function is registered and canceled
    within the macro. LogicalRepApplyLoop() can register the function when
    it handles COMMIT PREPARED message so it breaks the rule.
    
    Best regards,
    Hayato Kuroda
    FUJITSU LIMITED
    
    
  9. Re: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    shveta malik <shveta.malik@gmail.com> — 2024-08-12T10:06:52Z

    On Fri, Aug 9, 2024 at 2:39 PM Hayato Kuroda (Fujitsu)
    <kuroda.hayato@fujitsu.com> wrote:
    >
    > Dear Amit, Shveta, Hou,
    >
    > Thanks for giving many comments! I've updated the patch.
    >
    > > @@ -4409,6 +4409,14 @@ start_apply(XLogRecPtr origin_startpos)
    > >   }
    > >   PG_CATCH();
    > >   {
    > > + /*
    > > + * Reset the origin data to prevent the advancement of origin progress
    > > + * if the transaction failed to apply.
    > > + */
    > > + replorigin_session_origin = InvalidRepOriginId;
    > > + replorigin_session_origin_lsn = InvalidXLogRecPtr;
    > > + replorigin_session_origin_timestamp = 0;
    > >
    > > Can't we call replorigin_reset() instead here?
    >
    > I didn't use the function because arguments of calling function looked strange,
    > but ideally I can. Fixed.
    >
    > > + /*
    > > + * Register a callback to reset the origin state before aborting the
    > > + * transaction in ShutdownPostgres(). This is to prevent the advancement
    > > + * of origin progress if the transaction failed to apply.
    > > + */
    > > + before_shmem_exit(replorigin_reset, (Datum) 0);
    > >
    > > I think we need this despite resetting the origin-related variables in
    > > PG_CATCH block to handle FATAL error cases, right? If so, can we use
    > > PG_ENSURE_ERROR_CLEANUP() instead of PG_CATCH()?
    >
    > There are two reasons to add a shmem-exit callback. One is to support a FATAL,
    > another one is to support the case that user does the shutdown request while
    > applying changes. In this case, I think ShutdownPostgres() can be called so that
    > the session origin may advance.
    
    Agree that we need the 'reset' during shutdown flow as well. Details at [1]
    
    > However, I think we cannot use PG_ENSURE_ERROR_CLEANUP()/PG_END_ENSURE_ERROR_CLEANUP
    > macros here. According to codes, it assumes that any before-shmem callbacks are
    > not registered within the block because the cleanup function is registered and canceled
    > within the macro. LogicalRepApplyLoop() can register the function when
    > it handles COMMIT PREPARED message so it breaks the rule.
    
    Yes, on reanalyzing, we can not use PG_ENSURE_ERROR_CLEANUP in this
    flow due to the limitation of cancel_before_shmem_exit() that it can
    cancel only the last registered callback, while in our flow we have
    other callbacks also registered after we register our reset one.
    
    [1]
    Shutdown analysis:
    
    I did a test where we make apply worker wait for say 0sec right after
    it updates 'replorigin_session_origin_lsn' in
    apply_handle_prepare_internal() (say it at code-point1). During this
    wait, we triggered a subscriber shutdown.Under normal circumstances,
    everything works fine: after the wait, the apply worker processes the
    SIGTERM (via LogicalRepApplyLoop-->ProcessInterrupts()) only after the
    prepare phase is complete, meaning the PREPARE LSN is flushed, and the
    origin LSN is correctly advanced in EndPrepare() before the worker
    shuts down. But, if we insert a LOG statement between code-point1 and
    EndPrepare(), the apply worker processes the SIGTERM during the LOG
    operation, as errfinish() triggers CHECK_FOR_INTERRUPTS at the end,
    which causes the origin LSN to be incorrectly advanced during
    shutdown. And thus the subsequent COMMIT PREPARED on the publisher
    results in ERROR on subscriber; as the 'PREPARE' is lost on the
    subscriber and is not resent by the publisher. ERROR:  prepared
    transaction with identifier "pg_gid_16403_757" does not exist
    
    A similar problem can also occur without introducing any additional
    LOG statements, but by simply setting log_min_messages=debug5. This
    causes the apply worker to output a few DEBUG messages upon receiving
    a shutdown signal (after code-point1) before it reaches EndPrepare().
    As a result, it ends up processing the SIGTERM (during logging)and
    invoking AbortOutOfAnyTransaction(), which incorrectly advances the
    origin LSN.
    
    thanks
    Shveta
    
    
    
    
  10. Re: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    shveta malik <shveta.malik@gmail.com> — 2024-08-12T10:10:03Z

    On Mon, Aug 12, 2024 at 3:36 PM shveta malik <shveta.malik@gmail.com> wrote:
    >
    > On Fri, Aug 9, 2024 at 2:39 PM Hayato Kuroda (Fujitsu)
    > <kuroda.hayato@fujitsu.com> wrote:
    > >
    > > Dear Amit, Shveta, Hou,
    > >
    > > Thanks for giving many comments! I've updated the patch.
    > >
    > > > @@ -4409,6 +4409,14 @@ start_apply(XLogRecPtr origin_startpos)
    > > >   }
    > > >   PG_CATCH();
    > > >   {
    > > > + /*
    > > > + * Reset the origin data to prevent the advancement of origin progress
    > > > + * if the transaction failed to apply.
    > > > + */
    > > > + replorigin_session_origin = InvalidRepOriginId;
    > > > + replorigin_session_origin_lsn = InvalidXLogRecPtr;
    > > > + replorigin_session_origin_timestamp = 0;
    > > >
    > > > Can't we call replorigin_reset() instead here?
    > >
    > > I didn't use the function because arguments of calling function looked strange,
    > > but ideally I can. Fixed.
    > >
    > > > + /*
    > > > + * Register a callback to reset the origin state before aborting the
    > > > + * transaction in ShutdownPostgres(). This is to prevent the advancement
    > > > + * of origin progress if the transaction failed to apply.
    > > > + */
    > > > + before_shmem_exit(replorigin_reset, (Datum) 0);
    > > >
    > > > I think we need this despite resetting the origin-related variables in
    > > > PG_CATCH block to handle FATAL error cases, right? If so, can we use
    > > > PG_ENSURE_ERROR_CLEANUP() instead of PG_CATCH()?
    > >
    > > There are two reasons to add a shmem-exit callback. One is to support a FATAL,
    > > another one is to support the case that user does the shutdown request while
    > > applying changes. In this case, I think ShutdownPostgres() can be called so that
    > > the session origin may advance.
    >
    > Agree that we need the 'reset' during shutdown flow as well. Details at [1]
    >
    > > However, I think we cannot use PG_ENSURE_ERROR_CLEANUP()/PG_END_ENSURE_ERROR_CLEANUP
    > > macros here. According to codes, it assumes that any before-shmem callbacks are
    > > not registered within the block because the cleanup function is registered and canceled
    > > within the macro. LogicalRepApplyLoop() can register the function when
    > > it handles COMMIT PREPARED message so it breaks the rule.
    >
    > Yes, on reanalyzing, we can not use PG_ENSURE_ERROR_CLEANUP in this
    > flow due to the limitation of cancel_before_shmem_exit() that it can
    > cancel only the last registered callback, while in our flow we have
    > other callbacks also registered after we register our reset one.
    >
    > [1]
    > Shutdown analysis:
    >
    > I did a test where we make apply worker wait for say 0sec right after
    
    Correction here: 0sec -->10sec
    
    > it updates 'replorigin_session_origin_lsn' in
    > apply_handle_prepare_internal() (say it at code-point1). During this
    > wait, we triggered a subscriber shutdown.Under normal circumstances,
    > everything works fine: after the wait, the apply worker processes the
    > SIGTERM (via LogicalRepApplyLoop-->ProcessInterrupts()) only after the
    > prepare phase is complete, meaning the PREPARE LSN is flushed, and the
    > origin LSN is correctly advanced in EndPrepare() before the worker
    > shuts down. But, if we insert a LOG statement between code-point1 and
    > EndPrepare(), the apply worker processes the SIGTERM during the LOG
    > operation, as errfinish() triggers CHECK_FOR_INTERRUPTS at the end,
    > which causes the origin LSN to be incorrectly advanced during
    > shutdown. And thus the subsequent COMMIT PREPARED on the publisher
    > results in ERROR on subscriber; as the 'PREPARE' is lost on the
    > subscriber and is not resent by the publisher. ERROR:  prepared
    > transaction with identifier "pg_gid_16403_757" does not exist
    >
    > A similar problem can also occur without introducing any additional
    > LOG statements, but by simply setting log_min_messages=debug5. This
    > causes the apply worker to output a few DEBUG messages upon receiving
    > a shutdown signal (after code-point1) before it reaches EndPrepare().
    > As a result, it ends up processing the SIGTERM (during logging)and
    > invoking AbortOutOfAnyTransaction(), which incorrectly advances the
    > origin LSN.
    >
    > thanks
    > Shveta
    
    
    
    
  11. Re: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    Amit Kapila <amit.kapila16@gmail.com> — 2024-08-13T04:18:37Z

    On Mon, Aug 12, 2024 at 3:37 PM shveta malik <shveta.malik@gmail.com> wrote:
    >
    > On Fri, Aug 9, 2024 at 2:39 PM Hayato Kuroda (Fujitsu)
    > <kuroda.hayato@fujitsu.com> wrote:
    > >
    > >
    > > > + /*
    > > > + * Register a callback to reset the origin state before aborting the
    > > > + * transaction in ShutdownPostgres(). This is to prevent the advancement
    > > > + * of origin progress if the transaction failed to apply.
    > > > + */
    > > > + before_shmem_exit(replorigin_reset, (Datum) 0);
    > > >
    > > > I think we need this despite resetting the origin-related variables in
    > > > PG_CATCH block to handle FATAL error cases, right? If so, can we use
    > > > PG_ENSURE_ERROR_CLEANUP() instead of PG_CATCH()?
    > >
    > > There are two reasons to add a shmem-exit callback. One is to support a FATAL,
    > > another one is to support the case that user does the shutdown request while
    > > applying changes. In this case, I think ShutdownPostgres() can be called so that
    > > the session origin may advance.
    >
    > Agree that we need the 'reset' during shutdown flow as well. Details at [1]
    >
    
    Thanks for the detailed analysis. I agree with your analysis that we
    need to reset the origin information for the shutdown path to avoid it
    being advanced incorrectly. However, the patch doesn't have sufficient
    comments to explain why we need to reset it for both the ERROR and
    Shutdown paths. Can we improve the comments in the patch?
    
    Also, for the ERROR path, can we reset the origin information in
    apply_error_callback()?
    
    BTW, this needs to be backpatched till 16 when it has been introduced
    by the parallel apply feature as part of commit 216a7848. So, can we
    test this patch in back-branches as well?
    
    -- 
    With Regards,
    Amit Kapila.
    
    
    
    
  12. Re: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    shveta malik <shveta.malik@gmail.com> — 2024-08-14T04:55:56Z

    On Tue, Aug 13, 2024 at 9:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
    >
    > On Mon, Aug 12, 2024 at 3:37 PM shveta malik <shveta.malik@gmail.com> wrote:
    > >
    > > On Fri, Aug 9, 2024 at 2:39 PM Hayato Kuroda (Fujitsu)
    > > <kuroda.hayato@fujitsu.com> wrote:
    > > >
    > > >
    > > > > + /*
    > > > > + * Register a callback to reset the origin state before aborting the
    > > > > + * transaction in ShutdownPostgres(). This is to prevent the advancement
    > > > > + * of origin progress if the transaction failed to apply.
    > > > > + */
    > > > > + before_shmem_exit(replorigin_reset, (Datum) 0);
    > > > >
    > > > > I think we need this despite resetting the origin-related variables in
    > > > > PG_CATCH block to handle FATAL error cases, right? If so, can we use
    > > > > PG_ENSURE_ERROR_CLEANUP() instead of PG_CATCH()?
    > > >
    > > > There are two reasons to add a shmem-exit callback. One is to support a FATAL,
    > > > another one is to support the case that user does the shutdown request while
    > > > applying changes. In this case, I think ShutdownPostgres() can be called so that
    > > > the session origin may advance.
    > >
    > > Agree that we need the 'reset' during shutdown flow as well. Details at [1]
    > >
    >
    > Thanks for the detailed analysis. I agree with your analysis that we
    > need to reset the origin information for the shutdown path to avoid it
    > being advanced incorrectly. However, the patch doesn't have sufficient
    > comments to explain why we need to reset it for both the ERROR and
    > Shutdown paths. Can we improve the comments in the patch?
    >
    > Also, for the ERROR path, can we reset the origin information in
    > apply_error_callback()?
    
    Please find v4 attached. Addressed comments in that.
    
    Manual testing done on v4:
    1) Error and Fatal case
    2) Shutdown after replorigin_session_origin_lsn was set in
    apply_handle_prepare() and before EndPrepare was called.
        2a) with log_min_messages=debug5. This will result in processing
    of shutdown signal by errfinish() before PREPARE is over.
        2b) with default log_min_messages. This will result in processing
    of shutdown signal by LogicalRepApplyLoop() after ongoing PREPARE is
    over.
    
    >
    > BTW, this needs to be backpatched till 16 when it has been introduced
    > by the parallel apply feature as part of commit 216a7848. So, can we
    > test this patch in back-branches as well?
    >
    
    Sure, will do next.
    
    thanks
    Shveta
    
  13. Re: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    shveta malik <shveta.malik@gmail.com> — 2024-08-14T09:23:25Z

    On Tue, Aug 13, 2024 at 9:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
    >
    > BTW, this needs to be backpatched till 16 when it has been introduced
    > by the parallel apply feature as part of commit 216a7848. So, can we
    > test this patch in back-branches as well?
    >
    
    I was able to reproduce the problem on REL_16_STABLE and REL_17_STABLE
    through both the flows (shutdown and apply-error). The patch v4 fixes
    the issues on both.
    
    thanks
    Shveta
    
    
    
    
  14. Re: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    Amit Kapila <amit.kapila16@gmail.com> — 2024-08-20T06:06:45Z

    On Wed, Aug 14, 2024 at 10:26 AM shveta malik <shveta.malik@gmail.com> wrote:
    >
    > >
    > > Thanks for the detailed analysis. I agree with your analysis that we
    > > need to reset the origin information for the shutdown path to avoid it
    > > being advanced incorrectly. However, the patch doesn't have sufficient
    > > comments to explain why we need to reset it for both the ERROR and
    > > Shutdown paths. Can we improve the comments in the patch?
    > >
    > > Also, for the ERROR path, can we reset the origin information in
    > > apply_error_callback()?
    >
    > Please find v4 attached. Addressed comments in that.
    >
    
    The patch looks mostly good to me. I have slightly changed a few of
    the comments in the attached. What do you think of the attached?
    
    -- 
    With Regards,
    Amit Kapila.
    
  15. Re: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    shveta malik <shveta.malik@gmail.com> — 2024-08-20T08:30:53Z

    On Tue, Aug 20, 2024 at 11:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
    >
    > On Wed, Aug 14, 2024 at 10:26 AM shveta malik <shveta.malik@gmail.com> wrote:
    > >
    > > >
    > > > Thanks for the detailed analysis. I agree with your analysis that we
    > > > need to reset the origin information for the shutdown path to avoid it
    > > > being advanced incorrectly. However, the patch doesn't have sufficient
    > > > comments to explain why we need to reset it for both the ERROR and
    > > > Shutdown paths. Can we improve the comments in the patch?
    > > >
    > > > Also, for the ERROR path, can we reset the origin information in
    > > > apply_error_callback()?
    > >
    > > Please find v4 attached. Addressed comments in that.
    > >
    >
    > The patch looks mostly good to me. I have slightly changed a few of
    > the comments in the attached. What do you think of the attached?
    >
    
    Looks good to me. Please find backported patches attached.
    
    thanks
    Shveta
    
  16. Re: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    Amit Kapila <amit.kapila16@gmail.com> — 2024-08-21T09:57:46Z

    On Tue, Aug 20, 2024 at 2:01 PM shveta malik <shveta.malik@gmail.com> wrote:
    >
    > On Tue, Aug 20, 2024 at 11:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
    > >
    > > On Wed, Aug 14, 2024 at 10:26 AM shveta malik <shveta.malik@gmail.com> wrote:
    > > >
    > > > >
    > > > > Thanks for the detailed analysis. I agree with your analysis that we
    > > > > need to reset the origin information for the shutdown path to avoid it
    > > > > being advanced incorrectly. However, the patch doesn't have sufficient
    > > > > comments to explain why we need to reset it for both the ERROR and
    > > > > Shutdown paths. Can we improve the comments in the patch?
    > > > >
    > > > > Also, for the ERROR path, can we reset the origin information in
    > > > > apply_error_callback()?
    > > >
    > > > Please find v4 attached. Addressed comments in that.
    > > >
    > >
    > > The patch looks mostly good to me. I have slightly changed a few of
    > > the comments in the attached. What do you think of the attached?
    > >
    >
    > Looks good to me. Please find backported patches attached.
    >
    
    Pushed.
    
    -- 
    With Regards,
    Amit Kapila.
    
    
    
    
  17. Re: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    Tom Lane <tgl@sss.pgh.pa.us> — 2024-11-07T19:23:47Z

    Amit Kapila <amit.kapila16@gmail.com> writes:
    > On Tue, Aug 20, 2024 at 2:01 PM shveta malik <shveta.malik@gmail.com> wrote:
    >> Looks good to me. Please find backported patches attached.
    
    > Pushed.
    
    I came across this commit while preparing release notes, and I'm
    worried about whether it doesn't create more problems than it solves.
    The intent stated in the thread subject is to prevent an apply worker
    from advancing past a prepared transaction if the subscriber side
    doesn't permit prepared transactions.  However, it appears to me that
    the committed patch doesn't permit an apply worker to advance past
    any failing transaction whatsoever.  Was any thought given to how
    a DBA would get out of such a situation and get replication flowing
    again?  In the prepared-xact case, it's at least clear that you
    could increase max_prepared_transactions and restart the subscriber
    installation.  In the general case, it's not very obvious that you'd
    even know what the problem is let alone have an easy way to fix it.
    
    In other words: I thought the original design here was to
    intentionally ignore apply errors and keep going, on the theory that
    that was better than blocking replication altogether.  This commit
    has reversed that decision, on the strength of little or no
    discussion AFAICS.  Are we really ready to push this into minor
    releases of stable branches?  Is it a good idea even on HEAD?
    
    			regards, tom lane
    
    
    
    
  18. Re: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    Amit Kapila <amit.kapila16@gmail.com> — 2024-11-08T03:05:38Z

    On Fri, Nov 8, 2024 at 12:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
    >
    > I came across this commit while preparing release notes, and I'm
    > worried about whether it doesn't create more problems than it solves.
    > The intent stated in the thread subject is to prevent an apply worker
    > from advancing past a prepared transaction if the subscriber side
    > doesn't permit prepared transactions.  However, it appears to me that
    > the committed patch doesn't permit an apply worker to advance past
    > any failing transaction whatsoever.  Was any thought given to how
    > a DBA would get out of such a situation and get replication flowing
    > again?  In the prepared-xact case, it's at least clear that you
    > could increase max_prepared_transactions and restart the subscriber
    > installation.  In the general case, it's not very obvious that you'd
    > even know what the problem is let alone have an easy way to fix it.
    >
    
    This is by design, so we don't let the apply worker proceed in case of
    any ERROR. For example, the apply worker keeps retrying to apply the
    transaction when there is a unique key violation error while applying
    (which could be due to the subscriber side having a unique key defined
    but the publisher doesn't or the subscriber already has a row with the
    same value). We need manual intervention to continue the replication.
    To do that, she can create a subscription with the option
    'disable_on_error'. Then apply worker will stop on ERROR instead of
    retrying. Then, she can either manually remove a conflicting row or
    use ALTER SUBSCRIPTION ... SKIP ... to get past the conflicting/error
    transaction. Alternatively, the transaction can also be skipped by
    calling the pg_replication_origin_advance() function. To use SKIP or
    origin_advance function, in the ERROR log we print the LSN (CONTEXT:
    processing remote data for replication origin "pg_16395" during
    "INSERT" for replication target relation "public.test" in transaction
    725 finished at 0/14C0378). She needs to use LSN value 0/14C0378 to
    skip the error transaction. We have explained this in the docs [1].
    
    > In other words: I thought the original design here was to
    > intentionally ignore apply errors and keep going, on the theory that
    > that was better than blocking replication altogether
    >
    
    No, that was not the intention because otherwise, we will silently
    create inconsistency on the subscriber side.
    
    .  This commit
    > has reversed that decision, on the strength of little or no
    > discussion AFAICS.  Are we really ready to push this into minor
    > releases of stable branches?  Is it a good idea even on HEAD?
    >
    
    I hope the explanation above addresses your concern.
    
    [1] - https://www.postgresql.org/docs/devel/logical-replication-conflicts.html
    
    -- 
    With Regards,
    Amit Kapila.
    
    
    
    
  19. RE: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> — 2025-12-22T09:01:19Z

    Hi,
    
    When reviewing some parallel apply related codes, I noticed a bug in the
    parallel apply worker, similar to the issue discussed in this thread.
    
    The issue is that the logical replication parallel apply worker may erroneously
    advance the origin progress during an error or unsuccessful apply. This can lead
    to transaction loss, as these transactions will not be resent by the server.
    Commit 3f28b2fc addressed a similar issue in both the apply worker and table
    sync worker.
    
    The original fix involved registering a before_shmem_exit callback to reset the
    origin information, preventing the worker from advancing it during transaction
    abortion on shutdown. The attached patch registers the same callback for the
    parallel apply worker, ensuring consistent behavior across all workers.
    
    Best Regards,
    Hou zj
    
  20. Re: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    Chao Li <li.evan.chao@gmail.com> — 2025-12-22T09:28:37Z

    
    > On Dec 22, 2025, at 17:01, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
    > 
    > Hi,
    > 
    > When reviewing some parallel apply related codes, I noticed a bug in the
    > parallel apply worker, similar to the issue discussed in this thread.
    > 
    > The issue is that the logical replication parallel apply worker may erroneously
    > advance the origin progress during an error or unsuccessful apply. This can lead
    > to transaction loss, as these transactions will not be resent by the server.
    > Commit 3f28b2fc addressed a similar issue in both the apply worker and table
    > sync worker.
    > 
    > The original fix involved registering a before_shmem_exit callback to reset the
    > origin information, preventing the worker from advancing it during transaction
    > abortion on shutdown. The attached patch registers the same callback for the
    > parallel apply worker, ensuring consistent behavior across all workers.
    > 
    > Best Regards,
    > Hou zj
    > <v1-0001-Fix-unexpected-origin-advancement-during-parallel.patch>
    
    So, ParallelApplyWorkerMain() only calls InitializeLogRepWorker() but SetupApplyOrSyncWorker(), by moving before_shmem_exit() into InitializeLogRepWorker(), ParallelApplyWorkerMain()() gets the exit callback.
    
    LGTM.
    
    Best regards,
    --
    Chao Li (Evan)
    HighGo Software Co., Ltd.
    https://www.highgo.com/
    
    
    
    
    
    
    
    
  21. Re: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    Chao Li <li.evan.chao@gmail.com> — 2025-12-22T11:09:27Z

    
    > On Dec 22, 2025, at 17:28, Chao Li <li.evan.chao@gmail.com> wrote:
    > 
    > 
    > 
    >> On Dec 22, 2025, at 17:01, Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
    >> 
    >> Hi,
    >> 
    >> When reviewing some parallel apply related codes, I noticed a bug in the
    >> parallel apply worker, similar to the issue discussed in this thread.
    >> 
    >> The issue is that the logical replication parallel apply worker may erroneously
    >> advance the origin progress during an error or unsuccessful apply. This can lead
    >> to transaction loss, as these transactions will not be resent by the server.
    >> Commit 3f28b2fc addressed a similar issue in both the apply worker and table
    >> sync worker.
    >> 
    >> The original fix involved registering a before_shmem_exit callback to reset the
    >> origin information, preventing the worker from advancing it during transaction
    >> abortion on shutdown. The attached patch registers the same callback for the
    >> parallel apply worker, ensuring consistent behavior across all workers.
    >> 
    >> Best Regards,
    >> Hou zj
    >> <v1-0001-Fix-unexpected-origin-advancement-during-parallel.patch>
    > 
    > So, ParallelApplyWorkerMain() only calls InitializeLogRepWorker() but SetupApplyOrSyncWorker(), by moving before_shmem_exit() into InitializeLogRepWorker(), ParallelApplyWorkerMain()() gets the exit callback.
    > 
    > LGTM.
    > 
    
    I just noticed a nitpick while reviewing the other your patch:
    ```
    +	 * avoid origin advancement for an in-complete transaction which could
    ```
    
    “In-complete” should be just “incomplete”.
    
    Best regards,
    --
    Chao Li (Evan)
    HighGo Software Co., Ltd.
    https://www.highgo.com/
    
    
    
    
    
    
    
    
  22. Re: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    Amit Kapila <amit.kapila16@gmail.com> — 2025-12-23T04:21:15Z

    On Mon, Dec 22, 2025 at 2:31 PM Zhijie Hou (Fujitsu)
    <houzj.fnst@fujitsu.com> wrote:
    >
    > When reviewing some parallel apply related codes, I noticed a bug in the
    > parallel apply worker, similar to the issue discussed in this thread.
    >
    > The issue is that the logical replication parallel apply worker may erroneously
    > advance the origin progress during an error or unsuccessful apply. This can lead
    > to transaction loss, as these transactions will not be resent by the server.
    > Commit 3f28b2fc addressed a similar issue in both the apply worker and table
    > sync worker.
    >
    > The original fix involved registering a before_shmem_exit callback to reset the
    > origin information, preventing the worker from advancing it during transaction
    > abortion on shutdown. The attached patch registers the same callback for the
    > parallel apply worker, ensuring consistent behavior across all workers.
    >
    
    Thanks for reporting the issue and patch.
    
    +# Test the ability to re-apply a transaction when a parallel apply worker fails
    +# to prepare the transaction due to insufficient max_prepared_transactions
    +# setting.
    +$node_subscriber->append_conf('postgresql.conf',
    
    How does the test ensure that error is raised by parallel apply
    worker? I see that in the previous test, we set
    'debug_logical_replication_streaming = immediate', so that should help
    to invoke parallel apply worker. But is there a more direct way to
    ensure the same? Can we test for LOG like: "ERROR:  logical
    replication parallel apply worker exited due to error"?
    
    -- 
    With Regards,
    Amit Kapila.
    
    
    
    
  23. RE: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> — 2025-12-23T06:42:11Z

    On Tuesday, December 23, 2025 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
    > 
    > On Mon, Dec 22, 2025 at 2:31 PM Zhijie Hou (Fujitsu)
    > <houzj.fnst@fujitsu.com> wrote:
    > >
    > > When reviewing some parallel apply related codes, I noticed a bug in
    > > the parallel apply worker, similar to the issue discussed in this thread.
    > >
    > > The issue is that the logical replication parallel apply worker may
    > > erroneously advance the origin progress during an error or
    > > unsuccessful apply. This can lead to transaction loss, as these transactions
    > will not be resent by the server.
    > > Commit 3f28b2fc addressed a similar issue in both the apply worker and
    > > table sync worker.
    > >
    > > The original fix involved registering a before_shmem_exit callback to
    > > reset the origin information, preventing the worker from advancing it
    > > during transaction abortion on shutdown. The attached patch registers
    > > the same callback for the parallel apply worker, ensuring consistent
    > behavior across all workers.
    > >
    > 
    > Thanks for reporting the issue and patch.
    > 
    > +# Test the ability to re-apply a transaction when a parallel apply
    > +worker fails # to prepare the transaction due to insufficient
    > +max_prepared_transactions # setting.
    > +$node_subscriber->append_conf('postgresql.conf',
    > 
    > How does the test ensure that error is raised by parallel apply worker? I see
    > that in the previous test, we set 'debug_logical_replication_streaming =
    > immediate', so that should help to invoke parallel apply worker. But is there a
    > more direct way to ensure the same? Can we test for LOG like: "ERROR:
    > logical replication parallel apply worker exited due to error"?
    
    OK, I have added a general log test for "ERROR .. logical replication parallel
    apply worker ..." to ensure that it's the parallel apply worker that failed to
    apply the transaction.
    
    And I also addressed the comments by Li Chao.
    
    Here are the updated patches for all branches.
    
    Best Regards,
    Hou zj
    
    
    
  24. Re: [bug fix] prepared transaction might be lost when max_prepared_transactions is zero on the subscriber

    Amit Kapila <amit.kapila16@gmail.com> — 2025-12-24T06:28:21Z

    On Tue, Dec 23, 2025 at 12:12 PM Zhijie Hou (Fujitsu)
    <houzj.fnst@fujitsu.com> wrote:
    >
    > Here are the updated patches for all branches.
    >
    
    Pushed.
    
    -- 
    With Regards,
    Amit Kapila.