Thread

  1. RE: Parallel Apply

    Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> — 2025-12-22T11:13:21Z

    Dear Hackers,
    
    I have been spending time implementing the patches, and I think it's time to
    share them on -hackers.
    
    Patches 0001-0004 are largely unchanged; only some refactoring was done.
    0004 now has a basic test for dependency tracking.
    
    The remaining patches enhance the parallel apply feature. 0006, 0007, and 0008
    contain tests.
    
    0005 was copied from [1]. The patch is needed to apply prepared transactions
    correctly. If you have any comments on it, please post them at [1].
    
    0006 contains changes for supporting two-phase transactions in parallel.
    A parallel worker can be assigned when the BEGIN_PREPARE message arrives, and
    it is released after the PREPARE message. As with normal non-streamed
    transactions, prepared transactions are marked as parallelized when the leader
    dispatches a PREPARE message to the parallel worker, and they are removed when
    the parallel worker finishes preparing. This prevents subsequent transactions
    from committing until the parallel worker finishes the preparation.
    As with streamed transactions, COMMIT/ROLLBACK PREPARED messages are handled by
    the leader worker. At that time, the leader waits for the most recently
    launched transaction to finish.
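
    The bookkeeping described for 0006 can be pictured with a small toy model.
    This is only an illustrative sketch in Python, not the patch's C code: the
    class LeaderModel and all of its method and attribute names are hypothetical,
    and worker scheduling is reduced to handing out integer worker ids.

```python
class LeaderModel:
    """Toy model of the leader's bookkeeping for prepared transactions."""

    def __init__(self):
        self.assigned = {}         # xid -> worker id while a worker is attached
        self.parallelized = set()  # xids whose PREPARE is still in flight
        self.last_launched = None  # most recently launched xid
        self._next_worker = 0

    def begin_prepare(self, xid):
        # A parallel worker can be assigned when BEGIN_PREPARE arrives.
        self.assigned[xid] = self._next_worker
        self._next_worker += 1
        self.last_launched = xid

    def prepare(self, xid):
        # Dispatching PREPARE marks the transaction as parallelized.
        self.parallelized.add(xid)

    def worker_finished_prepare(self, xid):
        # Once the worker finishes preparing, the mark is removed and the
        # worker is released.
        self.parallelized.discard(xid)
        del self.assigned[xid]

    def must_wait_before_commit(self):
        # Later transactions may not commit while a PREPARE is in flight.
        return bool(self.parallelized)


leader = LeaderModel()
leader.begin_prepare(700)
leader.prepare(700)
blocked = leader.must_wait_before_commit()
leader.worker_finished_prepare(700)
unblocked = not leader.must_wait_before_commit()
```

    In this model, COMMIT/ROLLBACK PREPARED would be handled by the leader itself,
    which would first wait for the transaction recorded in last_launched.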
    
    0007 contains changes to track dependencies for streamed transactions.
    In streaming=on mode, dependency tracking and waiting are performed while the
    changes are applied. The leader does nothing while serializing changes.
    In streaming=parallel mode, we must track and wait based on dependencies.
    Basically, non-streamed transactions do not have to wait for streamed
    transactions because the leader worker always waits for them to be applied.
    In contrast, streamed transactions must wait for the most recently dispatched
    non-streamed transaction. Consequently, streamed transactions are not marked
    as parallelized, and the XID of a streamed transaction is not set in the
    replica identity hash entry. This means no parallel worker waits for a
    streamed transaction. Other than that, dependency tracking is done in the same
    way as in the non-streaming case.
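
    The asymmetry between streamed and non-streamed transactions can be sketched
    as follows. This is a simplified Python model under stated assumptions, not
    the patch's implementation: track_dependencies, ri_hash, and the (table, key)
    tuples are hypothetical names, and the replica identity hash is reduced to a
    dict from a key to the XID of the last non-streamed transaction that touched
    it.

```python
def track_dependencies(ri_hash, xid, keys, streamed, last_nonstreamed_xid=None):
    """Return the set of XIDs that transaction `xid` has to wait for."""
    # Both kinds of transaction look up earlier writers of the same keys.
    deps = {ri_hash[k] for k in keys if k in ri_hash}

    if streamed:
        # A streamed transaction also waits for the most recently dispatched
        # non-streamed transaction, and its XID is NOT recorded in the hash,
        # so no parallel worker ever waits for it.
        if last_nonstreamed_xid is not None:
            deps.add(last_nonstreamed_xid)
    else:
        # A non-streamed transaction records itself as the latest writer.
        for k in keys:
            ri_hash[k] = xid

    deps.discard(xid)
    return deps


ri_hash = {}
row = ("t", 1)  # hypothetical (table, replica identity value) key
d1 = track_dependencies(ri_hash, 100, [row], streamed=False)
d2 = track_dependencies(ri_hash, 200, [row], streamed=True,
                        last_nonstreamed_xid=100)
owner_after_streamed = ri_hash[row]
d3 = track_dependencies(ri_hash, 101, [row], streamed=False)
```

    Here the streamed transaction 200 waits for 100 but leaves the hash entry
    untouched, so the later non-streamed transaction 101 still depends only on
    100.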
    
    0008 contains changes to track dependencies based on subscriber-local indexes.
    This extends the RI hash table so that values can also be stored based on
    local indexes. The leader gathers the list of indexes defined on a table the
    first time dependency checking is done for that table in a transaction.
    The detection mechanism is mostly the same as in the RI case.
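
    As a rough illustration of the idea in 0008, the sketch below keys the hash
    by (table, index, indexed values) and caches the index list per transaction.
    It is a toy Python model built on assumptions: LocalIndexTracker, the
    catalog dict, and the example index name are all hypothetical, and the real
    patch may key and cache things differently.

```python
class LocalIndexTracker:
    """Toy model of dependency tracking keyed by subscriber-local indexes."""

    def __init__(self, catalog):
        self.catalog = catalog    # assumed shape: table -> {index: (columns,)}
        self.index_cache = {}     # filled lazily, reset per transaction
        self.catalog_lookups = 0  # counts how often index info is gathered
        self.hash = {}            # (table, index, values) -> last writer xid

    def begin_transaction(self):
        self.index_cache.clear()

    def _indexes(self, table):
        # Gather index information only on the first check in a transaction.
        if table not in self.index_cache:
            self.catalog_lookups += 1
            self.index_cache[table] = self.catalog.get(table, {})
        return self.index_cache[table]

    def check_and_record(self, xid, table, row):
        deps = set()
        for index, columns in self._indexes(table).items():
            key = (table, index, tuple(row[c] for c in columns))
            previous = self.hash.get(key)
            if previous is not None and previous != xid:
                deps.add(previous)
            self.hash[key] = xid
        return deps


catalog = {"users": {"users_email_key": ("email",)}}
tracker = LocalIndexTracker(catalog)

tracker.begin_transaction()
deps_first = tracker.check_and_record(500, "users", {"id": 1, "email": "a@example.com"})
deps_same_txn = tracker.check_and_record(500, "users", {"id": 2, "email": "b@example.com"})

tracker.begin_transaction()
deps_conflict = tracker.check_and_record(501, "users", {"id": 3, "email": "a@example.com"})
```

    Transaction 501 touches the same indexed value as 500, so it depends on 500
    even though the two rows have different id values.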
    
    What do you think?
    
    [1]: https://www.postgresql.org/message-id/TY4PR01MB169078771FB31B395AB496A6B94B4A%40TY4PR01MB16907.jpnprd01.prod.outlook.com
    [2]: https://www.postgresql.org/message-id/OS0PR01MB5716D43CB68DB8FFE73BF65D942AA%40OS0PR01MB5716.jpnprd01.prod.outlook.com
    
    Best regards,
    Hayato Kuroda
    FUJITSU LIMITED