Thread

  1. Re: Logical replication prefetch

    Konstantin Knizhnik <knizhnik@garret.ru> — 2025-07-13T12:29:35Z

    On 13/07/2025 9:28 am, Amit Kapila wrote:
    > I didn't understand your scenario. pa_launch_parallel_worker() should
    > spawn a new worker only if all the workers in the pool are busy, and
    > then it will free the worker if the pool already has enough workers.
    > So, do you mean to say that the workers in the pool are always busy in
    > your workload which lead spawn/exit of new workers? Can you please
    > explain your scenario in some more detail?
    >
    The current LR apply logic does not work well for small OLTP
    transactions.
    First of all, by default the reorder buffer at the publisher buffers
    them, which prevents parallel apply at the subscriber.
    The publisher switches to streaming mode only if a transaction is too
    large or `debug_logical_replication_streaming=immediate` is set.
    But even if we force the publisher to stream short transactions, the
    subscriber will try to launch a new parallel apply worker for each
    transaction (if all existing workers are busy).
    If there are 100 active backends at the publisher, then the subscriber
    will try to launch 100 parallel apply workers.
    Most likely this fails because of the limit on the maximum number of
    workers. In this case the leader serializes such transactions.
    So if there are 100 streamed transactions and 10 parallel apply workers,
    then 10 transactions are started in parallel and 90 are serialized
    to disk.
    This is inefficient for short transactions. It would be better to
    wait for some time until one of the workers becomes vacant.
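
    The 10/90 split above can be reproduced with a toy model. This is only
    a sketch of the arithmetic, not the subscriber code: `dispatch` is a
    hypothetical helper, and it assumes that all streamed transactions
    arrive while the earlier ones are still running.

```python
def dispatch(num_streamed_txns, max_workers):
    """Toy model: every transaction arrives while all earlier ones still run."""
    parallel = 0    # transactions handed to a parallel apply worker
    serialized = 0  # transactions the leader has to spool to disk
    for _ in range(num_streamed_txns):
        if parallel < max_workers:
            parallel += 1    # a new worker could still be launched
        else:
            serialized += 1  # worker limit reached: the leader serializes
    return parallel, serialized


print(dispatch(100, 10))  # prints: (10, 90)
```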
    
    But the worst thing happens when a parallel apply worker completes its
    transaction. If the number of parallel apply workers in the pool exceeds
    `max_parallel_apply_workers_per_subscription / 2`,
    then this parallel apply worker is terminated. So instead of having
    `max_parallel_apply_workers_per_subscription` workers applying
    transactions at the maximum possible speed, with a leader that
    distributes transactions between them and stops receiving new data
    from the publisher when there is no vacant worker, we have a leader
    serializing transactions to disk
    (and then, of course, reading them back from disk) while constantly
    starting and terminating parallel apply worker processes. This leads to
    awful performance.
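
    To make the churn concrete, here is a rough simulation of the rule
    described above. It is a sketch under simplifying assumptions, not the
    real worker code: `run_rounds` is a hypothetical helper, transactions
    arrive in bursts ("rounds"), and every transaction of a burst finishes
    before the next burst starts.

```python
def run_rounds(rounds, txns_per_round, max_workers):
    """Count worker launches, worker exits and serialized transactions."""
    pool = 0  # parallel apply workers currently alive
    launched = exited = serialized = 0
    for _ in range(rounds):
        # Workers that survived the previous round are reused first.
        reused = min(pool, txns_per_round)
        # New workers are launched until the per-subscription limit is hit.
        new = min(max_workers - pool, txns_per_round - reused)
        pool += new
        launched += new
        # Everything else is serialized to disk by the leader.
        serialized += txns_per_round - reused - new
        # Each finishing worker exits while the pool is above max / 2.
        for _ in range(reused + new):
            if pool > max_workers // 2:
                pool -= 1
                exited += 1
    return launched, exited, serialized


print(run_rounds(10, 100, 10))  # prints: (55, 50, 900)
```

    In this model half of the pool is torn down and relaunched in every
    round, and 90 of every 100 transactions still go through the disk.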
    
    
    Certainly the originally intended use case was different: parallel apply
    is performed only for large transactions. The number of such
    transactions is not so big, so there should be enough parallel apply
    workers in the pool to process them. And if there are not enough
    workers, it is not a problem to spawn a new one and terminate
    it after the transaction completes (because the transaction is long,
    the overhead of spawning a process is small compared with the redo of
    a large transaction).
    But if we want to replicate an OLTP workload efficiently, then we
    definitely need some other approach.
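
    One possible shape of such an approach, following the description
    above (a fixed set of workers, and a leader that stops reading when
    none is vacant), is sketched below. It only illustrates the
    backpressure idea: `apply_with_fixed_pool` is a hypothetical name,
    threads stand in for worker processes, and commit ordering and
    dependencies between transactions are ignored.

```python
import queue
import threading


def apply_with_fixed_pool(transactions, num_workers):
    """Apply transactions with a fixed pool; the leader blocks when it is full."""
    # Bounded hand-off queue: put() blocks once the workers fall behind,
    # which models the leader pausing its reads from the publisher.
    work = queue.Queue(maxsize=num_workers)
    applied = []
    lock = threading.Lock()

    def worker():
        while True:
            txn = work.get()
            if txn is None:  # sentinel: no more work
                break
            with lock:
                applied.append(txn)  # stand-in for applying the transaction

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for txn in transactions:
        work.put(txn)   # waits for room instead of spooling to disk
    for _ in threads:
        work.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return applied


done = apply_with_fixed_pool(range(100), 10)
print(len(done))  # prints: 100
```

    No worker is started or stopped while the stream is running, and
    nothing is written to disk; the cost is that the leader may have to
    wait.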
    
    Prefetch is actually more compatible with the current implementation
    because prefetch operations don't need to be grouped by transaction and
    can be executed by any prefetch worker.