Thread

  1. Re: [PATCH] pg_surgery: Fix WAL corruption from concurrent heap_force_kill

    Zsolt Parragi <zsolt.parragi@percona.com> — 2026-05-05T20:42:43Z

    Hello!
    
    I verified both the patch and that the test case catches the bug
    without the patch. The overall change looks good to me.
    
    + if (did_modify_vm)
    + INJECTION_POINT_CACHED("heap-force-kill-before-vm-wal", NULL);
    +
      if (did_modify_vm && RelationNeedsWAL(rel))
    + {
      log_newpage_buffer(vmbuf, false);
    + LockBuffer(vmbuf, BUFFER_LOCK_UNLOCK);
    + }
    
    Is the additional if intentional here? Based on the name, it seems like
    it could simply be part of the next if. Currently it also fires for
    unlogged tables.
    
    +# Give session 2 time to reach the VM buffer lock (or complete if
    +# unfixed).  We cannot reliably detect blocking from Perl, so just
    +# sleep briefly.
    +use Time::HiRes qw(usleep);
    +usleep(500_000);
    
    Couldn't this be unstable on CI?
    
    The following seems to work for me, both for detecting the issue in
    the unpatched version and for continuing sooner in the patched
    version (~1.55 s total execution time compared to ~2 s originally).
    With the patch, s2 waits for the lock; without the patch, it finishes
    and becomes idle:
    
    my $s2 = $node->background_psql('postgres');
    my $s2_pid = $s2->query_safe(q{SELECT pg_backend_pid()});
    chomp $s2_pid;
    
    $s2->query_until(qr/starting_s2/,
        q(\echo starting_s2
    SELECT heap_force_kill('test_vm'::regclass, ARRAY['(1,1)']::tid[]);
    \echo s2_done
    ));
    
    use Time::HiRes qw(usleep);
    my $observed = '';
    my $deadline = time() + 10;
    while (time() < $deadline) {
        $observed = $node->safe_psql('postgres', qq{
            SELECT format('%s/%s/%s',
                          coalesce(wait_event_type, 'NULL'),
                          coalesce(wait_event, 'NULL'),
                          coalesce(state, 'NULL'))
            FROM pg_stat_activity WHERE pid = $s2_pid
        });
        last if $observed =~ m{^Buffer/BufferExclusive/}
             || $observed =~ m{/idle$};
        usleep(50_000);
    }
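
    A minimal sketch of how the string collected by the loop above could
    be interpreted once polling stops. The helper name classify_observed
    and the three outcome labels are assumptions made for illustration;
    they are not part of the patch or of the proposed test.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the patch or the test above): map the
# "wait_event_type/wait_event/state" string built by the polling query
# to one of three outcomes.
sub classify_observed
{
    my ($observed) = @_;

    # Session 2 is waiting on the exclusive buffer lock: patched behaviour.
    return 'blocked' if $observed =~ m{^Buffer/BufferExclusive/};

    # Session 2 already finished and went idle: unpatched behaviour.
    return 'finished' if $observed =~ m{/idle$};

    # Neither condition was seen before the deadline expired.
    return 'timeout';
}

print classify_observed('Buffer/BufferExclusive/active'), "\n";
print classify_observed('Client/ClientRead/idle'), "\n";
print classify_observed(''), "\n";
```

    Treating the timeout as a separate outcome would let the test fail
    with a clear message instead of silently taking either branch.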