Thread

  1. Bug in amcheck?

    Konstantin Knizhnik <knizhnik@garret.ru> — 2025-10-22T16:29:51Z

    Hi hackers.
    
    We see the following error reported by amcheck (I have added a dump of the
    page opaque flags to the message) when it runs concurrently with
    autovacuum and autovacuum gets cancelled:
    
    
    ERROR:  mismatch between parent key and child high key in index "pg_attribute_relid_attnam_index"
    DETAIL:  Target block=274, target opaque->flags=0, child block=427, child opaque=11, target page lsn=1/484A8FC8.
    CONTEXT:  SQL statement "SELECT bt_index_parent_check(indexrelid, true, true) from pg_index"
    
    So the child page has the BTP_HALF_DEAD bit set.
    Autovacuum was interrupted at this point in _bt_pagedel:
    
             /*
              * Check here, as calling loops will have locks held, preventing
              * interrupts from being processed.
              */
             CHECK_FOR_INTERRUPTS();
    
    Reproducing it is not easy.
    First, I added a sleep here:
    
             /*
              * Check here, as calling loops will have locks held, preventing
              * interrupts from being processed.
              */
             pg_usleep(10000);
             CHECK_FOR_INTERRUPTS();
    
    Then I created two procedures:
    
    create or replace procedure create_tables(tables integer, partitions integer) as $$
    declare
         i integer;
         j integer;
    begin
         for i in 1..tables
         loop
             execute 'DROP TABLE IF EXISTS t_' || i;
             execute 'CREATE TABLE t_' || i || '(pk integer) partition by range (pk)';
             for j in 1..partitions
             loop
                 execute 'create table p_'||i||'_'||j||' partition of t_'||i||' for values from ('||j||') to ('||(j + 1)||')';
             end loop;
             execute 'insert into t_'||i||' values (generate_series(1,'||partitions||'))';
         end loop;
    end;
    $$ language plpgsql;
    
    and
    
    create or replace procedure run_amcheck() as $$
    begin
         loop
             if (select count(*) from pg_stat_activity where backend_type='autovacuum worker') > 0
             then
                 raise notice 'Run amcheck!';
                 perform bt_index_parent_check(indexrelid, true, true) from pg_index;
             end if;
             perform pg_sleep(1);
         end loop;
    end;
    $$ language plpgsql;
    
    Then I run run_amcheck() (via CALL, with the amcheck extension installed)
    concurrently with the following pgbench script:
    
    call create_tables(2,1000);
    select pg_sleep(2);
    
    If the problem does not reproduce, cancel run_amcheck() and restart it.
    
    
    The backtrace (PG16) is:
    
       * frame #0: 0x00000001017b6aac amcheck.dylib`bt_child_highkey_check(state=0x000000010c846318, target_downlinkoffnum=37, loaded_child="\U00000001", target_level=1) at verify_nbtree.c:2146:23
         frame #1: 0x00000001017b7fd8 amcheck.dylib`bt_child_check(state=0x000000010c846318, targetkey=0x000000013c01c448, downlinkoffnum=37) at verify_nbtree.c:2262:2
         frame #2: 0x00000001017b5f4c amcheck.dylib`bt_target_page_check(state=0x000000010c846318) at verify_nbtree.c:1623:4
         frame #3: 0x00000001017b3908 amcheck.dylib`bt_check_level_from_leftmost(state=0x000000010c846318, level=(level = 1, leftmost = 3, istruerootlevel = false)) at verify_nbtree.c:859:3
         frame #4: 0x00000001017b24e8 amcheck.dylib`bt_check_every_level(rel=0x0000000140074f18, heaprel=0x0000000130070148, heapkeyspace=true, readonly=true, heapallindexed=true, rootdescend=true) at verify_nbtree.c:603:13
         frame #5: 0x00000001017b198c amcheck.dylib`bt_index_check_internal(indrelid=2674, parentcheck=true, heapallindexed=true, rootdescend=true) at verify_nbtree.c:362:3
         frame #6: 0x00000001017b1a78 amcheck.dylib`bt_index_parent_check(fcinfo=0x000000010c83b040) at verify_nbtree.c:242:2
    
    
    I wonder whether we should add a P_ISHALFDEAD(opaque) check for the child page?