Thread

  1. Re: failed NUMA pages inquiry status: Operation not permitted

    Tomas Vondra <tomas@vondra.me> — 2025-12-16T15:17:51Z

    On 12/16/25 15:48, Christoph Berg wrote:
    > Re: To Tomas Vondra
    >> I've managed to reproduce it once, running this loop on
    >> 18-as-of-today. It errored out after a few 100 iterations:
    >>
    >> while psql -c 'SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa'; do :; done
    >>
    >> 2025-12-16 11:49:35.982 UTC [621807] myon@postgres ERROR:  invalid NUMA node id outside of allowed range [0, 0]: -2
    >> 2025-12-16 11:49:35.982 UTC [621807] myon@postgres STATEMENT:  SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa
    >>
    >> That was on the apt.pg.o amd64 build machine while a few things were
    >> just building. Maybe ENOENT "The page is not present" means something
    >> was just swapped out because the machine was under heavy load.
    > 
    > I played a bit more with it.
    > 
    > * It seems to trigger only once for a running cluster. The next one
    >   needs a restart
    > * If it doesn't trigger within the first 30s, it probably never will
    > * It seems easier to trigger on a system that is under load (I started
    >   a few pgmodeler compile runs in parallel (C++))
    > 
    > But none of that answers the "why".
    > 
    
    Hmmm, so this is interesting. I tried this on my workstation (with a
    single NUMA node), and I see this:
    
    1) right after opening a connection, I get this
    
    test=# select numa_node, count(*) from pg_buffercache_numa group by 1;
     numa_node | count
    -----------+-------
             0 |   290
            -2 | 32478
    (2 rows)
    
    
    2) but a select from pg_shmem_allocations_numa works fine
    
    test=# select numa_node, count(*) from pg_shmem_allocations_numa group by 1;
     numa_node | count
    -----------+-------
             0 |    72
    (1 row)
    
    
    3) and if I repeat the pg_buffercache_numa query, it now works
    
    test=# select numa_node, count(*) from pg_buffercache_numa group by 1;
     numa_node | count
    -----------+-------
             0 | 32768
    (1 row)
    
    
    That's a bit strange. I have no idea why this is happening. If I
    reconnect, I start getting the failures again.
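    For reference, here is a minimal C sketch of how a value like -2 can be read. It assumes the node ids in these views come from the status array that move_pages(2) fills in when called with nodes == NULL. In that mode each entry is either the node the page resides on or a negated errno value. On Linux ENOENT is 2, so -2 is -ENOENT ("The page is not present"), which matches both the -2 rows above and the value in Christoph's error. The helpers describe_page_status() and node_id_in_range() are hypothetical and are not PostgreSQL functions. The range check only imitates the kind of validation behind the "outside of allowed range [0, 0]" error; it is not PostgreSQL's actual code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: render one entry of the status array that
 * move_pages(2) fills in when its nodes argument is NULL.
 * Non-negative entries are NUMA node ids; negative entries are -errno.
 */
static void describe_page_status(int status, char *buf, size_t len)
{
    if (status >= 0)
        snprintf(buf, len, "node %d", status);
    else if (status == -ENOENT)
        /* move_pages(2): "The page is not present." */
        snprintf(buf, len, "page not present (-ENOENT)");
    else if (status == -EFAULT)
        /* move_pages(2): zero page, or area not mapped by the process */
        snprintf(buf, len, "zero page or unmapped (-EFAULT)");
    else
        snprintf(buf, len, "other error (errno %d)", -status);
}

/*
 * Hypothetical range check in the spirit of the error above: a valid
 * node id must lie in [0, max_node], so any negated errno such as
 * -ENOENT (-2) is rejected.
 */
static bool node_id_in_range(int status, int max_node)
{
    return status >= 0 && status <= max_node;
}
```

    On a single-node machine (max_node = 0), a status of 0 passes the check, while -2 is rejected and would be reported as "page not present".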
    
    
    regards
    
    -- 
    Tomas Vondra