Thread

Commits

  1. Remove table_scan_analyze_next_tuple unneeded parameter OldestXmin

  2. Simplify visibility check in heap_page_would_be_all_visible()

  3. Eliminate use of cached VM value in lazy_scan_prune()

  4. Combine visibilitymap_set() cases in lazy_scan_prune()

  5. Fix const qualification in prune_freeze_setup()

  6. Simplify vacuum visibility assertion

  7. Split heap_page_prune_and_freeze() into helpers

  8. Assert that cutoffs are provided if freezing will be attempted

  9. Split PruneFreezeParams initializers to one field per line

  10. Refactor heap_page_prune_and_freeze() parameters into a struct

  11. Make heap_page_is_all_visible independent of LVRelState

  12. Inline TransactionIdFollows/Precedes[OrEquals]()

  13. Add helper for freeze determination to heap_page_prune_and_freeze

  14. Bump XLOG_PAGE_MAGIC after xl_heap_prune change

  15. Correct prune WAL record opcode name in comment

  16. Add error codes when vacuum discovers VM corruption

  17. Remove unused xl_heap_prune member, reason

  18. Remove unneeded VM pin from VM replay

  19. Add assert and log message to visibilitymap_set

  20. Add error codes to some corruption log messages

  1. eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-06-23T20:25:16Z

    Hi,
    
    The attached patch set eliminates xl_heap_visible, the WAL record
    emitted when a block of the heap is set all-visible/frozen in the
    visibility map. Instead, it includes the information needed to update
    the VM in the WAL record already emitted by the operation modifying
    the heap page.
    
    Currently COPY FREEZE and vacuum are the only operations that set the
    VM. So, this patch modifies the xl_heap_multi_insert and xl_heap_prune
    records.
    
    The result is a dramatic reduction in WAL volume for these operations.
    I've included numbers below.
    
    I also think that it makes more sense to include changes to the VM in
    the same WAL record as the changes that rendered the page all-visible.
    In some cases, we will only set the page all-visible, but that is in
    the context of the operation on the heap page which discovered that it
    was all-visible. Therefore, I find this to be a clarity as well as a
    performance improvement.
    
    This project is also the first step toward setting the VM on-access
    for queries which do not modify the page. There are a few design
    issues that must be sorted out for that project which I will detail
    separately. Note that this patch set currently does not implement
    setting the VM on-access.
    
    The attached patch set isn't 100% polished. I think some of the
    variable names and comments could use work, but I'd like to validate
    the idea of doing this before doing a full polish. This is a summary
    of what is in the set:
    
    Patches:
    0001 - 0002: cleanup
    0003 - 0004: refactoring
    0005: COPY FREEZE changes
    0006: refactoring
    0007: vacuum phase III changes
    0008: vacuum phase I empty page changes
    0009 - 0012: refactoring
    0013: vacuum phase I normal page changes
    0014: cleanup
    
    Performance benefits of eliminating xl_heap_visible:
    
    vacuum of table with index (DDL at bottom of email)
    --
    master -> patch
    WAL bytes: 405346 -> 303088 = 25% reduction
    WAL records: 6682 -> 4459 = 33% reduction
    
    vacuum of table without index
    --
    master -> patch
    WAL records: 4452 -> 2231 = 50% reduction
    WAL bytes: 289016 -> 177978 = 38% reduction
    
    COPY FREEZE of table without index
    --
    master -> patch
    WAL records: 3672777 -> 1854589 = 50% reduction
    WAL bytes: 841340339 -> 748545732  = 11% reduction (new pages need a
    copy of the whole page)
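    
    For reference, the reduction percentages above follow from (before - after) / before. A minimal sketch of that arithmetic, assuming nothing beyond the figures quoted in this email (the helper name percent_reduction is illustrative and is not part of the patch set):
    
    ```c
    #include <assert.h>
    #include <stdint.h>
    
    /*
     * Illustrative helper (hypothetical name): percentage by which a WAL
     * counter shrank between master and the patched build.
     */
    static double
    percent_reduction(uint64_t before, uint64_t after)
    {
        /* The formula only makes sense for a non-empty, shrinking counter. */
        assert(before > 0 && after <= before);
        return 100.0 * (double) (before - after) / (double) before;
    }
    ```
    
    For example, percent_reduction(405346, 303088) is roughly 25.2, which matches the WAL bytes figure for the vacuum of the table with an index.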
    
    table for vacuum example:
    --
    create table foo(a int, b numeric, c numeric) with (autovacuum_enabled= false);
    insert into foo select i % 18, repeat('1', 400)::numeric, repeat('2',
    400)::numeric from generate_series(1,40000)i;
    -- don't make index for no-index case
    create index on foo(a);
    delete from foo where a = 1;
    vacuum (verbose, process_toast false) foo;
    
    
    copy freeze example:
    --
    -- create a data file
    create table large(a int, b int) with (autovacuum_enabled = false,
    fillfactor = 10);
    insert into large SELECT generate_series(1,40000000)i, 1;
    copy large to 'large.data';
    
    -- example
    BEGIN;
    create table large(a int, b int) with (autovacuum_enabled = false,
    fillfactor = 10);
    COPY large FROM 'large.data' WITH (FREEZE);
    COMMIT;
    
    - Melanie
    
  2. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-06-26T22:04:34Z

    On Mon, Jun 23, 2025 at 4:25 PM Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    >
    > The attached patch set eliminates xl_heap_visible, the WAL record
    > emitted when a block of the heap is set all-visible/frozen in the
    > visibility map. Instead, it includes the information needed to update
    > the VM in the WAL record already emitted by the operation modifying
    > the heap page.
    
    Rebased in light of recent changes on master:
    
    0001: cleanup
    0002: preparatory work
    0003: eliminate xl_heap_visible for COPY FREEZE
    0004 - 0005: eliminate xl_heap_visible for vacuum's phase III
    0006: eliminate xl_heap_visible for vacuum phase I empty pages
    0007 - 0010: preparatory refactoring
    0011: eliminate xl_heap_visible from vacuum phase I prune/freeze
    0012: remove xl_heap_visible
    
    - Melanie
    
  3. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-07-09T21:59:26Z

    On Thu, Jun 26, 2025 at 6:04 PM Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    >
    > Rebased in light of recent changes on master:
    
    This needed another rebase, and, in light of the discussion in [1],
    I've also removed the patch to add heap wrappers for setting pages
    all-visible.
    
    More notably, the final patch (0012) in attached v3 allows on-access
    pruning to set the VM.
    
    To do this, it plumbs some information down from the executor to the
    table scan about whether or not the table is modified by the query. We
    don't want to set the VM only to clear it while scanning pages for an
    UPDATE or while locking rows in a SELECT FOR UPDATE.
    
    Because we only do on-access pruning when pd_prune_xid is valid, we
    shouldn't need much of a heuristic for deciding when to set the VM
    on-access -- but I've included one anyway: we only do it if we are
    actually pruning or if the page is already dirty and no FPI would be
    emitted.
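    
    The rule described in the two paragraphs above can be sketched as a small predicate. This is only an illustration of the prose, not code from the patch set: the function name and all four parameter names are hypothetical, and the real decision is made inside the pruning code with access to the buffer and WAL state.
    
    ```c
    #include <stdbool.h>
    
    /*
     * Hypothetical sketch of the on-access heuristic described above.
     * All names are invented for illustration.
     */
    static bool
    should_set_vm_on_access(bool query_modifies_table,
                            bool did_prune,
                            bool page_already_dirty,
                            bool fpi_would_be_emitted)
    {
        /* Avoid setting the VM only to clear it again (UPDATE, FOR UPDATE). */
        if (query_modifies_table)
            return false;
    
        /* If we are pruning anyway, the page is being modified regardless. */
        if (did_prune)
            return true;
    
        /* Otherwise only piggyback on an already-dirty page with no FPI cost. */
        return page_already_dirty && !fpi_would_be_emitted;
    }
    ```
    
    Under these assumptions, a read-only scan that touches a clean page without pruning anything leaves the VM alone.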
    
    You can see it in action with the following:
    
    create extension pg_visibility;
    create table foo (a int, b int) with (autovacuum_enabled=false, fillfactor=90);
    insert into foo select generate_series(1,300), generate_series(1,300);
    create index on foo (a);
    update foo set b = 51 where b = 50;
    select * from foo where a = 50;
    select * from pg_visibility_map_summary('foo');
    
    The SELECT will set a page all-visible in the VM.
    In this patch set, on-access pruning is enabled for sequential scans
    and the underlying heap relation in index scans and bitmap heap scans.
    This example can exercise any of the three if you toggle
    enable_indexscan and enable_bitmapscan appropriately.
    
    From a performance perspective, if you run a trivial pgbench, you can
    see far more all-visible pages set in the pgbench_[x] relations with
    no noticeable overhead. But, I'm planning to do some performance
    experiments to show how this affects our ability to choose index only
    scan plans in realistic workloads.
    
    - Melanie
    
    [1] https://www.postgresql.org/message-id/CAAKRu_Yj%3DyrL%2BgGGsqfYVQcYn7rDp6hDeoF1vN453JDp8dEY%2Bw%40mail.gmail.com
    
  4. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-07-11T22:19:15Z

    On Wed, Jul 9, 2025 at 5:59 PM Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    >
    > On Thu, Jun 26, 2025 at 6:04 PM Melanie Plageman
    > <melanieplageman@gmail.com> wrote:
    > >
    > > Rebased in light of recent changes on master:
    >
    > This needed another rebase, and, in light of the discussion in [1],
    > I've also removed the patch to add heap wrappers for setting pages
    > all-visible.
    
    Andrey Borodin made the excellent point off-list that I forgot to
    remove the xl_heap_visible struct itself -- which is rather important
    to a patch set purporting to eliminate xl_heap_visible! New version
    attached.
    
    
    - Melanie
    
  5. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Andrey Borodin <x4mmm@yandex-team.ru> — 2025-07-13T18:34:26Z

    
    > On 12 Jul 2025, at 03:19, Melanie Plageman <melanieplageman@gmail.com> wrote:
    > 
    > remove the xl_heap_visible struct
    
    Same goes for VISIBILITYMAP_XLOG_CATALOG_REL and XLOG_HEAP2_VISIBLE. But please do not rush to remove it, perhaps I will have a more exhaustive list later. Currently the patch set is expected to be unpolished.
    I just need to absorb all effects to have a high-level evaluation of the patch set effect.
    
    I'm still trying to grasp the connection between the first patch, with Assert(prstate->cutoffs), and the other patches.
    
    Also, I'd prefer "page is not marked all-visible but visibility map bit is set in relation" to emit XX001 for monitoring reasons, but again, this is a small note, while I need a broader picture.
    
    So far I do not see any general problems in delegating redo work from xl_heap_visible to other records. FWIW, I have observed several cases of VM corruption that might be connected to the fact that we log VM changes independently of the data changes that caused the VM to change. But I have no real evidence or understanding of what happened.
    
    
    Best regards, Andrey Borodin.
    
    
    
  6. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-07-13T19:15:22Z

    On Sun, Jul 13, 2025 at 2:34 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
    >
    > > On 12 Jul 2025, at 03:19, Melanie Plageman <melanieplageman@gmail.com> wrote:
    > >
    > > remove the xl_heap_visible struct
    >
    > Same goes for VISIBILITYMAP_XLOG_CATALOG_REL and XLOG_HEAP2_VISIBLE. But please do not rush to remove it, perhaps I will have a more exhaustive list later. Currently the patch set is expected to be unpolished.
    > I just need to absorb all effects to have a high-level evaluation of the patch set effect.
    
    I actually did remove those if you check the last version posted. I
    did notice there is one remaining comment referring to
    XLOG_HEAP2_VISIBLE I missed somehow, but the actual enums/macros were
    removed already.
    
    > I'm still trying to grasp connection of first patch with Assert(prstate->cutoffs) to other patches;
    
    I added this because I noticed that it was used without validating it
    was provided in that location. The last patch in the set which sets
    the VM on access changes where cutoffs are used, so I noticed what I
    felt was a missing assert in master while developing that patch.
    
    > Also, I'd prefer "page is not marked all-visible but visibility map bit is set in relation" to emit XX001 for monitoring reasons, but again, this is small note, while I need a broader picture.
    
    Could you clarify what you mean by this? Are you talking about the
    string representation of the visibility map bits in the WAL record
    representations in heapdesc.c?
    
    - Melanie
    
    
    
    
  7. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Andrey Borodin <x4mmm@yandex-team.ru> — 2025-07-14T06:37:47Z

    
    > On 14 Jul 2025, at 00:15, Melanie Plageman <melanieplageman@gmail.com> wrote:
    > 
    >> 
    >> Also, I'd prefer "page is not marked all-visible but visibility map bit is set in relation" to emit XX001 for monitoring reasons, but again, this is small note, while I need a broader picture.
    > 
    > Could you clarify what you mean by this? Are you talking about the
    > string representation of the visibility map bits in the WAL record
    > representations in heapdesc.c?
    
    This might be a bit off-topic for this thread, but as long as the patch touches that code we can look into this too.
    
    If the all-visible VM bit is set while the page is not all-visible, an index-only scan will return incorrect results. I have observed this inconsistency a few times in production.
    
    Two persistent subsystems (VM and heap) contradict each other, which is why I think this is data corruption. Yes, we can repair the VM by assuming the heap to be the source of truth in this case. But we must also emit the ERRCODE_DATA_CORRUPTED (XX001) code into the logs. In many cases this will alert the on-call SRE.
    
    To do so, I propose replacing elog(WARNING, ...) with ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED), ...)).
    
    
    Best regards, Andrey Borodin.
    
    
    
  8. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-07-31T22:58:11Z

    Thanks for continuing to take a look, Andrey.
    
    On Mon, Jul 14, 2025 at 2:37 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
    >
    > This might be a bit off-topic for this thread, but as long as the patch touches that code we can look into this too.
    >
    > If VM bit all-visible is set while page is not all-visible IndexOnlyScan will show incorrect results. I observed this inconsistency few times on production.
    
    That's very unfortunate. I wonder what could be causing this. Do you
    suspect a bug in Postgres? Or something wrong with the disk, etc?
    
    > Two persistent subsystems (VM and heap) contradict each other, that's why I think this is a data corruption. Yes, we can repair the VM by assuming heap to be the source of truth in this case. But we must also emit ERRCODE_DATA_CORRUPTED XX001 code into the logs. In many cases this will alert on-call SRE.
    >
    > To do so I propose to replace elog(WARNING,...) with ereport(WARNING,(errcode(ERRCODE_DATA_CORRUPTED),..).
    
    Ah, you mean the warnings currently in lazy_scan_prune(). To me this
    suggestion makes sense. I see at least one other example with
    ERRCODE_DATA_CORRUPTED that is an error level below ERROR.
    
    I have attached a cleaned up and updated version of the patch set (it
    doesn't yet include your suggested error message change).
    
    
    What's new in this version
    -----
    In addition to general code, comment, and commit message improvements,
    notable changes are as follows:
    
    - I have used the GlobalVisState for determining if the whole page is
    visible in a more natural way.
    
    - I micro-benchmarked and identified some sources of regression in the
    additional code SELECT queries would do to set the VM. So, there are
    several new commits addressing these (for example inlining several
    functions and unsetting all-visible when we see a dead tuple if we
    won't attempt freezing).
    
    - Because heap_page_prune_and_freeze() was getting long, I added some
    helper functions.
    
    
    Performance impact of setting the VM on-access
    -------
    I found that with the patch set applied, we set many pages all-visible
    in the VM on access, resulting in a higher overall number of pages set
    all-visible, reducing load for vacuum, and dramatically decreasing
    heap fetches by index-only scans.
    
    I devised a simple benchmark -- with 8 workers inserting 20 rows at a
    time into a table with a few columns and updating a single row that
    they just inserted. Another worker queries the table once per second using
    an index.
    
    After running the benchmark for a few minutes, though the table was
    autovacuumed several times in both cases, with the patchset applied,
    15% more blocks were all-visible at the end of the benchmark.
    
    And with my patch applied, index-only scans did far fewer heap
    fetches. A SELECT count(*) of the table at the same point in the
    benchmark did 10,000 heap fetches on master and 500 with the patch
    applied (I used auto_explain to determine this).
    
    With my patch applied, autovacuum workers write half as much WAL as on
    master. Some of this is courtesy of other patches in the set which
    eliminate separate WAL records for setting the page all-visible. But,
    vacuum is also scanning fewer pages and dirtying fewer buffers because
    they are being set all-visible on-access.
    
    There are more details about the benchmark at the end of the email.
    
    
    Setting pd_prune_xid on insert
    ------
    The patch "Set-pd_prune_xid-on-insert.txt" can be applied as the last
    patch in the set. It sets pd_prune_xid on insert (so pages filled by
    COPY or insert can also be set all-visible in the VM before they are
    vacuumed). I gave it a .txt extension because it currently fails
    035_standby_logical_decoding due to a recovery conflict. I need to
    investigate more to see if this is a bug in my patch set or elsewhere
    in Postgres.
    
    Besides the failing test, I have a feeling that my current heuristic
    for whether or not to set the VM on-access is not quite right for
    pages that have only been inserted to -- and if we get it wrong, we've
    wasted those CPU cycles because we didn't otherwise need to prune the
    page.
    
    
    - Melanie
    
    
    Benchmark
    -------
    psql -c "
    DROP TABLE IF EXISTS simple_table;
    
    CREATE TABLE simple_table (
        id SERIAL PRIMARY KEY,
        group_id INT NOT NULL,
        data TEXT,
        created_at TIMESTAMPTZ DEFAULT now()
    );
    
    create index on simple_table(group_id);
    "
    
    pgbench \
      --no-vacuum \
      --random-seed=0 \
      -c 8 \
      -j 8 \
      -M prepared \
      -T 200 \
      > "pgbench_run_summary_update_${version}" \
    -f- <<EOF &
    \set gid random(1,1000)
    
    INSERT INTO simple_table (group_id, data)
      SELECT :gid, 'inserted'
      RETURNING id \gset
    
    update simple_table set data = 'updated' where id = :id;
    
    insert into simple_table (group_id, data)
      select :gid, 'inserted'
      from generate_series(1,20);
    EOF
    insert_pid=$!
    
    pgbench \
      --no-vacuum \
      --random-seed=0 \
      -c 1 \
      -j 1 \
      --rate=1 \
      -M prepared \
      -T 200 \
      > "pgbench_run_summary_select_${version}" \
    -f- <<EOF
    \set gid random(1, 1000)
    select max(created_at) from simple_table where group_id = :gid;
    select count(*) from simple_table where group_id = :gid;
    EOF
    
    wait $insert_pid
    
  9. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-08-01T21:36:19Z

    On Thu, Jul 31, 2025 at 6:58 PM Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    >
    > The patch "Set-pd_prune_xid-on-insert.txt" can be applied as the last
    > patch in the set. It sets pd_prune_xid on insert (so pages filled by
    > COPY or insert can also be set all-visible in the VM before they are
    > vacuumed). I gave it a .txt extension because it currently fails
    > 035_standby_logical_decoding due to a recovery conflict. I need to
    > investigate more to see if this is a bug in my patch set or elsewhere
    > in Postgres.
    
    I figured out that if we set the VM on-access, we need to enable
    hot_standby_feedback in more places in 035_standby_logical_decoding.pl
    to avoid recovery conflicts. I've done that in the attached updated
    version 6. There are a few other issues in
    035_standby_logical_decoding.pl that I reported here [1]. With these
    changes, setting pd_prune_xid on insert passes tests. Whether or not
    we want to do it (and what the heuristic should be for deciding when
    to do it) is another question.
    
    - Melanie
    
    [1] https://www.postgresql.org/message-id/flat/CAAKRu_YO2mEm%3DZWZKPjTMU%3DgW5Y83_KMi_1cr51JwavH0ctd7w%40mail.gmail.com
    
  10. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Kirill Reshke <reshkekirill@gmail.com> — 2025-08-26T09:58:28Z

    On Sat, 2 Aug 2025 at 02:36, Melanie Plageman <melanieplageman@gmail.com> wrote:
    >
    > On Thu, Jul 31, 2025 at 6:58 PM Melanie Plageman
    > <melanieplageman@gmail.com> wrote:
    > >
    > > The patch "Set-pd_prune_xid-on-insert.txt" can be applied as the last
    > > patch in the set. It sets pd_prune_xid on insert (so pages filled by
    > > COPY or insert can also be set all-visible in the VM before they are
    > > vacuumed). I gave it a .txt extension because it currently fails
    > > 035_standby_logical_decoding due to a recovery conflict. I need to
    > > investigate more to see if this is a bug in my patch set or elsewhere
    > > in Postgres.
    >
    > I figured out that if we set the VM on-access, we need to enable
    > hot_standby_feedback in more places in 035_standby_logical_decoding.pl
    > to avoid recovery conflicts. I've done that in the attached updated
    > version 6. There are a few other issues in
    > 035_standby_logical_decoding.pl that I reported here [1]. With these
    > changes, setting pd_prune_xid on insert passes tests. Whether or not
    > we want to do it (and what the heuristic should be for deciding when
    > to do it) is another question.
    >
    > - Melanie
    >
    > [1] https://www.postgresql.org/message-id/flat/CAAKRu_YO2mEm%3DZWZKPjTMU%3DgW5Y83_KMi_1cr51JwavH0ctd7w%40mail.gmail.com
    
    Hi!
    
    Andrey told me off-list about this thread and I decided to take a look.
    
    I tried to play with each patch in this patchset and find a
    corruption, but I was unsuccessful. I will conduct further tests
    later. I am not implying that I suspect this patchset causes any
    corruption; I am merely attempting to verify it.
    
    I also have a few comments and questions. Here is my (very limited)
    review of 0001, because I believe that removing xl_heap_visible from
    COPY FREEZE is a pure win, so this patch can be very beneficial by
    itself.
    
    visibilitymap_set_vmbyte is introduced in 0001 and removed in 0012.
    This is strange to me; maybe we can avoid visibilitymap_set_vmbyte in
    the first place?
    
    In 0001:
    
    1)
    should we add "Assert(LWLockHeldByMeInMode(BufferDescriptorGetContentLock(bufHdr),
    LW_EXCLUSIVE));" in visibilitymap_set_vmbyte?
    
    Also here  `Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer),
    vmbuffer));` can be beneficial:
    
    >/*
    >+ * If we're only adding already frozen rows to a previously empty
    >+ * page, mark it as all-frozen and update the visibility map. We're
    >+ * already holding a pin on the vmbuffer.
    >+ */
    >   else if (all_frozen_set)
    >+ {
    >    PageSetAllVisible(page);
    >+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
    >+ visibilitymap_set_vmbyte(relation,
    >+ BufferGetBlockNumber(buffer),
    >+ vmbuffer,
    >+ VISIBILITYMAP_ALL_VISIBLE |
    >+ VISIBILITYMAP_ALL_FROZEN);
    >+ }
    
    2)
    in heap_xlog_multi_insert:
    
    +
    + visibilitymap_pin(reln, blkno, &vmbuffer);
    + visibilitymap_set_vmbyte(....)
    
    Do we need to pin vmbuffer here? Looks like
    XLogReadBufferForRedoExtended already pins vmbuffer. I verified this
    with CheckBufferIsPinnedOnce(vmbuffer) just before visibilitymap_pin
    and COPY ... WITH (FREEZE true) test.
    
    3)
    >+
    > +#ifdef TRACE_VISIBILITYMAP
    > + elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
    > +#endif
    
    I can see this is merely copy-pasted from visibilitymap_set, but maybe
    display "flags" as well?
    
    4) visibilitymap_set receives an XLogRecPtr recptr parameter, which is
    set to the WAL record LSN during recovery and to InvalidXLogRecPtr
    otherwise. visibilitymap_set manages the VM page LSN based on this
    recptr value (inside the function). visibilitymap_set_vmbyte does the
    opposite and makes its caller responsible for page LSN management.
    Maybe we should keep these two functions akin to each other?
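    
    As background for the points above: the visibility map keeps two bits per heap block, one for all-visible and one for all-frozen. The sketch below shows only the bit arithmetic on a plain byte array. All names are hypothetical, the flag values are assumed to mirror VISIBILITYMAP_ALL_VISIBLE (0x01) and VISIBILITYMAP_ALL_FROZEN (0x02), and page headers, buffer locking, and LSN handling are deliberately left out, so this is not what visibilitymap_set_vmbyte does in the patch.
    
    ```c
    #include <assert.h>
    #include <stdint.h>
    
    #define SKETCH_ALL_VISIBLE          0x01    /* assumed flag value */
    #define SKETCH_ALL_FROZEN           0x02    /* assumed flag value */
    #define SKETCH_VALID_BITS           0x03
    #define SKETCH_BITS_PER_HEAPBLOCK   2
    #define SKETCH_HEAPBLOCKS_PER_BYTE  4
    
    /*
     * Set the given flag bits for heap_blk in a flat bitmap and return the
     * bits that were set before the call.  Simplification: the bitmap is one
     * contiguous array, with no per-page header.
     */
    static uint8_t
    sketch_set_vm_bits(uint8_t *map, uint32_t heap_blk, uint8_t flags)
    {
        uint32_t    map_byte = heap_blk / SKETCH_HEAPBLOCKS_PER_BYTE;
        uint32_t    shift = (heap_blk % SKETCH_HEAPBLOCKS_PER_BYTE) *
                            SKETCH_BITS_PER_HEAPBLOCK;
        uint8_t     old_bits;
    
        assert((flags & ~SKETCH_VALID_BITS) == 0);
    
        old_bits = (uint8_t) ((map[map_byte] >> shift) & SKETCH_VALID_BITS);
        map[map_byte] |= (uint8_t) (flags << shift);
        return old_bits;
    }
    ```
    
    Returning the previous bits lets a caller tell whether the call changed anything.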
    
    
    -- 
    Best regards,
    Kirill Reshke
    
    
    
    
  11. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Kirill Reshke <reshkekirill@gmail.com> — 2025-08-26T20:01:24Z

    On Sat, 2 Aug 2025 at 02:36, Melanie Plageman <melanieplageman@gmail.com> wrote:
    >
    > On Thu, Jul 31, 2025 at 6:58 PM Melanie Plageman
    > <melanieplageman@gmail.com> wrote:
    > >
    > > The patch "Set-pd_prune_xid-on-insert.txt" can be applied as the last
    > > patch in the set. It sets pd_prune_xid on insert (so pages filled by
    > > COPY or insert can also be set all-visible in the VM before they are
    > > vacuumed). I gave it a .txt extension because it currently fails
    > > 035_standby_logical_decoding due to a recovery conflict. I need to
    > > investigate more to see if this is a bug in my patch set or elsewhere
    > > in Postgres.
    >
    > I figured out that if we set the VM on-access, we need to enable
    > hot_standby_feedback in more places in 035_standby_logical_decoding.pl
    > to avoid recovery conflicts. I've done that in the attached updated
    > version 6. There are a few other issues in
    > 035_standby_logical_decoding.pl that I reported here [1]. With these
    > changes, setting pd_prune_xid on insert passes tests. Whether or not
    > we want to do it (and what the heuristic should be for deciding when
    > to do it) is another question.
    >
    > - Melanie
    >
    > [1] https://www.postgresql.org/message-id/flat/CAAKRu_YO2mEm%3DZWZKPjTMU%3DgW5Y83_KMi_1cr51JwavH0ctd7w%40mail.gmail.com
    
    
    0002 No comments from me. Looks straightforward.
    
    Few comments on 0003.
    
    1) This patch introduces XLHP_HAS_VMFLAGS. However it lacks some
    helpful comments about this new status bit.
    
    a) In heapam_xlog.h, in xl_heap_prune struct definition:
    
    
    /*
    * If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
    * unaligned
    */
    + /* If XLHP_HAS_VMFLAGS is set, the newly set visibility map bits follow, unaligned */
    
    b)
    
    We can add a comment here explaining why we use memcpy(), akin to
    /* memcpy() because snapshot_conflict_horizon is stored unaligned */
    
    + /* Next are the optionally included vmflags. Copy them out for later use. */
    + if ((xlrec.flags & XLHP_HAS_VMFLAGS) != 0)
    + {
    + memcpy(&vmflags, maindataptr, sizeof(uint8));
    + maindataptr += sizeof(uint8);
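    
    The reason for memcpy() in the fragment above can be illustrated outside PostgreSQL. When optional fields are packed back to back after a record header, a multi-byte field may start at an address that is not suitably aligned for its type, so it is copied out rather than read through a cast pointer. The sketch below is a self-contained illustration: the flag names, struct, and function are hypothetical and do not reproduce the actual xl_heap_prune layout.
    
    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <string.h>
    
    #define SKETCH_HAS_CONFLICT_HORIZON 0x01    /* hypothetical flag */
    #define SKETCH_HAS_VMFLAGS          0x02    /* hypothetical flag */
    
    typedef struct SketchDecoded
    {
        uint32_t    conflict_xid;
        uint8_t     vmflags;
    } SketchDecoded;
    
    /*
     * Copy the optional fields out of a packed buffer and return a pointer
     * to the first byte after them.
     */
    static const char *
    sketch_decode_optional(uint8_t flags, const char *cursor, SketchDecoded *out)
    {
        assert(cursor != NULL && out != NULL);
    
        out->conflict_xid = 0;
        out->vmflags = 0;
    
        if ((flags & SKETCH_HAS_CONFLICT_HORIZON) != 0)
        {
            /* memcpy() because the field may be stored unaligned */
            memcpy(&out->conflict_xid, cursor, sizeof(uint32_t));
            cursor += sizeof(uint32_t);
        }
    
        if ((flags & SKETCH_HAS_VMFLAGS) != 0)
        {
            /* a single byte has no alignment need; memcpy() kept for symmetry */
            memcpy(&out->vmflags, cursor, sizeof(uint8_t));
            cursor += sizeof(uint8_t);
        }
    
        return cursor;
    }
    ```
    
    Each field advances the cursor by its own size, so later optional fields can be appended without disturbing earlier ones.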
    
    2) Should we move conflict_xid = visibility_cutoff_xid; assignment
    just after heap_page_is_all_visible_except_lpdead call in
    lazy_vacuum_heap_page?
    
    3) Looking at this diff, there is one thing I do not understand: how
    are we protected from passing an all-visible page to
    lazy_vacuum_heap_page? I did not manage to reproduce such behaviour,
    though.
    
    + if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
    + {
    + Assert(!PageIsAllVisible(page));
    + set_pd_all_vis = true;
    + LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
    + PageSetAllVisible(page);
    + visibilitymap_set_vmbyte(vacrel->rel,
    + blkno,
    +
    
    
    
    -- 
    Best regards,
    Kirill Reshke
    
    
    
    
  12. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Kirill Reshke <reshkekirill@gmail.com> — 2025-08-27T12:55:27Z

    On Sat, 2 Aug 2025 at 02:36, Melanie Plageman <melanieplageman@gmail.com> wrote:
    >
    > On Thu, Jul 31, 2025 at 6:58 PM Melanie Plageman
    > <melanieplageman@gmail.com> wrote:
    > >
    > > The patch "Set-pd_prune_xid-on-insert.txt" can be applied as the last
    > > patch in the set. It sets pd_prune_xid on insert (so pages filled by
    > > COPY or insert can also be set all-visible in the VM before they are
    > > vacuumed). I gave it a .txt extension because it currently fails
    > > 035_standby_logical_decoding due to a recovery conflict. I need to
    > > investigate more to see if this is a bug in my patch set or elsewhere
    > > in Postgres.
    >
    > I figured out that if we set the VM on-access, we need to enable
    > hot_standby_feedback in more places in 035_standby_logical_decoding.pl
    > to avoid recovery conflicts. I've done that in the attached updated
    > version 6. There are a few other issues in
    > 035_standby_logical_decoding.pl that I reported here [1]. With these
    > changes, setting pd_prune_xid on insert passes tests. Whether or not
    > we want to do it (and what the heuristic should be for deciding when
    > to do it) is another question.
    >
    > - Melanie
    >
    > [1] https://www.postgresql.org/message-id/flat/CAAKRu_YO2mEm%3DZWZKPjTMU%3DgW5Y83_KMi_1cr51JwavH0ctd7w%40mail.gmail.com
    
    v6-0015:
    I chose to verify whether this single modification would be beneficial
    on the HEAD.
    
    Benchmark I did:
    
    ```
    
    \timing
    CREATE TABLE zz(i int);
    alter table zz set (autovacuum_enabled = false);
    TRUNCATE zz;
    copy zz from program 'yes 2 | head -n 180000000';
    copy zz from program 'yes 2 | head -n 180000000';
    
    delete from zz where (REPLACE(REPLACE(ctid::text, '(', '{'), ')',
    '}')::int[])[2] = 7 ;
    
    VACUUM FREEZE zz;
    ```
    
    I then checked the perf top footprint for the last statement (vacuum). My
    detailed results are attached. It is a HEAD vs HEAD+v6-0015 benchmark.
    
    TL;DR: function inlining is indeed beneficial. TransactionIdPrecedes
    disappears from the perf top footprint, though query runtime does not
    change much. So, while it does not speed up the query, it can save
    CPU.
    Maybe we can devise an artificial benchmark that shows a query
    speedup, but for now I don't have one.
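    
    For context on why this function is a natural inlining candidate: the comparison itself is a few instructions, so call overhead can be a noticeable share of its cost in a hot loop. The sketch below is a standalone approximation of the wraparound-aware comparison, written with hypothetical names. It assumes 32-bit transaction IDs where values below 3 are special, and it is not the code from v6-0015.
    
    ```c
    #include <stdbool.h>
    #include <stdint.h>
    
    typedef uint32_t SketchXid;
    
    /* Assumption: XIDs below this value are special (invalid, bootstrap, frozen). */
    #define SKETCH_FIRST_NORMAL_XID ((SketchXid) 3)
    
    static inline bool
    sketch_xid_is_normal(SketchXid xid)
    {
        return xid >= SKETCH_FIRST_NORMAL_XID;
    }
    
    /* Does id1 logically precede id2, allowing for 32-bit wraparound? */
    static inline bool
    sketch_xid_precedes(SketchXid id1, SketchXid id2)
    {
        int32_t     diff;
    
        /* Special XIDs are compared as plain unsigned integers. */
        if (!sketch_xid_is_normal(id1) || !sketch_xid_is_normal(id2))
            return id1 < id2;
    
        /*
         * Normal XIDs are compared modulo 2^32.  The cast relies on the usual
         * two's-complement conversion, which is implementation-defined in C
         * but behaves this way on mainstream compilers.
         */
        diff = (int32_t) (id1 - id2);
        return diff < 0;
    }
    ```
    
    Declaring such a helper static inline in a header lets the compiler fold it into each call site, which would be consistent with the function no longer appearing as a separate symbol in the profile.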
    
    -- 
    Best regards,
    Kirill Reshke
    
  13. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-08-27T13:08:28Z

    Thanks for all the reviews. I'm working on responding to your previous
    mails with a new version.
    
    On Wed, Aug 27, 2025 at 8:55 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
    >
    > v6-0015:
    > I chose to verify whether this single modification would be beneficial
    > on the HEAD.
    >
    > Benchmark I did:
    >
    > ```
    >
    > \timing
    > CREATE TABLE zz(i int);
    > alter table zz set (autovacuum_enabled = false);
    > TRUNCATE zz;
    > copy zz from program 'yes 2 | head -n 180000000';
    > copy zz from program 'yes 2 | head -n 180000000';
    >
    > delete from zz where (REPLACE(REPLACE(ctid::text, '(', '{'), ')',
    > '}')::int[])[2] = 7 ;
    >
    > VACUUM FREEZE zz;
    > ```
    >
    > And I checked perf top footprint for last statement (vacuum).  My
    > detailed results are attached. It is a HEAD vs HEAD+v6-0015 benchmark.
    >
    > TLDR: function inlining is indeed beneficial, TransactionIdPrecedes
    > function disappears from perf top footprint, though query runtime is
    > not changed much. So, while not resulting in query speedup, this can
    > save CPU.
    > Maybe we can derive an artificial benchmark, which will show query
    > speed up, but for now I don't have one.
    
    I'm not surprised that vacuum freeze does not show a speed up from the
    function inlining.
    
    This patch was key for avoiding a regression in the most contrived
    worst case scenario example of setting the VM on-access. That is, if
    you are pruning only a single tuple on the page as part of a SELECT
    query that returns no tuples (think SELECT * FROM foo OFFSET N where N
    is greater than the number of rows in the table), and I add
    determining if the page is all visible, then the overhead of these
    extra function calls in heap_prune_record_unchanged_lp_normal() is
    noticeable.
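
    For readers following along, the comparison being inlined is small.
    The sketch below is a standalone approximation of a wraparound-aware
    XID comparison, not the patch itself: xid_t, FIRST_NORMAL_XID and
    xid_precedes are illustrative names, and the cutoff value 3 is an
    assumption about which ids are treated as special. It only aims to
    show why a static inline definition lets the compiler fold the check
    into a per-tuple loop instead of paying for a call each time.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t xid_t;                 /* stand-in for TransactionId */

    /* Assumption: ids below this value are special, not regular xids. */
    #define FIRST_NORMAL_XID ((xid_t) 3)

    /*
     * Wraparound-aware "a logically precedes b". Declared static inline so
     * the compiler can expand it at each call site in a hot loop.
     */
    static inline bool
    xid_precedes(xid_t a, xid_t b)
    {
        /* Special ids are compared as plain unsigned integers. */
        if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
            return a < b;

        /* Normal ids compare modulo 2^32: the sign of the difference decides. */
        return (int32_t) (a - b) < 0;
    }

    /* A few sanity checks, including one pair that straddles the wrap point. */
    void
    xid_precedes_examples(void)
    {
        assert(xid_precedes(100, 200));
        assert(!xid_precedes(200, 100));
        assert(xid_precedes(4294967290u, 10));
    }
    ```

    The function body is only a branch and a subtraction, so the call
    overhead can dominate when it runs once per line pointer.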
    
    We might be able to come up with a similar example in vacuum without
    freeze since it will try to determine if the page is all-visible. Your
    example is still running on my machine, though, so I haven't verified
    this yet :)
    
    - Melanie
    
    
    
    
  14. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-08-27T19:02:01Z

    Thanks for the review! Updates are in attached v7.
    
    One note on 0022 in the set, which sets pd_prune_xid on insert: the
    recently added index-killtuples isolation test was failing with this
    patch applied. With the patch, the "access" step reports more heap
    page hits than before. After some analysis, it seems one of the heap
    pages in kill_prior_tuples table is now being pruned in an earlier
    step. Somehow this leads to 4 hits counted instead of 3 (even though
    there are only 4 blocks in the relation). I recall Bertrand mentioning
    something in some other thread about hits being double counted with
    AIO reads, so I'm going to try and go dig that up. But, for now, I've
    modified the test -- I believe the patch is only revealing an issue
    with instrumentation, not causing a bug.
    
    On Tue, Aug 26, 2025 at 5:58 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
    >
    > visibilitymap_set_vmbyte is introduced in 0001 and removed in 0012.
    > This is strange to me, maybe we can avoid visibilitymap_set_vmbyte in
    > first place?
    
    The reason for this is that in the earlier patch I introduce
    visibilitymap_set_vmbyte() for one user while other users still use
    visibilitymap_set(). But, by the end of the set, all users use
    visibilitymap_set_vmbyte(). So I think it makes most sense for it to
    be named visibilitymap_set() at that point. Until all users are
    committed, the two functions both have to exist and need different
    names.
    
    > In 0001:
    > should we add "Assert(LWLockHeldByMeInMode(BufferDescriptorGetContentLock(bufHdr),
    > LW_EXCLUSIVE));" in visibilitymap_set_vmbyte?
    
    I don't want any operations on the heap buffer (including asserts) in
    visibilitymap_set_vmbyte(). The heap block is only provided to look up
    the VM bits.
    
    I think your idea is a good one for the existing visibilitymap_set(),
    though, so I've added a new patch to the set (0002) which does this. I
    also added a similar assertion for the vmbuffer to
    visibilitymap_set_vmbyte().
    
    > Also here  `Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer),
    > vmbuffer));` can be beneficial:
    
    I had omitted this because the same logic is checked inside of
    visibilitymap_set_vmbyte() with an error occurring if conditions are
    not met. However, since the same is true in visibilitymap_set() and
    heap_multi_insert() still asserted visibilitymap_pin_ok(), I've added
    it back to my patch set.
    
    > in heap_xlog_multi_insert:
    > +
    > + visibilitymap_pin(reln, blkno, &vmbuffer);
    > + visibilitymap_set_vmbyte(....)
    >
    > Do we need to pin vmbuffer here? Looks like
    > XLogReadBufferForRedoExtended already pins vmbuffer. I verified this
    > with CheckBufferIsPinnedOnce(vmbuffer) just before visibilitymap_pin
    > and COPY ... WITH (FREEZE true) test.
    
    I thought the reason visibilitymap_set() did it was that it was
    possible for the block of the VM corresponding to the block of the
    heap to be different during recovery than it was when emitting the
    record, and thus we needed the part of visibilitymap_pin() that
    released the old vmbuffer and got the new one corresponding to the
    heap block.
    
    I can't quite think of how this could happen though.
    
    Assuming it can't happen, then we can get rid of visibilitymap_pin()
    (and add visibilitymap_pin_ok()) in both visibilitymap_set_vmbyte() and
    visibilitymap_set(). I've done this to visibilitymap_set() in a
    separate patch 0001. I would like other opinions/confirmation that the
    block of the VM corresponding to the heap block cannot differ during
    recovery from what it was when the record was emitted during
    normal operation, though.
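
    As background for that question, here is a rough standalone sketch of
    how a heap block number maps to a VM block, and of the kind of check
    visibilitymap_pin_ok() stands for. The constants assume 8 kB pages with
    a 24-byte page header and two bits per heap block; heapblk_to_mapblock,
    vm_pin_ok and INVALID_BLOCK are hypothetical names. It does not settle
    the question above. It only illustrates that, under these assumptions,
    the mapping is plain arithmetic on the heap block number.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Assumed build constants: 8 kB pages with a 24-byte page header. */
    #define BLOCK_SIZE           8192u
    #define PAGE_HEADER_SIZE     24u
    #define BITS_PER_HEAPBLOCK   2u
    #define HEAPBLOCKS_PER_BYTE  (8u / BITS_PER_HEAPBLOCK)
    #define MAP_BYTES_PER_PAGE   (BLOCK_SIZE - PAGE_HEADER_SIZE)
    #define HEAPBLOCKS_PER_PAGE  (MAP_BYTES_PER_PAGE * HEAPBLOCKS_PER_BYTE)

    /* Hypothetical marker meaning "no VM buffer is pinned". */
    #define INVALID_BLOCK UINT32_MAX

    /* Which VM block holds the bits for this heap block? */
    static inline uint32_t
    heapblk_to_mapblock(uint32_t heap_blk)
    {
        return heap_blk / HEAPBLOCKS_PER_PAGE;
    }

    /* True if the pinned VM block is the one covering heap_blk. */
    static inline bool
    vm_pin_ok(uint32_t heap_blk, uint32_t pinned_vm_blk)
    {
        return pinned_vm_blk != INVALID_BLOCK &&
               pinned_vm_blk == heapblk_to_mapblock(heap_blk);
    }

    /* With these constants one VM page covers 32672 heap blocks. */
    void
    vm_mapping_examples(void)
    {
        assert(HEAPBLOCKS_PER_PAGE == 32672u);
        assert(vm_pin_ok(32672u, 1u));
        assert(!vm_pin_ok(32672u, 0u));
    }
    ```

    If the block size is the same when the record is emitted and when it
    is replayed, this calculation gives the same VM block both times.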
    
    > > +#ifdef TRACE_VISIBILITYMAP
    > > + elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
    > > +#endif
    >
    > I can see this merely copy-pasted from visibilitymap_set, but maybe
    > display "flags" also?
    
    Done in attached.
    
    > 4) visibilitymap_set receives  XLogRecPtr recptr parameters, which is
    > set to WAL record lsn during recovery and to InvalidXLogRecPtr
    > otherwise. visibilitymap_set manages VM page LSN based on this recptr
    > value (inside function logic). visibilitymap_set_vmbyte behaves
    > vice versa and makes its caller responsible for page LSN management.
    > Maybe we should keep these two functions akin to each other?
    
    So, with visibilitymap_set_vmbyte(), the whole idea is to just update
    the VM and then leave the WAL logging and other changes to the caller
    (like marking the buffer dirty, setting the page LSN, etc). The series
    of operations needed to make a persistent change are up to the caller.
    visibilitymap_set() is meant to just make sure that the correct bits
    in the VM are set for the given heap block.
    
    I looked at ways of making the current visibilitymap_set() API cleaner
    -- like setting the heap page LSN with the VM recptr in the caller of
    visibilitymap_set() instead. There wasn't a way of doing it that
    seemed like enough of an improvement to merit the change.
    
    Not to mention, the goal of the patchset is to remove the current
    visibilitymap_set(), so I'm not too worried about parity between the
    two functions. They may coexist for a while, but hopefully today's
    visibilitymap_set() will eventually be removed.
    
    - Melanie
    
  15. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-08-27T19:08:41Z

    On Tue, Aug 26, 2025 at 4:01 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
    >
    > Few comments on 0003.
    >
    > 1) This patch introduces XLHP_HAS_VMFLAGS. However it lacks some
    > helpful comments about this new status bit.
    
    I added the ones you suggested in my v7 posted here [1].
    
    > 2) Should we move conflict_xid = visibility_cutoff_xid; assignment
    > just after heap_page_is_all_visible_except_lpdead call in
    > lazy_vacuum_heap_page?
    
    Why would we want to do that? We only want to set it if the page is
    all visible, so we would have to guard it similarly.
    
    > 3) Looking at this diff, I do not comprehend one bit: how are we
    > protected from passing an all-visible page to lazy_vacuum_heap_page. I
    > did not manage to reproduce such behaviour though.
    >
    > + if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
    > + {
    > + Assert(!PageIsAllVisible(page));
    > + set_pd_all_vis = true;
    > + LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
    > + PageSetAllVisible(page);
    > + visibilitymap_set_vmbyte(vacrel->rel,
    > + blkno,
    
    So, for one, there is an assert just above this code in
    lazy_vacuum_heap_page() that nunused > 0 -- so we know that the page
    couldn't have been all-visible already because it had unused line
    pointers.
    
    Otherwise, if it was possible for an already all-visible page to get
    here, the same thing would happen that happens on master --
    heap_page_is_all_visible[_except_lpdead()] would return true and we
    would try to set the VM which would end up being a no-op.
    
    - Melanie
    
    [1] https://www.postgresql.org/message-id/CAAKRu_YD0ecXeAh%2BDmJpzQOJwcRzmMyGdcc5W_0pEF78rYSJkQ%40mail.gmail.com
    
    
    
    
  16. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Kirill Reshke <reshkekirill@gmail.com> — 2025-08-28T09:11:48Z

    On Thu, 28 Aug 2025 at 00:02, Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    
    > > Do we need to pin vmbuffer here? Looks like
    > > XLogReadBufferForRedoExtended already pins vmbuffer. I verified this
    > > with CheckBufferIsPinnedOnce(vmbuffer) just before visibilitymap_pin
    > > and COPY ... WITH (FREEZE true) test.
    >
    > I thought the reason visibilitymap_set() did it was that it was
    > possible for the block of the VM corresponding to the block of the
    > heap to be different during recovery than it was when emitting the
    > record, and thus we needed the part of visibilitymap_pin() that
    > released the old vmbuffer and got the new one corresponding to the
    > heap block.
    >
    > I can't quite think of how this could happen though.
    >
    > Assuming it can't happen, then we can get rid of visibilitymap_pin()
    > (and add visibilitymap_pin_ok()) in both visibilitymap_set_vmbyte() and
    > visibilitymap_set(). I've done this to visibilitymap_set() in a
    > separate patch 0001. I would like other opinions/confirmation that the
    > block of the VM corresponding to the heap block cannot differ during
    > recovery from what it was when the record was emitted during
    > normal operation, though.
    
    I did micro git-blame research here. I spotted only one related change
    [0]. Looks like before this change pin was indeed needed.
    But not after this change, so this visibilitymap_pin is just an oversight?
    Related thread is [1]. I quickly checked the discussion in this
    thread, and it looks like no one was bothered about these lines or VM
    logging changes (in this exact pin buffer aspect). The discussion was
    of other aspects of this commit.
    
    [0] https://github.com/postgres/postgres/commit/2c03216d8311
    [1] https://www.postgresql.org/message-id/533D6CBF.6080203%40vmware.com
    
    
    -- 
    Best regards,
    Kirill Reshke
    
    
    
    
  17. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-09-02T21:52:37Z

    On Thu, Aug 28, 2025 at 5:12 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
    >
    > I did micro git-blame research here. I spotted only one related change
    > [0]. Looks like before this change pin was indeed needed.
    > But not after this change, so this visibilitymap_pin is just an oversight?
    > Related thread is [1]. I quickly checked the discussion in this
    > thread, and it looks like no one was bothered about these lines or VM
    > logging changes (in this exact pin buffer aspect). The discussion was
    > of other aspects of this commit.
    
    Wow, thanks so much for doing that research. Looking at it myself, it
    does indeed seem like just an oversight. It isn't harmful since it
    won't take another pin, but it is confusing, so I think we should at
    least remove it in master. I'm not as sure about back branches.
    
    I would like someone to confirm that there is no way we could end up
    with a different block of the VM containing the vm bits for a heap
    block during recovery than during normal operation.
    
    - Melanie
    
    
    
    
  18. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-09-02T23:11:01Z

    On Tue, Sep 2, 2025 at 5:52 PM Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    >
    > On Thu, Aug 28, 2025 at 5:12 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
    > >
    > > I did micro git-blame research here. I spotted only one related change
    > > [0]. Looks like before this change pin was indeed needed.
    > > But not after this change, so this visibilitymap_pin is just an oversight?
    > > Related thread is [1]. I quickly checked the discussion in this
    > > thread, and it looks like no one was bothered about these lines or VM
    > > logging changes (in this exact pin buffer aspect). The discussion was
    > > of other aspects of this commit.
    >
    > Wow, thanks so much for doing that research. Looking at it myself, it
    > does indeed seem like just an oversight. It isn't harmful since it
    > won't take another pin, but it is confusing, so I think we should at
    > least remove it in master. I'm not as sure about back branches.
    
    I've updated the commit message in the patch set to reflect the
    research you did in attached v8.
    
    - Melanie
    
  19. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Andres Freund <andres@anarazel.de> — 2025-09-02T23:54:07Z

    Hi,
    
    On 2025-09-02 19:11:01 -0400, Melanie Plageman wrote:
    > From dd98177294011ee93cac122405516abd89f4e393 Mon Sep 17 00:00:00 2001
    > From: Melanie Plageman <melanieplageman@gmail.com>
    > Date: Wed, 27 Aug 2025 08:50:15 -0400
    > Subject: [PATCH v8 01/22] Remove unneeded VM pin from VM replay
    
    LGTM.
    
    
    > From 7c5cb3edf89735eaa8bee9ca46111bd6c554720b Mon Sep 17 00:00:00 2001
    > From: Melanie Plageman <melanieplageman@gmail.com>
    > Date: Wed, 27 Aug 2025 10:07:29 -0400
    > Subject: [PATCH v8 02/22] Add assert and log message to visibilitymap_set
    
    LGTM.
    
    
    > From 07f31099754636ec9dabf6cca06c33c4b19c230c Mon Sep 17 00:00:00 2001
    > From: Melanie Plageman <melanieplageman@gmail.com>
    > Date: Tue, 17 Jun 2025 17:22:10 -0400
    > Subject: [PATCH v8 03/22] Eliminate xl_heap_visible in COPY FREEZE
    >
    > Instead of emitting a separate WAL record for setting the VM bits in
    > xl_heap_visible, specify the changes to make to the VM block in the
    > xl_heap_multi_insert record instead.
    >
    > Author: Melanie Plageman <melanieplageman@gmail.com>
    > Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
    > Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
    
    
    > +		/*
    > +		 * If we're only adding already frozen rows to a previously empty
    > +		 * page, mark it as all-frozen and update the visibility map. We're
    > +		 * already holding a pin on the vmbuffer.
    > +		 */
    >  		else if (all_frozen_set)
    > +		{
    > +			Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
    >  			PageSetAllVisible(page);
    > +			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
    > +			visibilitymap_set_vmbyte(relation,
    > +									 BufferGetBlockNumber(buffer),
    > +									 vmbuffer,
    > +									 VISIBILITYMAP_ALL_VISIBLE |
    > +									 VISIBILITYMAP_ALL_FROZEN);
    > +		}
    
    From an abstraction POV I don't love that heapam now is responsible for
    acquiring and releasing the lock. But that ship already kind of has sailed, as
    heapam.c is already responsible for releasing the vm buffer etc...
    
    I've wondered about splitting the responsibilities up into multiple
    visibilitymap_set_* functions, so that heapam.c wouldn't need to acquire the
    lock and set the LSN. But it's probably not worth it.
    
    
    > +	/*
    > +	 * Now read and update the VM block. Even if we skipped updating the heap
    > +	 * page due to the file being dropped or truncated later in recovery, it's
    > +	 * still safe to update the visibility map.  Any WAL record that clears
    > +	 * the visibility map bit does so before checking the page LSN, so any
    > +	 * bits that need to be cleared will still be cleared.
    > +	 *
    > +	 * It is only okay to set the VM bits without holding the heap page lock
    > +	 * because we can expect no other writers of this page.
    > +	 */
    > +	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
    > +		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
    > +									  &vmbuffer) == BLK_NEEDS_REDO)
    > +	{
    > +		Relation	reln = CreateFakeRelcacheEntry(rlocator);
    > +
    > +		Assert(visibilitymap_pin_ok(blkno, vmbuffer));
    > +		visibilitymap_set_vmbyte(reln, blkno,
    > +								 vmbuffer,
    > +								 VISIBILITYMAP_ALL_VISIBLE |
    > +								 VISIBILITYMAP_ALL_FROZEN);
    > +
    > +		/*
    > +		 * It is not possible that the VM was already set for this heap page,
    > +		 * so the vmbuffer must have been modified and marked dirty.
    > +		 */
    > +		Assert(BufferIsDirty(vmbuffer));
    
    How about making visibilitymap_set_vmbyte() return whether it needed to do
    something? This seems somewhat indirect...
    
    I think it might be good to encapsulate this code into a helper in
    visibilitymap.c, there will be more callers in the subsequent patches.
    
    
    > +/*
    > + * Set flags in the VM block contained in the passed in vmBuf.
    > + *
    > + * This function is for callers which include the VM changes in the same WAL
    > + * record as the modifications of the heap page which rendered it all-visible.
    > + * Callers separately logging the VM changes should invoke visibilitymap_set()
    > + * instead.
    > + *
    > + * Caller must have pinned and exclusive locked the correct block of the VM in
    > + * vmBuf. This block should contain the VM bits for the given heapBlk.
    > + *
    > + * During normal operation (i.e. not recovery), this should be called in a
    > + * critical section which also makes any necessary changes to the heap page
    > + * and, if relevant, emits WAL.
    > + *
    > + * Caller is responsible for WAL logging the changes to the VM buffer and for
    > + * making any changes needed to the associated heap page. This includes
    > + * maintaining any invariants such as ensuring the buffer containing heapBlk
    > + * is pinned and exclusive locked.
    > + */
    > +uint8
    > +visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
    > +						 Buffer vmBuf, uint8 flags)
    
    Why is it named vmbyte? This actually just sets the two bits corresponding to
    the buffer, not the entire byte. So it seems somewhat misleading to reference
    byte.
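
    To illustrate the naming point, here is a minimal sketch of setting
    only the two bits that belong to one heap block inside a shared map
    byte, and returning the previous bits. It works on a plain byte array
    rather than a buffer page; toy_vm_set_bits and the VM_* names are
    illustrative stand-ins, and the layout (four heap blocks per byte, two
    bits each) is an assumption made for the example.

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define VM_ALL_VISIBLE       0x01
    #define VM_ALL_FROZEN        0x02
    #define VM_VALID_BITS        0x03
    #define BITS_PER_HEAPBLOCK   2
    #define HEAPBLOCKS_PER_BYTE  4

    /*
     * Set the given flags for one heap block in a byte-array map and return
     * the bits that were set before the call. Neighbouring heap blocks that
     * share the same byte are left untouched.
     */
    static uint8_t
    toy_vm_set_bits(uint8_t *map, uint32_t heap_blk, uint8_t flags)
    {
        uint32_t byte_idx = heap_blk / HEAPBLOCKS_PER_BYTE;
        uint32_t shift = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;
        uint8_t  old_bits = (uint8_t) ((map[byte_idx] >> shift) & VM_VALID_BITS);

        /* Only the two defined flag bits may be passed in. */
        assert((flags & ~VM_VALID_BITS) == 0);

        map[byte_idx] |= (uint8_t) (flags << shift);
        return old_bits;
    }
    ```

    For heap block 5 this touches bits 2 and 3 of map[1] and nothing else,
    which is why a name that mentions "bits" describes it better than one
    that mentions "byte".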
    
    
    
    
    > From dc318358572f61efbd0e05aae2b9a077b422bcf5 Mon Sep 17 00:00:00 2001
    > From: Melanie Plageman <melanieplageman@gmail.com>
    > Date: Wed, 18 Jun 2025 12:42:13 -0400
    > Subject: [PATCH v8 05/22] Eliminate xl_heap_visible from vacuum phase III
    >
    > Instead of emitting a separate xl_heap_visible record for each page that
    > is rendered all-visible by vacuum's third phase, include the updates to
    > the VM in the already emitted xl_heap_prune record.
    
    Reading through the change I didn't particularly like that there's another
    optional field in xl_heap_prune, as it seemed like something that should be
    encoded in flags.  Of course there aren't enough flag bits available.  But
    that made me look at the rest of the record: Uh, what do we use the reason
    field for?  As far as I can tell f83d709760d8 added it without introducing any
    users? It doesn't even seem to be set.
    
    
    > @@ -51,10 +52,15 @@ heap_xlog_prune_freeze(XLogReaderState *record)
    >  		   (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
    >
    >  	/*
    > -	 * We are about to remove and/or freeze tuples.  In Hot Standby mode,
    > -	 * ensure that there are no queries running for which the removed tuples
    > -	 * are still visible or which still consider the frozen xids as running.
    > -	 * The conflict horizon XID comes after xl_heap_prune.
    > +	 * After xl_heap_prune is the optional snapshot conflict horizon.
    > +	 *
    > +	 * In Hot Standby mode, we must ensure that there are no running queries
    > +	 * which would conflict with the changes in this record. If pruning, that
    > +	 * means we cannot remove tuples still visible to transactions on the
    > +	 * standby. If freezing, that means we cannot freeze tuples with xids that
    > +	 * are still considered running on the standby. And for setting the VM, we
    > +	 * cannot do so if the page isn't all-visible to all transactions on the
    > +	 * standby.
    >  	 */
    
    I'm a bit confused by this new comment - it sounds like we're deciding whether
    to remove tuple versions, but that decision has long been made, no?
    
    
    
    > @@ -2846,8 +2848,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
    >  	OffsetNumber unused[MaxHeapTuplesPerPage];
    >  	int			nunused = 0;
    >  	TransactionId visibility_cutoff_xid;
    > +	TransactionId conflict_xid = InvalidTransactionId;
    >  	bool		all_frozen;
    >  	LVSavedErrInfo saved_err_info;
    > +	uint8		vmflags = 0;
    > +	bool		set_pd_all_vis = false;
    >
    >  	Assert(vacrel->do_index_vacuuming);
    >
    > @@ -2858,6 +2863,20 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
    >  							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
    >  							 InvalidOffsetNumber);
    >
    > +	if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
    > +											   vacrel->cutoffs.OldestXmin,
    > +											   deadoffsets, num_offsets,
    > +											   &all_frozen, &visibility_cutoff_xid,
    > +											   &vacrel->offnum))
    > +	{
    > +		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
    > +		if (all_frozen)
    > +		{
    > +			vmflags |= VISIBILITYMAP_ALL_FROZEN;
    > +			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
    > +		}
    > +	}
    > +
    >  	START_CRIT_SECTION();
    
    I am rather confused - we never can set all-visible if there are any LP_DEAD
    items left. If the idea is that we are removing the LP_DEAD items in
    lazy_vacuum_heap_page() - what guarantees that all LP_DEAD items are being
    removed? Couldn't some tuples get marked LP_DEAD by on-access pruning, after
    vacuum visited the page and collected dead items?
    
    Ugh, I see - it works because we pass in the set of dead items.  I think that
    makes the name *really* misleading, it's not except LP_DEAD, it's except the
    offsets passed in, no?
    
    But then you actually check that the set of dead items didn't change - what
    guarantees that?
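
    The concern about the name can be shown with a toy model. The sketch
    below is not the patch's logic: ToyItemState, offset_in_set and
    toy_all_visible_except are hypothetical, and treating a dead item that
    is missing from the passed-in set as "not all-visible" is an assumption
    made for illustration. It shows that the answer depends on the set of
    offsets the caller passes in, not on every LP_DEAD item on the page.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    typedef enum
    {
        ITEM_UNUSED,
        ITEM_NORMAL_VISIBLE,        /* tuple visible to everyone */
        ITEM_NORMAL_NOT_VISIBLE,    /* tuple not yet visible to everyone */
        ITEM_DEAD                   /* dead line pointer */
    } ToyItemState;

    /* Linear search; fine for a toy example. */
    static bool
    offset_in_set(const uint16_t *set, int nset, uint16_t off)
    {
        for (int i = 0; i < nset; i++)
            if (set[i] == off)
                return true;
        return false;
    }

    /*
     * Would the page be all-visible once the items listed in dead_offsets
     * are removed? Offsets are 1-based, as on a heap page.
     */
    static bool
    toy_all_visible_except(const ToyItemState *items, int nitems,
                           const uint16_t *dead_offsets, int ndead)
    {
        for (int i = 0; i < nitems; i++)
        {
            uint16_t off = (uint16_t) (i + 1);

            switch (items[i])
            {
                case ITEM_UNUSED:
                case ITEM_NORMAL_VISIBLE:
                    break;
                case ITEM_NORMAL_NOT_VISIBLE:
                    return false;
                case ITEM_DEAD:
                    /* A dead item the caller does not know about blocks it. */
                    if (!offset_in_set(dead_offsets, ndead, off))
                        return false;
                    break;
            }
        }
        return true;
    }
    ```

    In this model, a dead item that appears after the caller collected its
    offsets makes the function return false, which is the case the
    question above is asking about.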
    
    
    I didn't look at the later patches, except that I did notice this:
    
    > @@ -268,7 +264,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
    >  		Relation	reln = CreateFakeRelcacheEntry(rlocator);
    >
    >  		visibilitymap_pin(reln, blkno, &vmbuffer);
    > -		old_vmbits = visibilitymap_set_vmbyte(reln, blkno, vmbuffer, vmflags);
    > +		old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
    >  		/* Only set VM page LSN if we modified the page */
    >  		if (old_vmbits != vmflags)
    >  			PageSetLSN(BufferGetPage(vmbuffer), lsn);
    > @@ -279,143 +275,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
    >  		UnlockReleaseBuffer(vmbuffer);
    >  }
    
    Why are we manually pinning the vm buffer here? Shouldn't the xlog machinery
    have done so, as you noticed in one of the early on patches?
    
    Greetings,
    
    Andres Freund
    
    
    
    
  20. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Kirill Reshke <reshkekirill@gmail.com> — 2025-09-03T09:06:40Z

    On Wed, 3 Sept 2025 at 04:11, Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    >
    > On Tue, Sep 2, 2025 at 5:52 PM Melanie Plageman
    > <melanieplageman@gmail.com> wrote:
    > >
    > > On Thu, Aug 28, 2025 at 5:12 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
    > > >
    > > > I did micro git-blame research here. I spotted only one related change
    > > > [0]. Looks like before this change pin was indeed needed.
    > > > But not after this change, so this visibilitymap_pin is just an oversight?
    > > > Related thread is [1]. I quickly checked the discussion in this
    > > > thread, and it looks like no one was bothered about these lines or VM
    > > > logging changes (in this exact pin buffer aspect). The discussion was
    > > > of other aspects of this commit.
    > >
    > > Wow, thanks so much for doing that research. Looking at it myself, it
    > > does indeed seem like just an oversight. It isn't harmful since it
    > > won't take another pin, but it is confusing, so I think we should at
    > > least remove it in master. I'm not as sure about back branches.
    >
    > I've updated the commit message in the patch set to reflect the
    > research you did in attached v8.
    >
    > - Melanie
    
    
    
    Hi!
    
    small comments regarding new series
    
    0001, 0002, 0017 LGTM
    
    
    In 0015:
    
    ```
    reshke@yezzey-cbdb-bench:~/postgres$ git diff
    src/backend/access/heap/pruneheap.c
    diff --git a/src/backend/access/heap/pruneheap.c
    b/src/backend/access/heap/pruneheap.c
    index 05b51bd8d25..0794af9ae89 100644
    --- a/src/backend/access/heap/pruneheap.c
    +++ b/src/backend/access/heap/pruneheap.c
    @@ -1398,7 +1398,7 @@ heap_prune_record_unchanged_lp_normal(Page page,
    PruneState *prstate, OffsetNumb
                                    /*
                                     * For now always use prstate->cutoffs
    for this test, because
                                     * we only update 'all_visible' when
    freezing is requested. We
    -                                * could use
    GlobalVisTestIsRemovableXid instead, if a
    +                                * could use GlobalVisXidVisibleToAll
    instead, if a
                                     * non-freezing caller wanted to set the VM bit.
                                     */
                                    Assert(prstate->cutoffs);
    ```
    
    Also, maybe GlobalVisXidTestAllVisible is a slightly better name? (The
    term 'all-visible' is one that we occasionally utilize)
    
    
    -- 
    Best regards,
    Kirill Reshke
    
    
    
    
  21. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-09-05T22:20:21Z

    Thanks for the review!
    
    On Tue, Sep 2, 2025 at 7:54 PM Andres Freund <andres@anarazel.de> wrote:
    >
    > On 2025-09-02 19:11:01 -0400, Melanie Plageman wrote:
    > > From dd98177294011ee93cac122405516abd89f4e393 Mon Sep 17 00:00:00 2001
    > > From: Melanie Plageman <melanieplageman@gmail.com>
    > > Date: Wed, 27 Aug 2025 08:50:15 -0400
    > > Subject: [PATCH v8 01/22] Remove unneeded VM pin from VM replay
    
    I didn't push it yet because I did a new version that actually
    eliminates the asserts in heap_multi_insert() before calling
    visibilitymap_set() -- since they are redundant with checks inside
    visibilitymap_set(). 0001 of attached v9 is what I plan to push,
    barring any objections.
    
    > > From 7c5cb3edf89735eaa8bee9ca46111bd6c554720b Mon Sep 17 00:00:00 2001
    > > From: Melanie Plageman <melanieplageman@gmail.com>
    > > Date: Wed, 27 Aug 2025 10:07:29 -0400
    > > Subject: [PATCH v8 02/22] Add assert and log message to visibilitymap_set
    
    I pushed this.
    
    > From an abstraction POV I don't love that heapam now is responsible for
    > acquiring and releasing the lock. But that ship already kind of has sailed, as
    > heapam.c is already responsible for releasing the vm buffer etc...
    >
    > I've wondered about splitting the responsibilities up into multiple
    > visibilitymap_set_* functions, so that heapam.c wouldn't need to acquire the
    > lock and set the LSN. But it's probably not worth it.
    
    Yea, I explored heap wrappers coupling heap operations related to
    setting the VM along with the VM updates [1], but the results weren't
    appealing. Setting the heap LSN and marking the heap buffer dirty and
    such happens in a different place in different callers because it is
    happening as part of the operations that actually end up rendering the
    page all-visible.
    
    And a VM-only helper would literally just acquire and release the lock
    and set the LSN on the vm page -- which I don't think is worth it.
    
    > > +     /*
    > > +      * Now read and update the VM block. Even if we skipped updating the heap
    > > +      * page due to the file being dropped or truncated later in recovery, it's
    > > +      * still safe to update the visibility map.  Any WAL record that clears
    > > +      * the visibility map bit does so before checking the page LSN, so any
    > > +      * bits that need to be cleared will still be cleared.
    > > +      *
    > > +      * It is only okay to set the VM bits without holding the heap page lock
    > > +      * because we can expect no other writers of this page.
    > > +      */
    > > +     if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
    > > +             XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
    > > +                                                                       &vmbuffer) == BLK_NEEDS_REDO)
    > > +     {
    > > +             Relation        reln = CreateFakeRelcacheEntry(rlocator);
    > > +
    > > +             Assert(visibilitymap_pin_ok(blkno, vmbuffer));
    > > +             visibilitymap_set_vmbyte(reln, blkno,
    > > +                                                              vmbuffer,
    > > +                                                              VISIBILITYMAP_ALL_VISIBLE |
    > > +                                                              VISIBILITYMAP_ALL_FROZEN);
    > > +
    > > +             /*
    > > +              * It is not possible that the VM was already set for this heap page,
    > > +              * so the vmbuffer must have been modified and marked dirty.
    > > +              */
    > > +             Assert(BufferIsDirty(vmbuffer));
    >
    > How about making visibilitymap_set_vmbyte() return whether it needed to do
    > something? This seems somewhat indirect...
    
    It does return the state of the previous bits. But, I am specifically
    asserting that the buffer is dirty because I am about to set the page
    LSN. So I don't just care that changes were made, I care that we
    remembered to mark the buffer dirty.
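
    The distinction can be sketched with a toy buffer. This is an
    illustration under assumptions, not the patch: ToyVmBuffer,
    toy_set_bits and toy_redo_set_vm are hypothetical, and the buffer holds
    the bits for a single heap block only. The returned old bits tell the
    caller whether anything changed, while the assertion separately checks
    that the buffer was marked dirty before its LSN is stamped.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    typedef struct ToyVmBuffer
    {
        uint8_t  bits;      /* VM bits for one heap block */
        bool     dirty;     /* has the buffer been modified? */
        uint64_t lsn;       /* LSN of the last record applied to it */
    } ToyVmBuffer;

    /* Set flags, mark the buffer dirty if anything changed, return old bits. */
    static uint8_t
    toy_set_bits(ToyVmBuffer *buf, uint8_t flags)
    {
        uint8_t old_bits = buf->bits;

        if ((old_bits & flags) != flags)
        {
            buf->bits |= flags;
            buf->dirty = true;
        }
        return old_bits;
    }

    /* Replay-style caller: set the bits, then stamp the record's LSN. */
    static uint8_t
    toy_redo_set_vm(ToyVmBuffer *buf, uint8_t flags, uint64_t record_lsn)
    {
        uint8_t old_bits = toy_set_bits(buf, flags);

        /*
         * The LSN is about to be set, so what matters here is that the
         * buffer was marked dirty, not merely that the bits differ.
         */
        assert(buf->dirty);
        buf->lsn = record_lsn;
        return old_bits;
    }
    ```

    A caller that only looked at the return value would not notice if the
    setter changed the bits but forgot to mark the buffer dirty.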
    
    > I think it might be good to encapsulate this code into a helper in
    > visibilitymap.c, there will be more callers in the subsequent patches.
    
    By the end of the set, the different callers have different
    expectations (some don't expect the buffer to have been dirtied
    necessarily) and where they do the various related operations is
    spread out depending on the caller. I just couldn't come up with a
    helper solution I liked.
    
    That being said, I definitely don't think it's needed for this patch
    (logging setting the VM in xl_heap_multi_insert()).
    
    > > +uint8
    > > +visibilitymap_set_vmbyte(Relation rel, BlockNumber heapBlk,
    > > +                                              Buffer vmBuf, uint8 flags)
    >
    > Why is it named vmbyte? This actually just sets the two bits corresponding to
    > the buffer, not the entire byte. So it seems somewhat misleading to reference
    > byte.
    
    Renamed it to visibilitymap_set_vmbits.
    
    > > Instead of emitting a separate xl_heap_visible record for each page that
    > > is rendered all-visible by vacuum's third phase, include the updates to
    > > the VM in the already emitted xl_heap_prune record.
    >
    > Reading through the change I didn't particularly like that there's another
    > optional field in xl_heap_prune, as it seemed like something that should be
    > encoded in flags.  Of course there aren't enough flag bits available.  But
    > that made me look at the rest of the record: Uh, what do we use the reason
    > field for?  As far as I can tell f83d709760d8 added it without introducing any
    > users? It doesn't even seem to be set.
    
    yikes, you are right about the "reason" member. Attached 0002 removes
    it, and I'll go ahead and fix it in the back branches too. I can't
    fathom how that slipped through the cracks. We do pass the PruneReason
    for setting the rmgr info about what type of record it is (i.e. if it
    is one emitted by vacuum phase I, phase III, or on-access pruning).
    But we don't need or use a separate member. I went back and tried to
    figure out what the rationale was, but I couldn't find anything.
    
    As for the VM flags being an optional unaligned member -- in v9, I've
    expanded the flags member to a uint16 to make room for the extra
    flags. Seems we've been surviving with using up 2 bytes this long.
    
    > > @@ -51,10 +52,15 @@ heap_xlog_prune_freeze(XLogReaderState *record)
    > >                  (xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
    > >
    > >       /*
    > > -      * We are about to remove and/or freeze tuples.  In Hot Standby mode,
    > > -      * ensure that there are no queries running for which the removed tuples
    > > -      * are still visible or which still consider the frozen xids as running.
    > > -      * The conflict horizon XID comes after xl_heap_prune.
    > > +      * After xl_heap_prune is the optional snapshot conflict horizon.
    > > +      *
    > > +      * In Hot Standby mode, we must ensure that there are no running queries
    > > +      * which would conflict with the changes in this record. If pruning, that
    > > +      * means we cannot remove tuples still visible to transactions on the
    > > +      * standby. If freezing, that means we cannot freeze tuples with xids that
    > > +      * are still considered running on the standby. And for setting the VM, we
    > > +      * cannot do so if the page isn't all-visible to all transactions on the
    > > +      * standby.
    > >        */
    >
    > I'm a bit confused by this new comment - it sounds like we're deciding whether
    > to remove tuple versions, but that decision has long been made, no?
    
    Well, the comment is a revision of a comment that was already there on
    essentially why replaying this record could cause recovery conflicts.
    It mentioned pruning and freezing, so I expanded it to mention setting
    the VM. Taking into account your confusion, I tried rewording it in
    attached v9.
    
    > > +     if (heap_page_is_all_visible_except_lpdead(vacrel->rel, buffer,
    > > +                                                                                        vacrel->cutoffs.OldestXmin,
    > > +                                                                                        deadoffsets, num_offsets,
    > > +                                                                                        &all_frozen, &visibility_cutoff_xid,
    > > +                                                                                        &vacrel->offnum))
    >
    > I am rather confused - we never can set all-visible if there are any LP_DEAD
    > items left. If the idea is that we are removing the LP_DEAD items in
    > lazy_vacuum_heap_page() - what guarantees that all LP_DEAD items are being
    > removed? Couldn't some tuples get marked LP_DEAD by on-access pruning, after
    > vacuum visited the page and collected dead items?
    >
    > Ugh, I see - it works because we pass in the set of dead items.  I think that
    > makes the name *really* misleading, it's not except LP_DEAD, it's except the
    > offsets passed in, no?
    >
    > But then you actually check that the set of dead items didn't change - what
    > guarantees that?
    
    So, I pass in the deadoffsets we got from the TIDStore. If the only
    dead items on the page are exactly those dead items, then the page
    will be all-visible as soon as we set those LP_UNUSED -- which we do
    unconditionally. And we have the lock on the page, so no one can
    on-access prune and make new dead items while we are in
    lazy_vacuum_heap_page().
    
    Given your confusion, I've refactored this and used a different
    approach -- I explicitly check the passed-in deadoffsets array when I
    encounter a dead item and see if it is there. That should hopefully
    make it more clear.
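    
    The refactored approach described above can be sketched in C as follows. This is a toy model, not the patch's code: the item states, the Toy/toy_ names, and the assumption that the dead-offset array is sorted ascending are all hypothetical. The page counts as all-visible only if every dead item found on it is in the caller's array, since those are the items about to be set unused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t ToyOffset;     /* 1-based line pointer offset (toy type) */

typedef enum
{
    TOY_LP_UNUSED,
    TOY_LP_VISIBLE_TO_ALL,
    TOY_LP_NOT_VISIBLE_TO_ALL,
    TOY_LP_DEAD
} ToyItemState;

/* Binary search; assumes 'dead' is sorted in ascending order. */
static bool
toy_offset_is_listed(const ToyOffset *dead, size_t ndead, ToyOffset off)
{
    size_t lo = 0;
    size_t hi = ndead;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (dead[mid] == off)
            return true;
        if (dead[mid] < off)
            lo = mid + 1;
        else
            hi = mid;
    }
    return false;
}

/*
 * Would the page be all-visible once exactly the listed dead items are
 * set unused? Any dead item missing from the list makes the answer no.
 */
static bool
toy_page_would_be_all_visible(const ToyItemState *items, size_t nitems,
                              const ToyOffset *dead, size_t ndead)
{
    for (size_t i = 0; i < nitems; i++)
    {
        ToyOffset off = (ToyOffset) (i + 1);

        switch (items[i])
        {
            case TOY_LP_UNUSED:
            case TOY_LP_VISIBLE_TO_ALL:
                break;
            case TOY_LP_DEAD:
                if (!toy_offset_is_listed(dead, ndead, off))
                    return false;
                break;
            case TOY_LP_NOT_VISIBLE_TO_ALL:
                return false;
        }
    }
    return true;
}
```

    Checking membership explicitly means the function does not have to rely on the caller's locking to rule out dead items that were not collected earlier.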
    
    > I didn't look at the later patches, except that I did notice this:
    <--snip-->
    > Why are we manually pinning the vm buffer here? Shouldn't the xlog machinery
    > have done so, as you noticed in one of the early on patches?
    
    Fixed. Thanks!
    
    - Melanie
    
    [1] https://www.postgresql.org/message-id/flat/CAAKRu_Yj%3DyrL%2BgGGsqfYVQcYn7rDp6hDeoF1vN453JDp8dEY%2Bw%40mail.gmail.com#94602c599abdc8dfc5e438bd24bd8d50
    
  22. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-09-05T22:27:05Z

    On Wed, Sep 3, 2025 at 5:06 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
    >
    > small comments regarding new series
    >
    > 0001, 0002, 0017 LGTM
    
    Thanks for continuing to review!
    
    > In 0015:
    >
    > Also, maybe GlobalVisXidTestAllVisible is a slightly better name? (The
    > term 'all-visible' is one that we occasionally utilize)
    
    Actually, I was trying to distinguish it from all-visible because I
    interpret that to mean everything is visible -- as in, every tuple on
    a page is visible to everyone. And here we are referring to one xid
    and want to know if it is visible to everyone as no longer running. I
    don't think my name ("visible-to-all") is good, but I'm hesitant to
    co-opt "all-visible" here.
    
    - Melanie
    
    
    
    
  23. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-09-08T15:44:24Z

    On Fri, Sep 5, 2025 at 6:20 PM Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    >
    > > On 2025-09-02 19:11:01 -0400, Melanie Plageman wrote:
    > > > From dd98177294011ee93cac122405516abd89f4e393 Mon Sep 17 00:00:00 2001
    > > > From: Melanie Plageman <melanieplageman@gmail.com>
    > > > Date: Wed, 27 Aug 2025 08:50:15 -0400
    > > > Subject: [PATCH v8 01/22] Remove unneeded VM pin from VM replay
    >
    > I didn't push it yet because I did a new version that actually
    > eliminates the asserts in heap_multi_insert() before calling
    > visibilitymap_set() -- since they are redundant with checks inside
    > visibilitymap_set(). 0001 of attached v9 is what I plan to push,
    > barring any objections.
    
    I pushed this, so rebased v10 is attached. I've added one new patch:
    0002 adds ERRCODE_DATA_CORRUPTED to the existing log messages about
    VM/data corruption in vacuum. Andrey Borodin earlier suggested this,
    and I had neglected to include it.
    
    - Melanie
    
  24. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Robert Haas <robertmhaas@gmail.com> — 2025-09-08T16:41:00Z

    On Fri, Sep 5, 2025 at 6:20 PM Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    > yikes, you are right about the "reason" member. Attached 0002 removes
    > it, and I'll go ahead and fix it in the back branches too.
    
    I think changing this in the back-branches is a super-bad idea. If you
    want, you can add a comment in the back-branches saying "oops, we
    shipped a field that isn't used for anything", but changing the struct
    definition is very likely to make 0 people happy and >0 people
    unhappy. On the other hand, changing this in master is a good idea and
    you should go ahead and do that before this creates any more
    confusion.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  25. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-09-08T18:32:29Z

    On Mon, Sep 8, 2025 at 12:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
    >
    > On Fri, Sep 5, 2025 at 6:20 PM Melanie Plageman
    > <melanieplageman@gmail.com> wrote:
    > > yikes, you are right about the "reason" member. Attached 0002 removes
    > > it, and I'll go ahead and fix it in the back branches too.
    >
    > I think changing this in the back-branches is a super-bad idea. If you
    > want, you can add a comment in the back-branches saying "oops, we
    > shipped a field that isn't used for anything", but changing the struct
    > definition is very likely to make 0 people happy and >0 people
    > unhappy. On the other hand, changing this in master is a good idea and
    > you should go ahead and do that before this creates any more
    > confusion.
    
    Yes, that makes 100% sense. It should have occurred to me. I've pushed
    the commit to master. I didn't put an updated set of patches here in
    case someone was already reviewing them, as nothing else has changed.
    
    - Melanie
    
    
    
    
  26. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Robert Haas <robertmhaas@gmail.com> — 2025-09-08T18:54:34Z

    On Mon, Sep 8, 2025 at 11:44 AM Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    > I pushed this, so rebased v10 is attached. I've added one new patch:
    > 0002 adds ERRCODE_DATA_CORRUPTED to the existing log messages about
    > VM/data corruption in vacuum. Andrey Borodin earlier suggested this,
    > and I had neglected to include it.
    
    Writing "ereport(WARNING, (errcode(ERRCODE_DATA_CORRUPTED)" is very
    much a minority position. Generally the call to errcode() is on the
    following line. I think the commit message could use a bit of work,
    too. The first sentence heavily duplicates the second and the fourth,
    and the third sentence isn't sufficiently well-connected to the rest
    to make it clear why you're restating this general principle in this
    commit message.
    
    Perhaps something like:
    
    Add error codes when VACUUM discovers VM corruption
    
    Commit fd6ec93bf890314ac694dc8a7f3c45702ecc1bbd and other previous
    work has established the principle that when an error is potentially
    reachable in case of on-disk corruption, but is not expected to be
    reached otherwise, ERRCODE_DATA_CORRUPTED should be used. This allows
    log monitoring software to search for evidence of corruption by
    filtering on the error code.
    
    That kibitzing aside, I think this is pretty clearly the right thing to do.
    
    --
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  27. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-09-08T19:14:01Z

    On Mon, Sep 8, 2025 at 2:54 PM Robert Haas <robertmhaas@gmail.com> wrote:
    >
    > Commit fd6ec93bf890314ac694dc8a7f3c45702ecc1bbd and other previous
    > work has established the principle that when an error is potentially
    > reachable in case of on-disk corruption, but is not expected to be
    > reached otherwise, ERRCODE_DATA_CORRUPTED should be used. This allows
    > log monitoring software to search for evidence of corruption by
    > filtering on the error code.
    >
    > That kibitzing aside, I think this is pretty clearly the right thing to do.
    
    Thanks for the suggested wording and the pointer to that thread.
    
    I noticed that in that thread they decided to use errmsg_internal()
    instead of errmsg() for a few different reasons -- one of which was
    that the situation is not supposed to happen/cannot happen -- which I
    don't really understand. It is a reachable code path. Another is that
    it is extra work for translators, which I'm also not sure how to apply
    to my situation. Are the VM corruption cases worth extra work to the
    translators?
    
    I think the most compelling reason is that people will want to search
    for the error message in English online. So, for that reason, my
    instinct is to use errmsg_internal() in my case as well.
    
    - Melanie
    
    
    
    
  28. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Robert Haas <robertmhaas@gmail.com> — 2025-09-08T19:53:35Z

    On Mon, Sep 8, 2025 at 3:14 PM Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    > I noticed that in that thread they decided to use errmsg_internal()
    > instead of errmsg() for a few different reasons -- one of which was
    > that the situation is not supposed to happen/cannot happen -- which I
    > don't really understand. It is a reachable code path. Another is that
    > it is extra work for translators, which I'm also not sure how to apply
    > to my situation. Are the VM corruption cases worth extra work to the
    > translators?
    >
    > I think the most compelling reason is that people will want to search
    > for the error message in English online. So, for that reason, my
    > instinct is to use errmsg_internal() in my case as well.
    
    I don't find that reason particularly compelling -- people could want
    to search for any error message, or they could equally want to be able
    to read it without Google translate. Guessing which messages are
    obscure enough that we need not translate them exceeds my powers. If I
    were doing it, I'd make it errmsg() rather than errmsg_internal() and
    let the translations team change it if they don't think it's worth
    bothering with, because if you make it errmsg_internal() then they
    won't see it until somebody complains about it not getting translated.
    However, I suspect different committers would pursue different
    strategies here.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  29. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Robert Haas <robertmhaas@gmail.com> — 2025-09-08T20:14:47Z

    Reviewing 0003:
    
    +               /*
    +                * If we're only adding already frozen rows to a previously empty
    +                * page, mark it as all-frozen and update the visibility map. We're
    +                * already holding a pin on the vmbuffer.
    +                */
                    else if (all_frozen_set)
    +               {
                            PageSetAllVisible(page);
    +                       LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
    +                       visibilitymap_set_vmbits(relation,
    +                                                BufferGetBlockNumber(buffer),
    +                                                vmbuffer,
    +                                                VISIBILITYMAP_ALL_VISIBLE |
    +                                                VISIBILITYMAP_ALL_FROZEN);
    
    Locking a buffer in a critical section violates the order of
    operations proposed in the 'Write-Ahead Log Coding' section of
    src/backend/access/transam/README.
    
    +        * Now read and update the VM block. Even if we skipped updating the heap
    +        * page due to the file being dropped or truncated later in recovery, it's
    +        * still safe to update the visibility map.  Any WAL record that clears
    +        * the visibility map bit does so before checking the page LSN, so any
    +        * bits that need to be cleared will still be cleared.
    +        *
    +        * It is only okay to set the VM bits without holding the heap page lock
    +        * because we can expect no other writers of this page.
    
    The first paragraph of this paraphrases similar content in
    xlog_heap_visible(), but I don't see the variation in phrasing as an
    improvement.
    
    The second paragraph does not convince me at all. I see no reason to
    believe that this is safe, or that it is a good idea. The code in
    xlog_heap_visible() thinks it's OK to unlock and relock the page to
    make visibilitymap_set() happy, which is cringy but probably safe for
    lack of concurrent writers, but skipping locking altogether seems
    deeply unwise.
    
    - *             visibilitymap_set        - set a bit in a previously pinned page
    + *             visibilitymap_set        - set bit(s) in a previously pinned page and log
    + *      visibilitymap_set_vmbits - set bit(s) in a pinned page
    
    I suspect the indentation was done with a different mix of spaces and
    tabs here, because this doesn't align for me.
    
    In general, this idea makes some sense to me -- there doesn't seem to
    be any particularly good reason why the visibility-map update should
    be handled by a different WAL record than the all-visible flag on the
    page itself. It's a little hard for me to make that statement too
    conclusively without studying more of the patches than I've had time
    to do today, but off the top of my head it seems to make sense.
    However, I'm not sure you've taken enough care with the details here.
    
    --
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  30. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-09-08T22:28:46Z

    On Mon, Sep 8, 2025 at 4:15 PM Robert Haas <robertmhaas@gmail.com> wrote:
    >
    > Reviewing 0003:
    >
    > Locking a buffer in a critical section violates the order of
    > operations proposed in the 'Write-Ahead Log Coding' section of
    > src/backend/access/transam/README.
    
    Right, I noticed some other callers of visibilitymap_set() (like
    lazy_scan_new_or_empty()) did call it in a critical section (and it
    exclusive locks the VM page), so I thought perhaps it was better to
    keep this operation as close as possible to where we update the VM
    (similar to how it is in master in visibilitymap_set()).
    
    But, I think you're right that maintaining the order of operations
    proposed in transam/README is more important. As such, in attached
    v11, I've modified this patch and the other patches where I replace
    visibilitymap_set() with visibilitymap_set_vmbits() to exclusively
    lock the vmbuffer before the critical section.
    visibilitymap_set_vmbits() asserts that we have the vmbuffer
    exclusively locked, so we should be good.
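    
    As a rough illustration of the ordering discussed above (lock first, then enter the critical section, modify, mark dirty, log, stamp the LSN, leave, unlock), here is a toy model in C. Nothing in it is PostgreSQL code: the ToyBuf type, the toy_ functions, and the counter standing in for the critical-section depth are hypothetical, and the asserts only encode the rule that locks are taken outside the critical section.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins only; not PostgreSQL's Buffer or critical-section machinery. */
typedef struct ToyBuf
{
    bool          locked;
    bool          dirty;
    unsigned long lsn;
    unsigned char bits;
} ToyBuf;

static int toy_crit_depth = 0;  /* models the critical-section depth */

static void
toy_lock_exclusive(ToyBuf *buf)
{
    /* Taking a lock inside a critical section is what this sketch forbids. */
    assert(toy_crit_depth == 0);
    buf->locked = true;
}

static void
toy_unlock(ToyBuf *buf)
{
    buf->locked = false;
}

static void
toy_set_vmbits(ToyBuf *vm, unsigned char flags)
{
    /* The setter itself only asserts that the caller holds the lock. */
    assert(vm->locked);
    vm->bits |= flags;
    vm->dirty = true;
}

static void
toy_mark_page_all_frozen(ToyBuf *heap, ToyBuf *vm, unsigned long record_lsn)
{
    /* 1. Lock every buffer to be modified (pins are assumed held). */
    toy_lock_exclusive(heap);
    toy_lock_exclusive(vm);

    /* 2. Enter the critical section. */
    toy_crit_depth++;

    /* 3-4. Apply the changes and mark the buffers dirty. */
    heap->bits |= 0x01;
    heap->dirty = true;
    toy_set_vmbits(vm, 0x03);

    /* 5. A WAL record would be inserted here; then stamp the page LSNs. */
    assert(heap->dirty && vm->dirty);
    heap->lsn = record_lsn;
    vm->lsn = record_lsn;

    /* 6. Leave the critical section. */
    toy_crit_depth--;

    /* 7. Unlock. */
    toy_unlock(vm);
    toy_unlock(heap);
}
```

    Moving either toy_lock_exclusive() call below the toy_crit_depth++ line trips the assert in the lock function.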
    
    > +        * Now read and update the VM block. Even if we skipped updating the heap
    > +        * page due to the file being dropped or truncated later in recovery, it's
    > +        * still safe to update the visibility map.  Any WAL record that clears
    > +        * the visibility map bit does so before checking the page LSN, so any
    > +        * bits that need to be cleared will still be cleared.
    > +        *
    > +        * It is only okay to set the VM bits without holding the heap page lock
    > +        * because we can expect no other writers of this page.
    >
    > The first paragraph of this paraphrases similar content in
    > xlog_heap_visible(), but I don't see the variation in phrasing as an
    > improvement.
    
    The only difference is I replaced the phrase "LSN interlock" with
    "being dropped or truncated later in recovery" -- which is more
    specific and, I thought, more clear. Without this comment, it took me
    some time to understand the scenarios that might lead us to skip
    updating the heap block. heap_xlog_visible() has cause to describe
    this situation in an earlier comment -- which is why I think the LSN
    interlock comment is less confusing there.
    
    Anyway, I'm open to changing the comment. I could:
    1) copy-paste the same comment as heap_xlog_visible()
    2) refer to the comment in heap_xlog_visible() (comment seemed a bit
    short for that)
    3) diverge the comments further by improving the new comment in
    heap_xlog_multi_insert() in some way
    4) something else?
    
    > The second paragraph does not convince me at all. I see no reason to
    > believe that this is safe, or that it is a good idea. The code in
    > xlog_heap_visible() thinks it's OK to unlock and relock the page to
    > make visibilitymap_set() happy, which is cringy but probably safe for
    > lack of concurrent writers, but skipping locking altogether seems
    > deeply unwise.
    
    Actually in master, heap_xlog_visible() has no lock on the heap page
    when it calls visibilitymap_set(). It releases that lock before
    recording the freespace in the FSM and doesn't take it again.
    
    It does unlock and relock the VM page -- because visibilitymap_set()
    expects to take the lock on the VM.
    
    I agree that not holding the heap lock while updating the VM is
    unsatisfying. We can't hold it while doing the IO to read in the VM
    block in XLogReadBufferForRedoExtended(). So, we could take it again
    before calling visibilitymap_set(). But we don't always have the heap
    buffer, though. I suspect this is partially why heap_xlog_visible()
    unconditionally passes InvalidBuffer to visibilitymap_set() as the
    heap buffer and has special case handling for recovery when we don't
    have the heap buffer.
    
    In any case, it isn't an active bug, and I don't think future-proofing
    VM replay (i.e. against parallel recovery) is a prerequisite for
    committing this patch since it is also that way on master.
    
    > - *             visibilitymap_set        - set a bit in a previously pinned page
    > + *             visibilitymap_set        - set bit(s) in a previously pinned page and log
    > + *      visibilitymap_set_vmbits - set bit(s) in a pinned page
    >
    > I suspect the indentation was done with a different mix of spaces and
    > tabs here, because this doesn't align for me.
    
    oops, fixed.
    
    I pushed the ERRCODE_DATA_CORRUPTED patch, so attached v11 is rebased
    and also has the changes mentioned above.
    
    Since you've started reviewing the set, I'll note that patches
    0005-0011 are split up for ease of review and it may not necessarily
    make sense to keep that separation for eventual commit. They are a
    series of steps to move VM updates from lazy_scan_prune() into
    pruneheap.c.
    
    - Melanie
    
  31. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Robert Haas <robertmhaas@gmail.com> — 2025-09-09T14:00:04Z

    On Mon, Sep 8, 2025 at 6:29 PM Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    > But, I think you're right that maintaining the order of operations
    > proposed in transam/README is more important. As such, in attached
    > v11, I've modified this patch and the other patches where I replace
    > visibilitymap_set() with visibilitymap_set_vmbits() to exclusively
    > lock the vmbuffer before the critical section.
    > visibilitymap_set_vmbits() asserts that we have the vmbuffer
    > exclusively locked, so we should be good.
    
    That sounds good. I think it is OK to keep some of the odd things that
    we're currently doing if they're hard to eliminate, but if they're not
    really needed then I'd rather see us standardize the code. I feel (and
    I think you may agree, based on other conversations that we've had)
    that the visibility map code is somewhat oddly structured, and I'd
    like to see us push the amount of oddness down rather than up, if we
    can reasonably do so without breaking everything.
    
    > The only difference is I replaced the phrase "LSN interlock" with
    > "being dropped or truncated later in recovery" -- which is more
    > specific and, I thought, more clear. Without this comment, it took me
    > some time to understand the scenarios that might lead us to skip
    > updating the heap block. heap_xlog_visible() has cause to describe
    > this situation in an earlier comment -- which is why I think the LSN
    > interlock comment is less confusing there.
    >
    > Anyway, I'm open to changing the comment. I could:
    > 1) copy-paste the same comment as heap_xlog_visible()
    > 2) refer to the comment in heap_xlog_visible() (comment seemed a bit
    > short for that)
    > 3) diverge the comments further by improving the new comment in
    > heap_xlog_multi_insert() in some way
    > 4) something else?
    
    IMHO, copying and pasting comments is not great, and comments with
    identical intent and divergent wording are also not great. The former
    is not great because having a whole bunch of copies of the same
    comment, especially if it's a block comment rather than a 1-liner,
    uses up a bunch of space and creates a maintenance hazard in the sense
    that future updates might not get propagated to all copies. The latter
    is not great because it makes it hard to grep for other instances that
    should be adjusted when you adjust one, and also because if one
    version really is better than the other then ideally we'd like to have
    the good version everywhere. Of course, there's some tension between
    these two goals. In this particular case, thinking a little harder
    about your proposed change, it seems to me that "LSN interlock" is
    more clear about what the immediate test is that would cause us to
    skip updating the heap page, and "being dropped or truncated later in
    recovery" is more clear about what the larger state of the world that
    would lead to that situation is. But whatever preference anyone might
    have about which way to go with that choice, it is hard to see why the
    preference should go one way in one case and the other way in another
    case. Therefore, I favor an approach that leads either to an identical
    comment in both places, or to one comment referring to the other.
    
    > > The second paragraph does not convince me at all. I see no reason to
    > > believe that this is safe, or that it is a good idea. The code in
    > > xlog_heap_visible() thinks it's OK to unlock and relock the page to
    > > make visibilitymap_set() happy, which is cringy but probably safe for
    > > lack of concurrent writers, but skipping locking altogether seems
    > > deeply unwise.
    >
    > Actually in master, heap_xlog_visible() has no lock on the heap page
    > when it calls visibilitymap_set(). It releases that lock before
    > recording the freespace in the FSM and doesn't take it again.
    >
    > It does unlock and relock the VM page -- because visibilitymap_set()
    > expects to take the lock on the VM.
    >
    > I agree that not holding the heap lock while updating the VM is
    > unsatisfying. We can't hold it while doing the IO to read in the VM
    > block in XLogReadBufferForRedoExtended(). So, we could take it again
    > before calling visibilitymap_set(). But we don't always have the heap
    > buffer, though. I suspect this is partially why heap_xlog_visible()
    > unconditionally passes InvalidBuffer to visibilitymap_set() as the
    > heap buffer and has special case handling for recovery when we don't
    > have the heap buffer.
    
    You know, I wasn't thinking carefully enough about the distinction
    between the heap page and the visibility map page here. I thought you
    were saying that you were modifying a page without a lock on that
    page, but you aren't: you're saying you're modifying a page without a
    lock on another page to which it is related. The former seems
    disastrous, but the latter might be OK. However, I'm sort of confused
    about what the comment is trying to say to justify that:
    
    +        * It is only okay to set the VM bits without holding the heap page lock
    +        * because we can expect no other writers of this page.
    
    It is not exactly clear to me whether "this page" here refers to the
    heap page or the VM page. If it means the heap page, why should that
    be so if we haven't got any kind of lock? If it means the VM page,
    then why is the heap page even relevant?
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  32. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-09-09T16:24:20Z

    On Tue, Sep 9, 2025 at 10:00 AM Robert Haas <robertmhaas@gmail.com> wrote:
    >
    > On Mon, Sep 8, 2025 at 6:29 PM Melanie Plageman
    > <melanieplageman@gmail.com> wrote:
    >
    > > The only difference is I replaced the phrase "LSN interlock" with
    > > "being dropped or truncated later in recovery" -- which is more
    > > specific and, I thought, more clear. Without this comment, it took me
    > > some time to understand the scenarios that might lead us to skip
    > > updating the heap block. heap_xlog_visible() has cause to describe
    > > this situation in an earlier comment -- which is why I think the LSN
    > > interlock comment is less confusing there.
    > >
    > > Anyway, I'm open to changing the comment. I could:
    > > 1) copy-paste the same comment as heap_xlog_visible()
    > > 2) refer to the comment in heap_xlog_visible() (comment seemed a bit
    > > short for that)
    > > 3) diverge the comments further by improving the new comment in
    > > heap_xlog_multi_insert() in some way
    > > 4) something else?
    >
    > IMHO, copying and pasting comments is not great, and comments with
    > identical intent and divergent wording are also not great. The former
    > is not great because having a whole bunch of copies of the same
    > comment, especially if it's a block comment rather than a 1-liner,
    > uses up a bunch of space and creates a maintenance hazard in the sense
    > that future updates might not get propagated to all copies. The latter
    > is not great because it makes it hard to grep for other instances that
    > should be adjusted when you adjust one, and also because if one
    > version really is better than the other then ideally we'd like to have
    > the good version everywhere. Of course, there's some tension between
    > these two goals. In this particular case, thinking a little harder
    > about your proposed change, it seems to me that "LSN interlock" is
    > more clear about what the immediate test is that would cause us to
    > skip updating the heap page, and "being dropped or truncated later in
    > recovery" is more clear about what the larger state of the world that
    > would lead to that situation is. But whatever preference anyone might
    > have about which way to go with that choice, it is hard to see why the
    > preference should go one way in one case and the other way in another
    > case. Therefore, I favor an approach that leads either to an identical
    > comment in both places, or to one comment referring to the other.
    
    I see what you are saying.
    
    For heap_xlog_visible() the LSN interlock comment is easier to parse
    because of an earlier comment before reading the heap page:
    
        /*
         * Read the heap page, if it still exists. If the heap file has dropped or
         * truncated later in recovery, we don't need to update the page, but we'd
         * better still update the visibility map.
         */
    
    I've gone with the direct copy-paste of the LSN interlock paragraph in
    attached v12. I think referring to the other comment is too confusing
    in context here. However, I also added a line about what could cause
    the LSN interlock -- but above it, so as to retain grep-ability of the
    other comment.
    
    > > > The second paragraph does not convince me at all. I see no reason to
    > > > believe that this is safe, or that it is a good idea. The code in
    > > > heap_xlog_visible() thinks it's OK to unlock and relock the page to
    > > > make visibilitymap_set() happy, which is cringy but probably safe for
    > > > lack of concurrent writers, but skipping locking altogether seems
    > > > deeply unwise.
    > >
    > > Actually in master, heap_xlog_visible() has no lock on the heap page
    > > when it calls visibilitymap_set(). It releases that lock before
    > > recording the freespace in the FSM and doesn't take it again.
    > >
    > > It does unlock and relock the VM page -- because visibilitymap_set()
    > > expects to take the lock on the VM.
    > >
    > > I agree that not holding the heap lock while updating the VM is
    > > unsatisfying. We can't hold it while doing the IO to read in the VM
    > > block in XLogReadBufferForRedoExtended(). So, we could take it again
    > > before calling visibilitymap_set(). But we don't always have the heap
    > > buffer, though. I suspect this is partially why heap_xlog_visible()
    > > unconditionally passes InvalidBuffer to visibilitymap_set() as the
    > > heap buffer and has special case handling for recovery when we don't
    > > have the heap buffer.
    >
    > You know, I wasn't thinking carefully enough about the distinction
    > between the heap page and the visibility map page here. I thought you
    > were saying that you were modifying a page without a lock on that
    > page, but you aren't: you're saying you're modifying a page without a
    > lock on another page to which it is related. The former seems
    > disastrous, but the latter might be OK. However, I'm sort of confused
    > about what the comment is trying to say to justify that:
    >
    > +        * It is only okay to set the VM bits without holding the heap page lock
    > +        * because we can expect no other writers of this page.
    >
    > It is not exactly clear to me whether "this page" here refers to the
    > heap page or the VM page. If it means the heap page, why should that
    > be so if we haven't got any kind of lock? If it means the VM page,
    > then why is the heap page even relevant?
    
    I've expanded the comment in v12. In normal operation we must have the
    lock on the heap page when setting the VM bits because if another
    backend cleared PD_ALL_VISIBLE, we could have the forbidden scenario
    where PD_ALL_VISIBLE is clear and the VM is set. This is not allowed
    because then someone else may read the VM, conclude the page is
    all-visible, and then an index-only scan can return wrong results. In
    recovery, there are no concurrent writers, so it can't happen.
    
    It is worth discussing how to fix it in heap_xlog_visible() so that
    future scenarios like parallel recovery could not break this. However,
    this patch does not deviate from the behavior on master, and,
    technically, the behavior on master works.
    
    - Melanie
    
  33. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Robert Haas <robertmhaas@gmail.com> — 2025-09-09T19:26:08Z

    On Tue, Sep 9, 2025 at 12:24 PM Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    > For heap_xlog_visible() the LSN interlock comment is easier to parse
    > because of an earlier comment before reading the heap page:
    >
    >     /*
    >      * Read the heap page, if it still exists. If the heap file has dropped or
    >      * truncated later in recovery, we don't need to update the page, but we'd
    >      * better still update the visibility map.
    >      */
    >
    > I've gone with the direct copy-paste of the LSN interlock paragraph in
    > attached v12. I think referring to the other comment is too confusing
    > in context here. However, I also added a line about what could cause
    > the LSN interlock -- but above it, so as to retain grep-ability of the
    > other comment.
    
    I think that reads a little strangely. I would consolidate: Note that
    the heap relation may have been dropped or truncated, leading us to
    skip updating the heap block due to the LSN interlock. However, even
    in that case, it's still safe to update the visibility map, etc. The
    rest of the comment is perhaps a tad more explicit than our usual
    practice, but that might be a good thing, because sometimes we're a
    little too terse about these critical details.
    
    I just realized that I don't like this:
    
    + /*
    + * If we're only adding already frozen rows to a previously empty
    + * page, mark it as all-frozen and update the visibility map. We're
    + * already holding a pin on the vmbuffer.
    + */
    
    The thing is, we rarely position a block comment just before an "else
    if". There are probably instances, but it's not typical. That's why
    the existing comment contains two "if blah then blah" statements of
    which you deleted the second -- because it needed to cover both the
    "if" and the "else if". An alternative style is to move the comment
    down a nesting level and rephrase without the conditional, ie. "We're
    only adding frozen rows to a previously empty page, so mark it as
    all-frozen etc." But I don't know that I like doing that for one
    branch of the "if" and not the other.
    
    The rest of what's now 0001 looks OK to me now, although you might
    want to wait for a review from somebody more knowledgeable about this
    area.
    
    Some very quick comments on the next few patches -- far from a full review:
    
    0002. Looks boring, probably unobjectionable provided the payoff patch is OK.
    
    0003. What you've done here with xl_heap_prune.flags is kind of
    horrifying. The problem is that, while you've added code explaining
    that VISIBILITYMAP_ALL_{VISIBLE,FROZEN} are honorary XLHP flags,
    nobody who isn't looking directly at that comment is going to
    understand the muddling of the two namespaces. I would suggest not
    doing this, even if it means defining redundant constants and writing
    technically-unnecessary code to translate between them.
    
    0004. It is not clear to me why you need to get
    log_heap_prune_and_freeze to do the work here. Why can't
    log_newpage_buffer get the job done already?
    
    0005. It looks a little curious that you delete the
    identify-corruption logic from the end of the if-nest and add it to
    the beginning. Ceteris paribus, you'd expect that to be worse, since
    corruption is a rare case.
    
    0006. "to me marked" -> "to be marked".
    
    +                * If the heap page is all-visible but the VM bit is not set, we don't
    +                * need to dirty the heap page.  However, if checksums are enabled, we
    +                * do need to make sure that the heap page is dirtied before passing
    +                * it to visibilitymap_set(), because it may be logged.
                     */
    -               PageSetAllVisible(page);
    -               MarkBufferDirty(buf);
    +               if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
    +               {
    +                       PageSetAllVisible(page);
    +                       MarkBufferDirty(buf);
    +               }
    
    I really hate this. Maybe you're going to argue that it's not the job
    of this patch to fix the awfulness here, but surely marking a buffer
    dirty in case some other function decides to WAL-log it is a
    ridiculous plan.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  34. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-09-09T23:07:58Z

    Thanks for the review! I've made the changes to comments and minor
    fixes you suggested in attached v13 and have limited my inline
    responses to areas where further discussion is required.
    
    On Tue, Sep 9, 2025 at 3:26 PM Robert Haas <robertmhaas@gmail.com> wrote:
    >
    > 0003. What you've done here with xl_heap_prune.flags is kind of
    > horrifying. The problem is that, while you've added code explaining
    > that VISIBILITYMAP_ALL_{VISIBLE,FROZEN} are honorary XLHP flags,
    > nobody who isn't looking directly at that comment is going to
    > understand the muddling of the two namespaces. I would suggest not
    > doing this, even if it means defining redundant constants and writing
    > technically-unnecessary code to translate between them.
    
    Fair. I've introduced new XLHP flags in attached v13. Hopefully it
    puts an end to the horror.
    
    > 0004. It is not clear to me why you need to get
    > log_heap_prune_and_freeze to do the work here. Why can't
    > log_newpage_buffer get the job done already?
    
    Well, I need something to emit the changes to the VM. I'm eliminating
    all users of xl_heap_visible. Empty pages are the ones that benefit
    the least from switching from xl_heap_visible -> xl_heap_prune. But,
    if I don't transition them, we have to maintain all the
    xl_heap_visible code (including visibilitymap_set() in its long form).
    
    As for log_newpage_buffer(), I could keep it: if you think it is too
    confusing to change log_heap_prune_and_freeze()'s API (by passing
    force_heap_fpi) to handle this case, I can leave log_newpage_buffer()
    there and then call log_heap_prune_and_freeze().
    
    I just thought it seemed simple to avoid emitting the new page record
    and the VM update record, so why not -- but I don't have strong
    feelings.
    
    > 0005. It looks a little curious that you delete the
    > identify-corruption logic from the end of the if-nest and add it to
    > the beginning. Ceteris paribus, you'd expect that to be worse, since
    > corruption is a rare case.
    
    On master, the two corruption cases are sandwiched between the normal
    VM set cases. And I actually think doing it this way is brittle. If
    you put the cases which set the VM first, you have to completely
    bulletproof the if statements guarding them to foreclose any possible
    corruption case from entering because otherwise you will overwrite the
    corruption you then try to detect.
    
    But, specifically, from a performance perspective:
    
    I think moving up the third case doesn't matter because the check is so cheap:
    
        else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
    
    And as for moving up the second case (the other corruption case), the
    non-cheap thing it does is call visibilitymap_get_status()
    
        else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
                 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
    
    But once you call visibilitymap_get_status() once, assuming there is
    no corruption and you need to go set the VM, you've already got that
    page of the VM read, so it is probably pretty cheap. Overall, I didn't
    think this would add noticeable overhead or many wasted operations.
    
    And I thought that reorganizing the code improved clarity as well as
    decreased the likelihood of bugs from insufficiently guarding positive
    cases against corrupt pages and overwriting corruption instead of
    detecting it.
    
    If we're really worried about it from a performance perspective, I
    could add an extra test at the top of identify_and_fix_vm_corruption()
    that dumps out early if (!all_visible_according_to_vm &&
    presult.all_visible).
    
    > +                * If the heap page is all-visible but the VM bit is not set, we don't
    > +                * need to dirty the heap page.  However, if checksums are enabled, we
    > +                * do need to make sure that the heap page is dirtied before passing
    > +                * it to visibilitymap_set(), because it may be logged.
    >                  */
    > -               PageSetAllVisible(page);
    > -               MarkBufferDirty(buf);
    > +               if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
    > +               {
    > +                       PageSetAllVisible(page);
    > +                       MarkBufferDirty(buf);
    > +               }
    >
    > I really hate this. Maybe you're going to argue that it's not the job
    > of this patch to fix the awfulness here, but surely marking a buffer
    > dirty in case some other function decides to WAL-log it is a
    > ridiculous plan.
    
    Right, it isn't pretty. But I don't quite see what the alternative is.
    We need to mark the buffer dirty before setting the LSN. We could
    perhaps rewrite visibilitymap_set()'s API to return the LSN of the
    xl_heap_visible record and stamp it on the heap buffer ourselves. But
    1) I think visibilitymap_set() purposefully conceals its WAL logging
    ways from the caller and propagating that info back up starts to make
    the API messy in another way and 2) I'm a bit loath to make big
    changes to visibilitymap_set() right now since my patch set eventually
    resolves this by putting the changes to the VM and heap page in the
    same WAL record.
    
    - Melanie
    
  35. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Robert Haas <robertmhaas@gmail.com> — 2025-09-10T20:01:25Z

    On Tue, Sep 9, 2025 at 7:08 PM Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    > Fair. I've introduced new XLHP flags in attached v13. Hopefully it
    > puts an end to the horror.
    
    I suggest not renumbering all of the existing flags and just adding
    these new ones at the end. Less code churn and more likely to break in
    an obvious way if you mix up the two sets of flags.
    
    More on 0002:
    
    + set_heap_lsn = XLogHintBitIsNeeded() ? true : set_heap_lsn;
    
    Maybe just if (XLogHintBitIsNeeded()) set_heap_lsn = true? I don't feel
    super-strongly that what you've done is bad but it looks weird to my
    eyes.
    
    + * If we released any space or line pointers or will be setting a page in
    + * the visibility map, measure the page's freespace to later update the
    
    "setting a page in the visibility map" seems a little muddled to me.
    You set bits, not pages.
    
    + * all-visible (or all-frozen, depending on the vacuum mode,) which is
    
    This comma placement is questionable.
    
      /*
    + * Note that the heap relation may have been dropped or truncated, leading
    + * us to skip updating the heap block due to the LSN interlock. However,
    + * even in that case, it's still safe to update the visibility map. Any
    + * WAL record that clears the visibility map bit does so before checking
    + * the page LSN, so any bits that need to be cleared will still be
    + * cleared.
    + *
    + * Note that the lock on the heap page was dropped above. In normal
    + * operation this would never be safe because a concurrent query could
    + * modify the heap page and clear PD_ALL_VISIBLE -- violating the
    + * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
    + * the VM is set.
    + *
    + * In recovery, we expect no other writers, so writing to the VM page
    + * without holding a lock on the heap page is considered safe enough. It
    + * is done this way when replaying xl_heap_visible records (see
      */
    
    How many copies of this comment do you plan to end up with?
    
    The comment for log_heap_prune_and_freeze seems to be anticipating future work.
    
    > > 0004. It is not clear to me why you need to get
    > > log_heap_prune_and_freeze to do the work here. Why can't
    > > log_newpage_buffer get the job done already?
    >
    > Well, I need something to emit the changes to the VM. I'm eliminating
    > all users of xl_heap_visible. Empty pages are the ones that benefit
    > the least from switching from xl_heap_visible -> xl_heap_prune. But,
    > if I don't transition them, we have to maintain all the
    > xl_heap_visible code (including visibilitymap_set() in its long form).
    >
    > As for log_newpage_buffer(), I could keep it: if you think it is too
    > confusing to change log_heap_prune_and_freeze()'s API (by passing
    > force_heap_fpi) to handle this case, I can leave log_newpage_buffer()
    > there and then call log_heap_prune_and_freeze().
    >
    > I just thought it seemed simple to avoid emitting the new page record
    > and the VM update record, so why not -- but I don't have strong
    > feelings.
    
    Yeah, I'm not sure what the right thing to do here is. I think I was
    again experiencing brain fade by forgetting that there is a heap page
    and a VM page and, of course, log_newpage_buffer() probably isn't going
    to touch the latter. So that makes sense. On the other hand, we could
    only have one type of WAL record for every single operation in the
    system if we gave it enough flags, and force_heap_fpi seems
    suspiciously like a flag that turns this into a whole different kind
    of WAL record.
    
    > > 0005. It looks a little curious that you delete the
    > > identify-corruption logic from the end of the if-nest and add it to
    > > the beginning. Ceteris paribus, you'd expect that to be worse, since
    > > corruption is a rare case.
    >
    > On master, the two corruption cases are sandwiched between the normal
    > VM set cases. And I actually think doing it this way is brittle. If
    > you put the cases which set the VM first, you have to completely
    > bulletproof the if statements guarding them to foreclose any possible
    > corruption case from entering because otherwise you will overwrite the
    > corruption you then try to detect.
    
    Hmm. In the current code, we first test (!all_visible_according_to_vm
    && presult.all_visible), then (all_visible_according_to_vm &&
    !PageIsAllVisible(page) && visibilitymap_get_status(vacrel->rel,
    blkno, &vmbuffer) != 0), and then (presult.lpdead_items > 0 &&
    PageIsAllVisible(page)). The first and second can never coexist,
    because they require opposite values of all_visible_according_to_vm.
    The second and third cannot coexist because they require opposite
    values of PageIsAllVisible(page). It is not entirely obvious that the
    first and third tests couldn't both pass, but you'd have to have
    presult.all_visible and presult.lpdead_items > 0, and it's a bit hard
    to see how heap_page_prune_and_freeze() could ever allow that.
    Consider:
    
        if (prstate.all_visible && prstate.lpdead_items == 0)
        {
            presult->all_visible = prstate.all_visible;
            presult->all_frozen = prstate.all_frozen;
        }
        else
        {
            presult->all_visible = false;
            presult->all_frozen = false;
        }
    ...
        presult->lpdead_items = prstate.lpdead_items;
    
    So I don't really think I'm persuaded that the current way is brittle.
    But that having been said, I agree with you that the order of the
    checks is kind of random, and I don't think it really matters that
    much for performance. What does matter is clarity. I feel like what
    I'd ideally like this logic to do is say: do we want the VM bit for
    the page to be set to all-frozen, just all-visible, or neither? Then
    push the VM bit to the correct state, dragging the page-level bit
    along behind. And the current logic sort of does that. It's roughly:
    
    1. Should we go from not-all-visible to either all-visible or
    all-frozen? If yes, do so.
    2. Should we go from either all-visible or all-frozen to
    not-all-visible? If yes, do so.
    3. Should we go from either all-visible or all-frozen to
    not-all-visible for a different reason? If yes, do so.
    4. Should we go from all-visible to all-frozen? If yes, do so.
    
    But what's weird is that all the tests are written differently, and we
    have two different reasons for going to not-all-visible, namely
    PD_ALL_VISIBLE-not-set and dead-items-on-page, whereas there's only
    one test for each of the other state-transitions, because the
    decision-making for those cases is fully completed at an earlier
    stage. I would kind of like to see this expressed in a way that first
    decides which state transition to make (forward-to-all-frozen,
    forward-to-all-visible, backward-to-all-visible,
    backward-to-not-all-visible, nothing) and then does the corresponding
    work. What you're doing instead is splitting half of those functions
    off into a helper function while keeping the other half where they are
    without cleaning up any of the logic. Now, maybe that's OK: I'm far
    from having grokked the whole patch set. But it is not any more clear
    than what we have now, IMHO, and perhaps even a bit less so.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  36. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-09-18T00:10:07Z

    On Wed, Sep 10, 2025 at 4:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
    >
    > On Tue, Sep 9, 2025 at 7:08 PM Melanie Plageman
    > <melanieplageman@gmail.com> wrote:
    > > Fair. I've introduced new XLHP flags in attached v13. Hopefully it
    > > puts an end to the horror.
    >
    > I suggest not renumbering all of the existing flags and just adding
    > these new ones at the end. Less code churn and more likely to break in
    > an obvious way if you mix up the two sets of flags.
    
    Makes sense. In my attached v14, I have not renumbered them.
    
    > More on 0002:
    
    After an off-list discussion we had about how to make the patches in
    the set progressively improve the code instead of just mechanically
    refactoring it, I have made some big changes in the intermediate
    patches in the set.
    
    Before actually including the VM changes in the vacuum/prune WAL
    records, I first include setting PD_ALL_VISIBLE with the other changes
    to the heap page so that we can remove the heap page from the VM
    setting WAL chain. This happens to fix the bug we discussed where if
    you set an all-visible page all-frozen and checksums/wal_log_hints are
    enabled, you may end up setting an LSN on a page that was not marked
    dirty.
    
    0001 is RFC but waiting on one other reviewer
    0002 - 0007 is a bit of cleanup I had later in the patch set but moved
    up because I think it made the intermediate patches better
    0008 - 0012 removes the heap page from the XLOG_HEAP2_VISIBLE WAL
    chain (it makes all callers of visibilitymap_set() set PD_ALL_VISIBLE
    in the same WAL record as changes to the heap page)
    0013 - 0018 finish the job eliminating XLOG_HEAP2_VISIBLE and set VM
    bits in the same WAL record as the heap changes
    0019 - 0024 set the VM on-access
    
    >   /*
    > + * Note that the heap relation may have been dropped or truncated, leading
    > + * us to skip updating the heap block due to the LSN interlock. However,
    > + * even in that case, it's still safe to update the visibility map. Any
    > + * WAL record that clears the visibility map bit does so before checking
    > + * the page LSN, so any bits that need to be cleared will still be
    > + * cleared.
    > + *
    > + * Note that the lock on the heap page was dropped above. In normal
    > + * operation this would never be safe because a concurrent query could
    > + * modify the heap page and clear PD_ALL_VISIBLE -- violating the
    > + * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
    > + * the VM is set.
    > + *
    > + * In recovery, we expect no other writers, so writing to the VM page
    > + * without holding a lock on the heap page is considered safe enough. It
    > + * is done this way when replaying xl_heap_visible records (see
    >   */
    >
    > How many copies of this comment do you plan to end up with?
    
    By the end, one for copy freeze replay and one for prune/freeze/vacuum
    replay. I felt two wasn't too bad and was easier than meta-explaining
    what the other comment was explaining.
    
    > > > 0004. It is not clear to me why you need to get
    > > > log_heap_prune_and_freeze to do the work here. Why can't
    > > > log_newpage_buffer get the job done already?
    > >
    > > Well, I need something to emit the changes to the VM. I'm eliminating
    > > all users of xl_heap_visible. Empty pages are the ones that benefit
    > > the least from switching from xl_heap_visible -> xl_heap_prune. But,
    > > if I don't transition them, we have to maintain all the
    > > xl_heap_visible code (including visibilitymap_set() in its long form).
    > >
    > > As for log_newpage_buffer(), I could keep it: if you think it is too
    > > confusing to change log_heap_prune_and_freeze()'s API (by passing
    > > force_heap_fpi) to handle this case, I can leave log_newpage_buffer()
    > > there and then call log_heap_prune_and_freeze().
    > >
    > > I just thought it seemed simple to avoid emitting the new page record
    > > and the VM update record, so why not -- but I don't have strong
    > > feelings.
    >
    > Yeah, I'm not sure what the right thing to do here is. I think I was
    > again experiencing brain fade by forgetting that there is a heap page
    > and a VM page and, of course, log_newpage_buffer() probably isn't going
    > to touch the latter. So that makes sense. On the other hand, we could
    > only have one type of WAL record for every single operation in the
    > system if we gave it enough flags, and force_heap_fpi seems
    > suspiciously like a flag that turns this into a whole different kind
    > of WAL record.
    
    I've kept log_newpage_buffer() and used log_heap_prune_and_freeze() for
    setting PD_ALL_VISIBLE and the VM.
    
    > > > 0005. It looks a little curious that you delete the
    > > > identify-corruption logic from the end of the if-nest and add it to
    > > > the beginning. Ceteris paribus, you'd expect that to be worse, since
    > > > corruption is a rare case.
    > >
    > > On master, the two corruption cases are sandwiched between the normal
    > > VM set cases. And I actually think doing it this way is brittle. If
    > > you put the cases which set the VM first, you have to completely
    > > bulletproof the if statements guarding them to foreclose any possible
    > > corruption case from entering because otherwise you will overwrite the
    > > corruption you then try to detect.
    >
    > Hmm. In the current code, we first test (!all_visible_according_to_vm
    > && presult.all_visible), then (all_visible_according_to_vm &&
    > !PageIsAllVisible(page) && visibilitymap_get_status(vacrel->rel,
    > blkno, &vmbuffer) != 0), and then (presult.lpdead_items > 0 &&
    > PageIsAllVisible(page)). The first and second can never coexist,
    > because they require opposite values of all_visible_according_to_vm.
    > The second and third cannot coexist because they require opposite
    > values of PageIsAllVisible(page). It is not entirely obvious that the
    > first and third tests couldn't both pass, but you'd have to have
    > presult.all_visible and presult.lpdead_items > 0, and it's a bit hard
    > to see how heap_page_prune_and_freeze() could ever allow that.
    > Consider:
    >
    >     if (prstate.all_visible && prstate.lpdead_items == 0)
    >     {
    >         presult->all_visible = prstate.all_visible;
    >         presult->all_frozen = prstate.all_frozen;
    >     }
    >     else
    >     {
    >         presult->all_visible = false;
    >         presult->all_frozen = false;
    >     }
    > ...
    >     presult->lpdead_items = prstate.lpdead_items;
    >
    > So I don't really think I'm persuaded that the current way is brittle.
    
    I meant brittle because it has to be so carefully coded for it to work
    out this way. If you ever wanted to change or enhance it, it's quite
    hard to know how to make sure all of them are entirely mutually
    exclusive.
    
    > But that having been said, I agree with you that the order of the
    > checks is kind of random, and I don't think it really matters that
    > much for performance. What does matter is clarity. I feel like what
    > I'd ideally like this logic to do is say: do we want the VM bit for
    > the page to be set to all-frozen, just all-visible, or neither? Then
    > push the VM bit to the correct state, dragging the page-level bit
    > along behind. And the current logic sort of does that. It's roughly:
    >
    > 1. Should we go from not-all-visible to either all-visible or
    > all-frozen? If yes, do so.
    > 2. Should we go from either all-visible or all-frozen to
    > not-all-visible? If yes, do so.
    > 3. Should we go from either all-visible or all-frozen to
    > not-all-visible for a different reason? If yes, do so.
    > 4. Should we go from all-visible to all-frozen? If yes, do so.
    
    I don't necessarily agree that fixing corruption and setting the VM
    should be together -- they feel like separate things to me. But, I
    don't feel strongly enough about it to push it.
    
    > But what's weird is that all the tests are written differently, and we
    > have two different reasons for going to not-all-visible, namely
    > PD_ALL_VISIBLE-not-set and dead-items-on-page, whereas there's only
    > one test for each of the other state-transitions, because the
    > decision-making for those cases is fully completed at an earlier
    > stage. I would kind of like to see this expressed in a way that first
    > decides which state transition to make (forward-to-all-frozen,
    > forward-to-all-visible, backward-to-all-visible,
    > backward-to-not-all-visible, nothing) and then does the corresponding
    > work. What you're doing instead is splitting half of those functions
    > off into a helper function while keeping the other half where they are
    > without cleaning up any of the logic. Now, maybe that's OK: I'm far
    > from having grokked the whole patch set. But it is not any more clear
    > than what we have now, IMHO, and perhaps even a bit less so.
    
    In terms of my patch set, I do have to change something about this
    mixture of fixing corruption and setting the VM because I need to set
    the VM bits in the same critical section as making the other changes
    to the heap page (pruning, etc) and include the VM set changes in the
    same WAL record (note that clearing the VM to fix corruption is not
    WAL-logged).
    
    What I've gone with is determining what to set the VM bits to and then
    fixing the corruption at the same time. Then, later, when making the
    changes to the heap page, I actually set the VM. This is kind of the
    opposite of what you suggested above -- determining what to set the
    bits to altogether -- corruption and non-corruption cases together. I
    don't think we can do that, though, because fixing the corruption makes
    non-WAL-logged changes to the page and VM, whereas setting the VM bits is
    a WAL-logged change. Also, you can't clear bits with visibilitymap_set()
    (there's an assertion about that). So you have to call different
    functions (not to mention emit distinct error messages). I don't know
    that I've come up with the ideal solution, though.
    
    - Melanie
    
  37. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Andres Freund <andres@anarazel.de> — 2025-09-18T16:48:45Z

    Hi,
    
    On 2025-09-17 20:10:07 -0400, Melanie Plageman wrote:
    > 0001 is RFC but waiting on one other reviewer
    
    > From cacff6c95e38d370b87148bc48cf6ac5f086ed07 Mon Sep 17 00:00:00 2001
    > From: Melanie Plageman <melanieplageman@gmail.com>
    > Date: Tue, 17 Jun 2025 17:22:10 -0400
    > Subject: [PATCH v14 01/24] Eliminate COPY FREEZE use of XLOG_HEAP2_VISIBLE
    > diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
    > index cf843277938..faa7c561a8a 100644
    > --- a/src/backend/access/heap/heapam_xlog.c
    > +++ b/src/backend/access/heap/heapam_xlog.c
    > @@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
    >  	int			i;
    >  	bool		isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
    >  	XLogRedoAction action;
    > +	Buffer		vmbuffer = InvalidBuffer;
    >
    >  	/*
    >  	 * Insertion doesn't overwrite MVCC data, so no conflict processing is
    > @@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
    >  	if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
    >  	{
    >  		Relation	reln = CreateFakeRelcacheEntry(rlocator);
    > -		Buffer		vmbuffer = InvalidBuffer;
    >
    >  		visibilitymap_pin(reln, blkno, &vmbuffer);
    >  		visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
    >  		ReleaseBuffer(vmbuffer);
    > +		vmbuffer = InvalidBuffer;
    >  		FreeFakeRelcacheEntry(reln);
    >  	}
    >
    > @@ -662,6 +663,57 @@ heap_xlog_multi_insert(XLogReaderState *record)
    >  	if (BufferIsValid(buffer))
    >  		UnlockReleaseBuffer(buffer);
    >
    > +	buffer = InvalidBuffer;
    > +
    > +	/*
    > +	 * Now read and update the VM block.
    > +	 *
    > +	 * Note that the heap relation may have been dropped or truncated, leading
    > +	 * us to skip updating the heap block due to the LSN interlock.
    
    I don't fully understand this - how does dropping/truncating the relation lead
    to skipping due to the LSN interlock?
    
    
    > +	 * even in that case, it's still safe to update the visibility map. Any
    > +	 * WAL record that clears the visibility map bit does so before checking
    > +	 * the page LSN, so any bits that need to be cleared will still be
    > +	 * cleared.
    > +	 *
    > +	 * Note that the lock on the heap page was dropped above. In normal
    > +	 * operation this would never be safe because a concurrent query could
    > +	 * modify the heap page and clear PD_ALL_VISIBLE -- violating the
    > +	 * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
    > +	 * the VM is set.
    > +	 *
    > +	 * In recovery, we expect no other writers, so writing to the VM page
    > +	 * without holding a lock on the heap page is considered safe enough. It
    > +	 * is done this way when replaying xl_heap_visible records (see
    > +	 * heap_xlog_visible()).
    > +	 */
    > +	if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
    > +		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
    > +									  &vmbuffer) == BLK_NEEDS_REDO)
    > +	{
    
    Why are we using RBM_ZERO_ON_ERROR here? I know it's copied from
    heap_xlog_visible(), but I don't immediately understand (or remember) why we
    do so there either?
    
    
    > +		Page		vmpage = BufferGetPage(vmbuffer);
    > +		Relation	reln = CreateFakeRelcacheEntry(rlocator);
    
    Hm. Do we really need to continue doing this ugly fake relcache stuff? I'd
    really like to eventually get rid of that and given that the new "code shape"
    delegates a lot more responsibility to the redo routines, they should have a
    fairly easy time not needing a fake relcache?  Afaict the relation already is
    not used outside of debugging paths?
    
    
    > +		/* initialize the page if it was read as zeros */
    > +		if (PageIsNew(vmpage))
    > +			PageInit(vmpage, BLCKSZ, 0);
    > +
    > +		visibilitymap_set_vmbits(reln, blkno,
    > +								 vmbuffer,
    > +								 VISIBILITYMAP_ALL_VISIBLE |
    > +								 VISIBILITYMAP_ALL_FROZEN);
    > +
    > +		/*
    > +		 * It is not possible that the VM was already set for this heap page,
    > +		 * so the vmbuffer must have been modified and marked dirty.
    > +		 */
    
    I assume that's because we a) checked the LSN interlock b) are replaying
    something that needed to newly set the bit?
    
    
    Except for the above comments, this looks pretty good to me.
    
    
    Seems 0002 should just be applied...
    
    
    Re 0003: I wonder if it's getting to the point that a struct should be used as
    the argument.
    
    Greetings,
    
    Andres Freund
    
    
    
    
  38. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-09-24T17:07:46Z

    On Thu, Sep 18, 2025 at 12:48 PM Andres Freund <andres@anarazel.de> wrote:
    >
    > On 2025-09-17 20:10:07 -0400, Melanie Plageman wrote:
    >
    > > +     /*
    > > +      * Now read and update the VM block.
    > > +      *
    > > +      * Note that the heap relation may have been dropped or truncated, leading
    > > +      * us to skip updating the heap block due to the LSN interlock.
    >
    > I don't fully understand this - how does dropping/truncating the relation lead
    > to skipping due to the LSN interlock?
    
    Yes, this wasn't right. I misunderstood.
    
    What I think it should say is that if the heap update was skipped due
    to LSN interlock we still have to replay the updates to the VM because
    each vm page contains bits for multiple heap blocks and if the record
    included a vm page FPI, subsequent updates to the VM may rely on this
    FPI to avoid torn pages. We don't condition it on the heap redo having
    been an FPI, probably because it is not worth it -- but I wonder if
    that is worth calling out in the comment?
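    
    To illustrate why one VM page covers many heap blocks, here is a minimal
    sketch of the block-to-bit arithmetic. It is not code from the patch set:
    the constants assume a default 8 kB block size with a 24-byte page header,
    and the vm_* helper names are made up for this example.
    
    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Assumed values for a default build; not taken from the thread. */
    #define BLCKSZ_ASSUMED       8192
    #define PAGE_HEADER_ASSUMED  24     /* aligned page header size */
    #define BITS_PER_HEAPBLOCK   2      /* all-visible bit + all-frozen bit */
    #define HEAPBLOCKS_PER_BYTE  (8 / BITS_PER_HEAPBLOCK)
    #define MAPSIZE              (BLCKSZ_ASSUMED - PAGE_HEADER_ASSUMED)
    #define HEAPBLOCKS_PER_PAGE  (MAPSIZE * HEAPBLOCKS_PER_BYTE)

    /* Which VM page holds the bits for this heap block (hypothetical helper). */
    static uint32_t
    vm_map_block(uint32_t heap_blk)
    {
        return heap_blk / HEAPBLOCKS_PER_PAGE;
    }

    /* Which byte within that VM page's bitmap (hypothetical helper). */
    static uint32_t
    vm_map_byte(uint32_t heap_blk)
    {
        return (heap_blk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
    }

    /* Bit offset of the two status bits within that byte (hypothetical helper). */
    static uint8_t
    vm_bit_offset(uint32_t heap_blk)
    {
        return (uint8_t) ((heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK);
    }
    ```
    
    Under these assumptions a single VM page tracks 32672 heap blocks, so a VM
    page image carried by one record can cover bits that later records for
    other heap blocks go on to modify.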
    
    Do we also need to replay it when the heap redo returns BLK_NOTFOUND?
    I assume this can happen in the case of relation dropped or truncated
    -- but in this case there wouldn't be subsequent records updating the
    VM for other heap blocks that we need to replay because the other heap
    blocks won't be found either, right?
    
    > > +     if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
    > > +             XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
    > > +                                                                       &vmbuffer) == BLK_NEEDS_REDO)
    > > +     {
    >
    > Why are we using RBM_ZERO_ON_ERROR here? I know it's copied from
    > heap_xlog_visible(), but I don't immediately understand (or remember) why we
    > do so there either?
    
    It has been RBM_ZERO_ON_ERROR since XLogReadBufferForRedoExtended()
    was introduced here in 2c03216d8311.
    I think we probably do this because vm_readbuf() passes ReadBuffer()
    RBM_ZERO_ON_ERROR and has this comment
    
         * For reading we use ZERO_ON_ERROR mode, and initialize the page if
         * necessary. It's always safe to clear bits, so it's better to clear
         * corrupt pages than error out.
    
    Do you think I also should have a comment in heap_xlog_multi_insert()?
    
    > > +             Page            vmpage = BufferGetPage(vmbuffer);
    > > +             Relation        reln = CreateFakeRelcacheEntry(rlocator);
    >
    > Hm. Do we really need to continue doing this ugly fake relcache stuff? I'd
    > really like to eventually get rid of that and given that the new "code shape"
    > delegates a lot more responsibility to the redo routines, they should have a
    > fairly easy time not needing a fake relcache?  Afaict the relation already is
    > not used outside of debugging paths?
    
    Yes, interestingly we don't have the relname in recovery anyway, so it
    does all this fake relcache stuff only to convert the relfilenode to a
    string and uses that.
    
    The fake relcache stuff will still be used by visibilitymap_pin()
    which callers like heap_xlog_delete() use to get the VM page. And I
    don't think it is worth coming up with a version of that that doesn't
    use the relcache. But you're right that the Relation is not needed for
    visibilitymap_set_vmbits(). I've changed it to just take the relation
    name as a string.
    
    
    > > +             /* initialize the page if it was read as zeros */
    > > +             if (PageIsNew(vmpage))
    > > +                     PageInit(vmpage, BLCKSZ, 0);
    > > +
    > > +             visibilitymap_set_vmbits(reln, blkno,
    > > +                                                              vmbuffer,
    > > +                                                              VISIBILITYMAP_ALL_VISIBLE |
    > > +                                                              VISIBILITYMAP_ALL_FROZEN);
    > > +
    > > +             /*
    > > +              * It is not possible that the VM was already set for this heap page,
    > > +              * so the vmbuffer must have been modified and marked dirty.
    > > +              */
    >
    > I assume that's because we a) checked the LSN interlock b) are replaying
    > something that needed to newly set the bit?
    
    Yes, perhaps it is not worth having the assert since it attracts extra
    attention to an invariant that is unlikely to be in danger of
    regression.
    
    > Seems 0002 should just be applied...
    
    Done
    
    > Re 0003: I wonder if it's getting to the point that a struct should be used as
    > the argument.
    
    I have been thinking about this. I have yet to come up with a good
    idea for a struct name or multiple struct names that seem to fit here.
    I could move the other output parameters into the PruneFreezeResult
    and then maybe make some kind of PruneFreezeParameters struct or
    something?
    
    - Melanie
    
  39. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Robert Haas <robertmhaas@gmail.com> — 2025-09-24T20:13:43Z

    I find this patch set quite hard to follow. 0001 altogether removes
    the use of XLOG_HEAP2_VISIBLE in cases where we use
    XLOG_HEAP2_MULTI_INSERT, but then 0007 (the next non-refactoring
    patch) begins half-removing the dependency on XLOG_HEAP2_VISIBLE,
    assisted by 0009 and 0010, and then later you come back and remove the
    other half of the dependency. I know it was I who proposed (off-list)
    first making the XLOG_HEAP2_VISIBLE record only deal with the VM page
    and not the heap buffer, but I'm not sure that idea quite worked out
    in terms of making this easier to follow. At the least, it seems weird
    that COPY FREEZE is an exception that gets handled in a different way
    than all the other cases, fully removing the dependency in one step.
    It would also be nice if each time you repost this, or maybe in a
    README that you post along beside the actual patches, you'd include
    some kind of roadmap to help the reader understand the internal
    structure of the patch set, like 1 does this, 2-9 get us to here,
    10-whatever get us to this next place.
    
    I don't really understand how the interlocking works. 0011 changes
    visibilitymap_set so that it doesn't take the heap block as an
    argument, but we'd better hold a lock on the heap page while setting
    the VM bit, otherwise I think somebody could come along and modify the
    heap page after we decided it was all-visible and before we actually
    set the VM bit, which would be terrible. I would expect the comments
    and the commit message to say something about that, but I don't see
    that they do.
    
    I find myself fearful of the way that 0007 propagates the existing
    hacks around setting the VM bit into a new place:
    
    +               /*
    +                * We always emit a WAL record when setting
    PD_ALL_VISIBLE, but we are
    +                * careful not to emit a full page image unless
    +                * checksums/wal_log_hints are enabled. We only set
    the heap page LSN
    +                * if full page images were an option when emitting
    WAL. Otherwise,
    +                * subsequent modifications of the page may
    incorrectly skip emitting
    +                * a full page image.
    +                */
    +               if (do_prune || nplans > 0 ||
    +                       (xlrec.flags & XLHP_SET_PD_ALL_VIS &&
    XLogHintBitIsNeeded()))
    +                       PageSetLSN(page, lsn);
    
    I suppose it's not the worst thing to duplicate this logic, because
    you're later going to remove the original copy. But, it took me >10
    minutes to find the text in src/backend/access/transam/README, in the
    second half of the "Writing Hints" section, that explains the overall
    principle here, and since the patch set doesn't seem to touch that
    text, maybe you weren't even aware it was there. And, it's a little
    weird to have a single WAL record that is either a hint or not a hint
    depending on a complex set of conditions. (IMHO mixing & and &&
    without parentheses is quite brave, and an explicit != 0 might not be
    a bad idea either.)
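    
    For readers unsure about the precedence point, a small sketch follows. In
    C, bitwise & binds tighter than logical &&, so the unparenthesized form in
    the hunk parses as intended; the explicit form only makes that visible.
    The flag value and function names here are hypothetical, not from the
    patch.
    
    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define FLAG_SET_ALL_VIS 0x04   /* hypothetical flag bit, illustration only */

    /* As written in the hunk: parses as (flags & FLAG_SET_ALL_VIS) && hints_needed. */
    static bool
    implicit_form(uint8_t flags, bool hints_needed)
    {
        return flags & FLAG_SET_ALL_VIS && hints_needed;
    }

    /* Same test with the grouping and the != 0 comparison spelled out. */
    static bool
    explicit_form(uint8_t flags, bool hints_needed)
    {
        return ((flags & FLAG_SET_ALL_VIS) != 0) && hints_needed;
    }
    ```
    
    Both functions return the same result for every input; the second is just
    easier to read at a glance.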
    
    Anyway, I kind of wonder if it's time to back out the hack that I
    installed here many years ago. At the time, I thought that it would be
    bad if a VACUUM swept over the visibility map setting VM bits and as a
    result emitted an FPI for every page in the entire heap ... but
    everyone who is running with checksums has accepted that cost already,
    and with those being the default, that's probably going to be most
    people. It would be even more compelling if we were going to freeze,
    prune, and set all-visible on access, because then presumably the case
    where we touch a page and ONLY set the VM bit would be rare, so the
    cost of doing that wouldn't matter much, but I guess the patch doesn't
    go that far -- we can freeze or set all-visible on access but not
    prune, without which the scenario I was worrying about at the time is
    still fairly plausible, I think, if checksums are turned off.
    
    -- 
    Robert Haas
    EDB: http://www.enterprisedb.com
    
    
    
    
  40. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-10-06T22:40:20Z

    On Wed, Sep 24, 2025 at 4:13 PM Robert Haas <robertmhaas@gmail.com> wrote:
    >
    > I find this patch set quite hard to follow. 0001 altogether removes
    > the use of XLOG_HEAP2_VISIBLE in cases where we use
    > XLOG_HEAP2_MULTI_INSERT, but then 0007 (the next non-refactoring
    > patch) begins half-removing the dependency on XLOG_HEAP2_VISIBLE,
    > assisted by 0009 and 0010, and then later you come back and remove the
    > other half of the dependency. I know it was I who proposed (off-list)
    > first making the XLOG_HEAP2_VISIBLE record only deal with the VM page
    > and not the heap buffer, but I'm not sure that idea quite worked out
    > in terms of making this easier to follow. At the least, it seems weird
    > that COPY FREEZE is an exception that gets handled in a different way
    > than all the other cases, fully removing the dependency in one step.
    > It would also be nice if each time you repost this, or maybe in a
    > README that you post along beside the actual patches, you'd include
    > some kind of roadmap to help the reader understand the internal
    > structure of the patch set, like 1 does this, 2-9 get us to here,
    > 10-whatever get us to this next place.
    
    In attached v16, I’ve reverted to removing XLOG_HEAP2_VISIBLE
    entirely, rather than first removing each caller's heap page from the
    VM WAL chain. I reordered changes and squashed several refactoring
    patches to improve patch-by-patch readability. This should make the
    set read differently from earlier versions that removed
    XLOG_HEAP2_VISIBLE and had more step-by-step mechanical refactoring.
    
    I think if we plan to go all the way with removing XLOG_HEAP2_VISIBLE,
    having intermediate patches that just set PD_ALL_VISIBLE when making
    other heap page changes is more confusing than helpful. Also, I think having
    separate flags for setting PD_ALL_VISIBLE in the WAL record
    over-complicated the code.
    
    0001:  remove XLOG_HEAP2_VISIBLE from COPY FREEZE
    0002 - 0005: various refactoring in advance of removing
    XLOG_HEAP2_VISIBLE in pruning
    0006: Pruning and freezing by vacuum sets the VM and emits a single
    WAL record with those changes
    0007: Reaping (phase III) by vacuum sets the VM and sets line pointers
    unused in a single WAL record
    0008 - 0009: XLOG_HEAP2_VISIBLE is eliminated
    0010 - 0012: preparation for setting VM on-access
    0013: set VM on-access
    0014: set pd_prune_xid on insert
    
    > I find myself fearful of the way that 0007 propagates the existing
    > hacks around setting the VM bit into a new place:
    >
    > +               /*
    > +                * We always emit a WAL record when setting
    > PD_ALL_VISIBLE, but we are
    > +                * careful not to emit a full page image unless
    > +                * checksums/wal_log_hints are enabled. We only set
    > the heap page LSN
    > +                * if full page images were an option when emitting
    > WAL. Otherwise,
    > +                * subsequent modifications of the page may
    > incorrectly skip emitting
    > +                * a full page image.
    > +                */
    > +               if (do_prune || nplans > 0 ||
    > +                       (xlrec.flags & XLHP_SET_PD_ALL_VIS &&
    > XLogHintBitIsNeeded()))
    > +                       PageSetLSN(page, lsn);
    >
    > I suppose it's not the worst thing to duplicate this logic, because
    > you're later going to remove the original copy. But, it took me >10
    > minutes to find the text in src/backend/access/transam/README, in the
    > second half of the "Writing Hints" section, that explains the overall
    > principle here, and since the patch set doesn't seem to touch that
    > text, maybe you weren't even aware it was there.
    
    I don't think that src/backend/access/transam/README must change with
    my patch. It is still true that if the only change we are making to
    the heap page is setting PD_ALL_VISIBLE and checksums/wal_log_hints
    are disabled, we explicitly avoid an FPI and thus can't stamp the page
    LSN.
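    
    The rule described above can be sketched as a pair of predicates. This is
    a simplified model, not the patch's code: HeapPageChange and both function
    names are invented for the example, and hint_bits_need_wal stands in for
    whatever XLogHintBitIsNeeded() reports.
    
    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical summary of what one record does to the heap page. */
    typedef struct HeapPageChange
    {
        bool do_prune;          /* redirect, mark dead, or mark unused */
        bool do_freeze;         /* at least one freeze plan */
        bool set_all_visible;   /* sets PD_ALL_VISIBLE (and the VM) */
    } HeapPageChange;

    /* The heap page LSN may be stamped only if an FPI was possible. */
    static bool
    may_stamp_heap_lsn(HeapPageChange c, bool hint_bits_need_wal)
    {
        return c.do_prune || c.do_freeze ||
            (c.set_all_visible && hint_bits_need_wal);
    }

    /* The heap page image is suppressed in exactly the opposite case. */
    static bool
    suppress_heap_fpi(HeapPageChange c, bool hint_bits_need_wal)
    {
        return !c.do_prune && !c.do_freeze &&
            (!c.set_all_visible || !hint_bits_need_wal);
    }
    ```
    
    By De Morgan's laws the two predicates are complements, which is the
    point: the LSN is left alone precisely when the image was deliberately
    skipped.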
    
    > And, it's a little
    > weird to have a single WAL record that is either a hint or not a hint
    > depending on a complex set of conditions.
    
    PD_ALL_VISIBLE is different from tuple hints and other page hints
    because setting the VM is always WAL logged and when we replay that,
    it will always set PD_ALL_VISIBLE, so PD_ALL_VISIBLE is effectively
    always WAL-logged. The other hints aren't wal-logged unless checksums
    are enabled and we need an FPI. So PD_ALL_VISIBLE is different from
    other page hints in multiple ways. We can't make it more like those
    hints because of needing to preserve the invariant that the VM is
    never set when the page is clear. The only thing we could do is forbid
    omitting the FPI even when checksums are not enabled.
    
    > Anyway, I kind of wonder if it's time to back out the hack that I
    > installed here many years ago. At the time, I thought that it would be
    > bad if a VACUUM swept over the visibility map setting VM bits and as a
    > result emitted an FPI for every page in the entire heap ... but
    > everyone who is running with checksums has accepted that cost already,
    > and with those being the default, that's probably going to be most
    > people.
    
    I agree that PD_ALL_VISIBLE persistence is complicated, but we have
    other special cases that complicate the code for a performance
    benefit. I guess the question is if we are saying people shouldn't run
    without checksums in production. If that's true, then it's fine to
    remove this optimization. Otherwise, I'm not so sure.
    
    I think cloud providers generally have checksums enabled, but I don't
    know what is common on-prem.
    
    > It would be even more compelling if we were going to freeze,
    > prune, and set all-visible on access, because then presumably the case
    > where we touch a page and ONLY set the VM bit would be rare, so the
    > cost of doing that wouldn't matter much, but I guess the patch doesn't
    > go that far -- we can freeze or set all-visible on access but not
    > prune, without which the scenario I was worrying about at the time is
    > still fairly plausible, I think, if checksums are turned off.
    
    With the whole set applied, we can prune and set the VM on access but
    not freeze. I have a patch to do that, but it introduced noticeable
    CPU overhead to prepare the freeze plans. I'd have to spend much more
    time studying it to avoid regressing workloads where we don't end up
    freezing but prepare the freeze plans during SELECT queries.
    
    - Melanie
    
  41. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-10-08T22:54:25Z

    On Mon, Oct 6, 2025 at 6:40 PM Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    >
    > In attached v16, I’ve reverted to removing XLOG_HEAP2_VISIBLE
    > entirely, rather than first removing each caller's heap page from the
    > VM WAL chain. I reordered changes and squashed several refactoring
    > patches to improve patch-by-patch readability. This should make the
    > set read differently from earlier versions that removed
    > XLOG_HEAP2_VISIBLE and had more step-by-step mechanical refactoring.
    >
    > I think if we plan to go all the way with removing XLOG_HEAP2_VISIBLE,
    > having intermediate patches that just set PD_ALL_VISIBLE when making
    > other heap page changes is more confusing than helpful. Also, I think having
    > separate flags for setting PD_ALL_VISIBLE in the WAL record
    > over-complicated the code.
    
    I decided to reorder the patches to remove XLOG_HEAP2_VISIBLE from
    vacuum phase III before removing it from vacuum phase I because
    removing it from phase III doesn't require preliminary refactoring
    patches. I've done that in the attached v17.
    
    I've also added an experimental patch on the end that refactors large
    chunks of heap_page_prune_and_freeze() into helpers. I got some
    feedback off-list that heap_page_prune_and_freeze() is too unwieldy
    now. I'm not sure how I feel about them yet, so I haven't documented
    them or moved them up in the patch set to before changes to
    heap_page_prune_and_freeze().
    
    0001: Eliminate XLOG_HEAP2_VISIBLE from COPY FREEZE
    0002: Eliminate XLOG_HEAP2_VISIBLE from phase III of vacuum
    0003 - 0006: cleanup and refactoring to prepare for 0007
    0007: Eliminate XLOG_HEAP2_VISIBLE from vacuum prune/freeze
    0008 - 0009: Remove XLOG_HEAP2_VISIBLE
    0010 - 0012: refactoring to prepare for 0013
    0013: Set VM on-access
    0014: Set pd_prune_xid on insert
    0015: Experimental refactoring of heap_page_prune_and_freeze into helpers
    
    - Melanie
    
  42. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Andres Freund <andres@anarazel.de> — 2025-10-09T18:18:49Z

    Hi,
    
    On 2025-10-08 18:54:25 -0400, Melanie Plageman wrote:
    > +uint8
    > +visibilitymap_set_vmbits(BlockNumber heapBlk,
    > +						 Buffer vmBuf, uint8 flags,
    > +						 const char *heapRelname)
    > +{
    > +	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
    > +	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
    > +	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
    > +	Page		page;
    > +	uint8	   *map;
    > +	uint8		status;
    > +
    > +#ifdef TRACE_VISIBILITYMAP
    > +	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
    > +		 flags, heapRelname, heapBlk);
    > +#endif
    
    I like that it doesn't take a Relation anymore, but I'd just pass the smgrrelation
    instead, then you don't need to allocate the string in the caller, when it's
    approximately never used.
    
    Otherwise this looks pretty close to me.
    
    
    
    > @@ -71,12 +84,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
    >  	}
    >  
    >  	/*
    > -	 * If we have a full-page image, restore it and we're done.
    > +	 * If we have a full-page image of the heap block, restore it and we're
    > +	 * done with the heap block.
    >  	 */
    > -	action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
    > -										   (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
    > -										   &buffer);
    > -	if (action == BLK_NEEDS_REDO)
    > +	if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
    > +									  (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
    > +									  &buffer) == BLK_NEEDS_REDO)
    >  	{
    >  		Page		page = BufferGetPage(buffer);
    >  		OffsetNumber *redirected;
    
    Why move it around this way?
    
    
    > @@ -138,36 +157,104 @@ heap_xlog_prune_freeze(XLogReaderState *record)
    >  		/* There should be no more data */
    >  		Assert((char *) frz_offsets == dataptr + datalen);
    >  
    > +		if ((vmflags & VISIBILITYMAP_VALID_BITS))
    > +			PageSetAllVisible(page);
    > +
    > +		MarkBufferDirty(buffer);
    > +
    > +		/*
    > +		 * Always emit a WAL record when setting PD_ALL_VISIBLE but only emit
    > +		 * an FPI if checksums/wal_log_hints are enabled.
    
    This comment reads as-if we're WAL logging here, but this is a
    Wendy's^Wrecovery.
    
    > Advance the page LSN
    > +		 * only if the record could include an FPI, since recovery skips
    > +		 * records <= the stamped LSN. Otherwise it might skip an earlier FPI
    > +		 * needed to repair a torn page.
    > +		 */
    
    This is confusing, should probably just reference the stuff we did in the
    !recovery case.
    
    
    > +		if (do_prune || nplans > 0 ||
    > +			((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
    > +			PageSetLSN(page, lsn);
    > +
    >  		/*
    >  		 * Note: we don't worry about updating the page's prunability hints.
    >  		 * At worst this will cause an extra prune cycle to occur soon.
    >  		 */
    
    Not your fault, but that seems odd? Why aren't we just doing the right thing?
    
    >  	/*
    > -	 * If we released any space or line pointers, update the free space map.
    > +	 * If we released any space or line pointers or set PD_ALL_VISIBLE or the
    > +	 * VM, update the freespace map.
    
    I'd replace the first or with a , ;)
    
    
    > +	 * Even when no actual space is freed (e.g., when only marking the page
    > +	 * all-visible or frozen), we still update the FSM. Because the FSM is
    > +	 * unlogged and maintained heuristically, it often becomes stale on
    > +	 * standbys. If such a standby is later promoted and runs VACUUM, it will
    > +	 * skip recalculating free space for pages that were marked all-visible
    > +	 * (or all-frozen, depending on the mode). FreeSpaceMapVacuum can then
    > +	 * propagate overly optimistic free space values upward, causing future
    > +	 * insertions to select pages that turn out to be unusable. In bulk, this
    > +	 * can lead to long stalls.
    > +	 *
    > +	 * To prevent this, always refresh the FSM’s view when a page becomes
    > +	 * all-visible or all-frozen.
    
    I'd s/refresh/update/, because refresh sounds more like rereading the current
    state of the FSM, rather than changing the FSM.
    
    
    > +		/* We don't have relation name during recovery, so use relfilenode */
    > +		relname = psprintf("%u", rlocator.relNumber);
    > +		old_vmbits = visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, relname);
    >  
    > -			XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
    > +		/* Only set VM page LSN if we modified the page */
    > +		if (old_vmbits != vmflags)
    > +		{
    > +			Assert(BufferIsDirty(vmbuffer));
    > +			PageSetLSN(BufferGetPage(vmbuffer), lsn);
    >  		}
    > -		else
    > -			UnlockReleaseBuffer(buffer);
    > +		pfree(relname);
    
    Hm. When can we actually enter the old_vmbits == vmflags case?  It might also
    be fine to just say that we don't expect it to change but are mirroring the
    code in visibilitymap_set().
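    
    For reference, the "return the old bits, touch the page only on change"
    shape being discussed can be sketched as below. This is a standalone
    model operating on a single map byte, with made-up names; it is not the
    patch's visibilitymap_set_vmbits().
    
    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define VM_VALID_BITS 0x03  /* all-visible | all-frozen, two bits per heap block */

    /*
     * Set 'flags' for the heap block whose bits start at 'bit_offset' within
     * *map_byte. Returns the previous bits and reports through *dirtied
     * whether the byte was written (hypothetical helper).
     */
    static uint8_t
    set_vm_bits(uint8_t *map_byte, uint8_t bit_offset, uint8_t flags, bool *dirtied)
    {
        uint8_t status = (uint8_t) ((*map_byte >> bit_offset) & VM_VALID_BITS);

        *dirtied = false;
        if (flags != status)
        {
            *map_byte |= (uint8_t) (flags << bit_offset);
            *dirtied = true;
        }
        return status;
    }
    ```
    
    A caller that compares the returned value with the requested flags can
    tell whether the byte was written, which is the check the quoted redo
    code performs before stamping the VM page LSN.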
    
    
    I wonder if the VM specific redo portion should be in a common helper? Might
    not be enough code to worry though...
    
    
    > @@ -2070,8 +2079,24 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
    >  	xlhp_prune_items dead_items;
    >  	xlhp_prune_items unused_items;
    >  	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
    > +	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
    > +	bool		do_set_vm = vmflags & VISIBILITYMAP_VALID_BITS;
    >  
    >  	xlrec.flags = 0;
    > +	regbuf_flags = REGBUF_STANDARD;
    > +
    > +	Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
    > +
    > +	/*
    > +	 * We can avoid an FPI if the only modification we are making to the heap
    > +	 * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
    
    Maybe s/an FPI/an FPI for the heap page/?
    
    
    > +	 * Note that if we explicitly skip an FPI, we must not set the heap page
    > +	 * LSN later.
    > +	 */
    > +	if (!do_prune &&
    > +		nfrozen == 0 &&
    > +		(!do_set_vm || !XLogHintBitIsNeeded()))
    > +		regbuf_flags |= REGBUF_NO_IMAGE;
    
    >  	/*
    >  	 * Prepare data for the buffer.  The arrays are not actually in the
    > @@ -2079,7 +2104,11 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
    >  	 * page image, the arrays can be omitted.
    >  	 */
    >  	XLogBeginInsert();
    > -	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
    > +	XLogRegisterBuffer(0, buffer, regbuf_flags);
    > +
    > +	if (do_set_vm)
    > +		XLogRegisterBuffer(1, vmbuffer, 0);
    
    Seems a bit confusing that it's named regbuf_flags but isn't used in all the
    XLogRegisterBuffer calls. Maybe make the name more specific
    (regbuf_flags_heap?)...
    
    >  	}
    >  	recptr = XLogInsert(RM_HEAP2_ID, info);
    >  
    > -	PageSetLSN(BufferGetPage(buffer), recptr);
    > +	if (do_set_vm)
    > +	{
    > +		Assert(BufferIsDirty(vmbuffer));
    > +		PageSetLSN(BufferGetPage(vmbuffer), recptr);
    > +	}
    
    > +	/*
    > +	 * We must bump the page LSN if pruning or freezing. If we are only
    > +	 * updating PD_ALL_VISIBLE, though, we can skip doing this unless
    > +	 * wal_log_hints/checksums are enabled. Torn pages are possible if we
    > +	 * update PD_ALL_VISIBLE without bumping the LSN, but this is deemed okay
    > +	 * for page hint updates.
    > +	 */
    
    Arguably it's not a torn page if we only modified something as narrow as a
    hint bit, and are redoing that change after recovery. But that's extremely
    nitpicky.
    
    I wonder if the comment explaining this should be put into one place and
    reference it from all the different places.
    
    > @@ -2860,6 +2867,29 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
    >  							 VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
    >  							 InvalidOffsetNumber);
    >  
    > +	/*
    > +	 * Before marking dead items unused, check whether the page will become
    > +	 * all-visible once that change is applied.
    
    So the function is named _would_ but here you say will :)
    
    
    > This lets us reap the tuples
    > +	 * and mark the page all-visible within the same critical section,
    > +	 * enabling both changes to be emitted in a single WAL record. Since the
    > +	 * visibility checks may perform I/O and allocate memory, they must be
    > +	 * done outside the critical section.
    > +	 */
    > +	if (heap_page_would_be_all_visible(vacrel, buffer,
    > +									   deadoffsets, num_offsets,
    > +									   &all_frozen, &visibility_cutoff_xid))
    > +	{
    > +		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
    > +		if (all_frozen)
    > +		{
    > +			vmflags |= VISIBILITYMAP_ALL_FROZEN;
    > +			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
    > +		}
    > +
    > +		/* Take the lock on the vmbuffer before entering a critical section */
    > +		LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
    
    It sure would be nice if we had documented the lock order between the heap
    page and the corresponding VM page anywhere.  This is just doing what we did
    before, so it's not this patch's fault, but I did get worried about it for a
    moment.
    
    
    > +/*
    > + * Check whether the heap page in buf is all-visible except for the dead
    > + * tuples referenced in the deadoffsets array.
    > + *
    > + * The visibility checks may perform IO and allocate memory so they must not
    > + * be done in a critical section. This function is used by vacuum to determine
    > + * if the page will be all-visible once it reaps known dead tuples. That way
    > + * it can do both in the same critical section and emit a single WAL record.
    > + *
    > + * Returns true if the page is all-visible other than the provided
    > + * deadoffsets and false otherwise.
    > + *
    > + * Output parameters:
    > + *
    > + *  - *all_frozen: true if every tuple on the page is frozen
    > + *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
    > + * Callers looking to verify that the page is already all-visible can call
    > + * heap_page_is_all_visible().
    > + *
    > + * This logic is closely related to heap_prune_record_unchanged_lp_normal().
    > + * If you modify this function, ensure consistency with that code. An
    > + * assertion cross-checks that both remain in agreement. Do not introduce new
    > + * side-effects.
    > + */
    > +static bool
    > +heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
    > +							   OffsetNumber *deadoffsets,
    > +							   int ndeadoffsets,
    > +							   bool *all_frozen,
    > +							   TransactionId *visibility_cutoff_xid)
    > +{
    >  	Page		page = BufferGetPage(buf);
    >  	BlockNumber blockno = BufferGetBlockNumber(buf);
    >  	OffsetNumber offnum,
    >  				maxoff;
    >  	bool		all_visible = true;
    > +	int			matched_dead_count = 0;
    >  
    >  	*visibility_cutoff_xid = InvalidTransactionId;
    >  	*all_frozen = true;
    >  
    > +	Assert(ndeadoffsets == 0 || deadoffsets);
    > +
    > +#ifdef USE_ASSERT_CHECKING
    > +	/* Confirm input deadoffsets[] is strictly sorted */
    > +	if (ndeadoffsets > 1)
    > +	{
    > +		for (int i = 1; i < ndeadoffsets; i++)
    > +			Assert(deadoffsets[i - 1] < deadoffsets[i]);
    > +	}
    > +#endif
    > +
    >  	maxoff = PageGetMaxOffsetNumber(page);
    >  	for (offnum = FirstOffsetNumber;
    >  		 offnum <= maxoff && all_visible;
    > @@ -3649,9 +3712,15 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
    >  		 */
    >  		if (ItemIdIsDead(itemid))
    >  		{
    > -			all_visible = false;
    > -			*all_frozen = false;
    > -			break;
    > +			if (!deadoffsets ||
    > +				matched_dead_count >= ndeadoffsets ||
    > +				deadoffsets[matched_dead_count] != offnum)
    > +			{
    > +				*all_frozen = all_visible = false;
    > +				break;
    > +			}
    > +			matched_dead_count++;
    > +			continue;
    >  		}
    >  
    >  		Assert(ItemIdIsNormal(itemid));
    
    Hm, what about an assert checking that matched_dead_count == ndeadoffsets at
    the end?
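
    As a rough illustration of what such a check would compare, the matching
    walk over the sorted dead-offset array can be sketched on its own. This is
    a simplified model, not the patch: the page is reduced to an is_dead[]
    array, and SketchOffset, sketch_dead_items_accounted_for() and matched_out
    are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t SketchOffset;  /* stand-in for OffsetNumber (1-based) */

/*
 * is_dead[i] tells whether the line pointer at offset i + 1 is dead.
 * deadoffsets[] must be strictly ascending.
 *
 * Returns false as soon as a dead line pointer is found that is not the
 * next entry in deadoffsets[]. *matched_out receives the number of array
 * entries consumed, so a caller can compare it with ndeadoffsets.
 */
bool
sketch_dead_items_accounted_for(const bool *is_dead, unsigned int maxoff,
                                const SketchOffset *deadoffsets,
                                int ndeadoffsets, int *matched_out)
{
    int matched = 0;

    assert(ndeadoffsets == 0 || deadoffsets != NULL);
    for (int i = 1; i < ndeadoffsets; i++)
        assert(deadoffsets[i - 1] < deadoffsets[i]);

    for (unsigned int offnum = 1; offnum <= maxoff; offnum++)
    {
        if (!is_dead[offnum - 1])
            continue;

        if (matched >= ndeadoffsets || deadoffsets[matched] != offnum)
        {
            *matched_out = matched;
            return false;
        }
        matched++;
    }

    *matched_out = matched;
    return true;
}
```

    In this model the walk can succeed while leaving array entries unconsumed
    (an offset listed in deadoffsets[] that is no longer dead on the page);
    that is the situation a matched_dead_count == ndeadoffsets assertion would
    flag.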
    
    
    > From 6b5fc27f0d80bab1df86a2e6fb51b64fd20c3cbb Mon Sep 17 00:00:00 2001
    > From: Melanie Plageman <melanieplageman@gmail.com>
    > Date: Mon, 15 Sep 2025 12:06:19 -0400
    > Subject: [PATCH v17 03/15] Assorted trivial heap_page_prune_and_freeze cleanup
    
    Seems like a good idea, but I'm too lazy to go through this in detail.
    
    
    > From c69a5219a9b792f3c9f6dc730b8810a88d088ae6 Mon Sep 17 00:00:00 2001
    > From: Melanie Plageman <melanieplageman@gmail.com>
    > Date: Tue, 16 Sep 2025 14:22:10 -0400
    > Subject: [PATCH v17 04/15] Add helper for freeze determination to
    >  heap_page_prune_and_freeze
    > 
    > After scanning through the line pointers on the heap page during
    > vacuum's first phase, we use several statuses and information we
    > collected to determine whether or not we will use the freeze plans we
    > assembled.
    > 
    > Do this in a helper for better readability.
    
    
    > @@ -663,85 +775,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
    >  	 * Decide if we want to go ahead with freezing according to the freeze
    >  	 * plans we prepared, or not.
    >  	 */
    > -	do_freeze = false;
    > - ...
    > +	do_freeze = heap_page_will_freeze(params->relation, buffer,
    > +									  did_tuple_hint_fpi,
    > +									  do_prune,
    > +									  do_hint_prune,
    > +									  &prstate);
    >  
    
    Assuming this is just moving the code, I like this quite a bit.
    
    
    > From d4a4be3eed25853fc1ea84ebc2cbe0226afd823a Mon Sep 17 00:00:00 2001
    > From: Melanie Plageman <melanieplageman@gmail.com>
    > Date: Mon, 15 Sep 2025 16:25:44 -0400
    > Subject: [PATCH v17 05/15] Update PruneState.all_[visible|frozen] earlier in
    >  pruning
    > MIME-Version: 1.0
    > Content-Type: text/plain; charset=UTF-8
    > Content-Transfer-Encoding: 8bit
    > 
    > In the prune/freeze path, we currently delay clearing all_visible and
    > all_frozen when dead items are present. This allows opportunistic
    > freezing if the page would otherwise be fully frozen, since those dead
    > items are later removed in vacuum’s third phase.
    > 
    > However, if no freezing will be attempted, there’s no need to delay.
    > Clearing the flags promptly avoids extra bookkeeping in
    > heap_prune_unchanged_lp_normal(). At present this has no runtime effect
    > because all callers that consider setting the VM also attempt freezing,
    > but future callers (e.g. on-access pruning) may want to set the VM
    > without preparing freeze plans.
    
    s/heap_prune_unchanged_lp_normal/heap_prune_record_unchanged_lp_normal/
    
    I think this should make it clearer that this is about reducing overhead for
    future use of this code in on-access-pruning.
    
    
    > We also used to defer clearing all_visible and all_frozen until after
    > computing the visibility cutoff XID. By determining the cutoff earlier,
    > we can update these flags immediately after deciding whether to
    > opportunistically freeze. This is necessary if we want to set the VM in
    > the same WAL record that prunes and freezes tuples on the page.
    
    I think this last sentence needs to be first. This is the only really
    important thing in this patch, afaict.
    
    
    
    > From 86193a71d2ff9649b5b1c1e6963bd610285ad369 Mon Sep 17 00:00:00 2001
    > From: Melanie Plageman <melanieplageman@gmail.com>
    > Date: Fri, 3 Oct 2025 15:57:02 -0400
    > Subject: [PATCH v17 06/15] Make heap_page_is_all_visible independent of
    >  LVRelState
    > 
    > Future commits will use this function inside of pruneheap.c where we do
    > not have access to the LVRelState. We only need a few parameters from
    > the LVRelState, so just pass those in explicitly.
    > 
    > Author: Melanie Plageman <melanieplageman@gmail.com>
    > Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
    > Reviewed-by: Robert Haas <robertmhaas@gmail.com>
    > Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
    
    Makes sense. I don't think we need to wait for other stuff to be committed to
    commit this.
    
    
    > From dde0dfc578137f7c93f9a0e34af38dcdb841b080 Mon Sep 17 00:00:00 2001
    > From: Melanie Plageman <melanieplageman@gmail.com>
    > Date: Wed, 8 Oct 2025 15:39:01 -0400
    > Subject: [PATCH v17 07/15] Eliminate XLOG_HEAP2_VISIBLE from vacuum
    >  prune/freeze
    > MIME-Version: 1.0
    > Content-Type: text/plain; charset=UTF-8
    > Content-Transfer-Encoding: 8bit
    
    Seems very mildly odd that 0002 references phase III in the subject, but this
    doesn't...
    
    (I'm just very lightly skimming from this point on)
    
    
    > During vacuum's first and third phases, we examine tuples' visibility
    > to determine if we can set the page all-visible in the visibility map.
    > 
    > Previously, this check compared tuple xmins against a single XID chosen at
    > the start of vacuum (OldestXmin). We now use GlobalVisState, which also
    > enables future work to set the VM during on-access pruning, since ordinary
    > queries have access to GlobalVisState but not OldestXmin.
    > 
    > This also benefits vacuum directly: GlobalVisState may advance
    > during a vacuum, allowing more pages to become considered all-visible.
    > In the rare case that it moves backward, VACUUM falls back to OldestXmin
    > to ensure we don’t attempt to freeze a dead tuple that wasn’t yet
    > prunable according to the GlobalVisState.
    
    It could, but it currently won't advance in vacuum, right?
    
    
    > From e412f9298b0735d1091f4769ace4d2d1a7e62312 Mon Sep 17 00:00:00 2001
    > From: Melanie Plageman <melanieplageman@gmail.com>
    > Date: Tue, 29 Jul 2025 09:57:13 -0400
    > Subject: [PATCH v17 12/15] Inline TransactionIdFollows/Precedes()
    > 
    > Calling these from on-access pruning code had noticeable overhead in a
    > profile. There does not seem to be a reason not to inline them.
    
    Makes sense, just commit this ahead of the more complicated rest.
    
    
    
    > From 54fcba140e515eba0eb1f9d48e7d5875b92e7e39 Mon Sep 17 00:00:00 2001
    > From: Melanie Plageman <melanieplageman@gmail.com>
    > Date: Tue, 29 Jul 2025 14:34:30 -0400
    > Subject: [PATCH v17 13/15] Allow on-access pruning to set pages all-visible
    
    Sorry, will have to look at this another time...
    
    Greetings,
    
    Andres Freund
    
    
    
    
  43. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Kirill Reshke <reshkekirill@gmail.com> — 2025-10-14T03:31:04Z

    On Thu, 9 Oct 2025 at 03:54, Melanie Plageman <melanieplageman@gmail.com> wrote:
    >
    > On Mon, Oct 6, 2025 at 6:40 PM Melanie Plageman
    > <melanieplageman@gmail.com> wrote:
    > >
    > > In attached v16, I’ve reverted to removing XLOG_HEAP2_VISIBLE
    > > entirely, rather than first removing each caller's heap page from the
    > > VM WAL chain. I reordered changes and squashed several refactoring
    > > patches to improve patch-by-patch readability. This should make the
    > > set read differently from earlier versions that removed
    > > XLOG_HEAP2_VISIBLE and had more step-by-step mechanical refactoring.
    > >
    > > I think if we plan to go all the way with removing XLOG_HEAP2_VISIBLE,
    > > having intermediate patches that just set PD_ALL_VISIBLE when making
    > > other heap pages are more confusing than helpful. Also, I think having
    > > separate flags for setting PD_ALL_VISIBLE in the WAL record
    > > over-complicated the code.
    >
    > I decided to reorder the patches to remove XLOG_HEAP2_VISIBLE from
    > vacuum phase III before removing it from vacuum phase I because
    > removing it from phase III doesn't require preliminary refactoring
    > patches. I've done that in the attached v17.
    >
    > I've also added an experimental patch on the end that refactors large
    > chunks of heap_page_prune_and_freeze() into helpers. I got some
    > feedback off-list that heap_page_prune_and_freeze() is too unwieldy
    > now. I'm not sure how I feel about them yet, so I haven't documented
    > them or moved them up in the patch set to before changes to
    > heap_page_prune_and_freeze().
    >
    > 0001: Eliminate XLOG_HEAP2_VISIBLE from COPY FREEZE
    > 0002: Eliminate XLOG_HEAP2_VISIBLE from phase III of vacuum
    > 0003 - 0006: cleanup and refactoring to prepare for 0007
    > 0007: Eliminate XLOG_HEAP2_VISIBLE from vacuum prune/freeze
    > 0008 - 0009: Remove XLOG_HEAP2_VISIBLE
    > 0010 - 0012: refactoring to prepare for 0013
    > 0013: Set VM on-access
    > 0014: Set pd_prune_xid on insert
    > 0015: Experimental refactoring of heap_page_prune_and_freeze into helpers
    >
    > - Melanie
    
    Hi! Should we also bump XLOG_PAGE_MAGIC after d96f87332 & add323da40a
    or do we wait for full set to be committed?
    -- 
    Best regards,
    Kirill Reshke
    
    
    
    
  44. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Michael Paquier <michael@paquier.xyz> — 2025-10-14T03:42:59Z

    On Tue, Oct 14, 2025 at 08:31:04AM +0500, Kirill Reshke wrote:
    > Hi! Should we also bump XLOG_PAGE_MAGIC after d96f87332 & add323da40a
    > or do we wait for full set to be committed?
    
    I may be missing something, of course, but d96f87332 has not changed
    the WAL format, VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN
    existing before that.  The change in xl_heap_prune as done in
    add323da40a6 should have bumped the format number.
    --
    Michael
    
  45. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-10-14T14:16:24Z

    On Mon, Oct 13, 2025 at 11:43 PM Michael Paquier <michael@paquier.xyz> wrote:
    >
    > On Tue, Oct 14, 2025 at 08:31:04AM +0500, Kirill Reshke wrote:
    > > Hi! Should we also bump XLOG_PAGE_MAGIC after d96f87332 & add323da40a
    > > or do we wait for full set to be committed?
    >
    > I may be missing something, of course, but d96f87332 has not changed
    > the WAL format, VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN
    > existing before that.  The change in xl_heap_prune as done in
    > add323da40a6 should have bumped the format number.
    
    Oops! Thanks for reporting.
    
    I messed up and forgot to do this. And, if I'm not misunderstanding
    the criteria, I did the same thing at the beginning of September with
    4b5f206de2bb. I've committed the bump. Hopefully I learned my lesson.
    
    - Melanie
    
    
    
    
  46. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-10-14T23:26:57Z

    Thanks so much for the review! I've addressed all your feedback except
    what is commented on inline below.
    I've gone ahead and committed the preliminary patches that you thought
    were ready to commit.
    
    Attached v18 is what remains.
    
    0001 - 0003: refactoring
    0004 - 0006: finish eliminating XLOG_HEAP2_VISIBLE
    0007 - 0009: refactoring
    0010: Set VM on-access
    0011: Set prune xid on insert
    0012: Some refactoring for discussion
    
    For 0001, I got feedback heap_page_prune_and_freeze() has too many
    arguments, so I tried to address that. I'm interested to know if folks
    like this more.
    
    0011 still needs a bit of investigation to understand fully if
    anything else in the index-killtuples test needs to be changed to make
    sure we have the same coverage.
    
    0012 is sort of WIP. I got feedback heap_page_prune_and_freeze() was
    too long and should be split up into helpers. I want to know if this
    split makes sense. I can pull it down the patch stack if so.
    
    Only 0001 and 0012 are optional amongst the refactoring patches. The
    others are required to make on-access VM-setting possible or viable.
    
    On Thu, Oct 9, 2025 at 2:18 PM Andres Freund <andres@anarazel.de> wrote:
    >
    > > @@ -71,12 +84,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
    > >       }
    > > -     action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
    > > -                                                                                (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
    > > -                                                                                &buffer);
    > > -     if (action == BLK_NEEDS_REDO)
    > > +     if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
    > > +                                                                       (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
    > > +                                                                       &buffer) == BLK_NEEDS_REDO)
    > >       {
    > >               Page            page = BufferGetPage(buffer);
    > >               OffsetNumber *redirected;
    >
    > Why move it around this way?
    
    Because there will be an action for the visibility map
    XLogReadBufferForRedoExtended(). I could have renamed it heap_action,
    but it is being used only in one place, so I preferred to just cut it
    to avoid any confusion.
    
    > > Advance the page LSN
    > > +              * only if the record could include an FPI, since recovery skips
    > > +              * records <= the stamped LSN. Otherwise it might skip an earlier FPI
    > > +              * needed to repair a torn page.
    > > +              */
    >
    > This is confusing, should probably just reference the stuff we did in the
    > !recovery case.
    
    I fixed this and addressed all your feedback related to this before committing.
    
    > > +             if (do_prune || nplans > 0 ||
    > > +                     ((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
    > > +                     PageSetLSN(page, lsn);
    > > +
    > >               /*
    > >                * Note: we don't worry about updating the page's prunability hints.
    > >                * At worst this will cause an extra prune cycle to occur soon.
    > >                */
    >
    > Not your fault, but that seems odd? Why aren't we just doing the right thing?
    
    The comment dates back to 6f10eb2. I imagine no one ever bothered to
    fuss with extracting the XID. You could change
    heap_page_prune_execute() to return the right value -- though that's a
    bit ugly since it is used in normal operation as well as recovery.
    
    > I wonder if the VM specific redo portion should be in a common helper? Might
    > not be enough code to worry though...
    
    I think it might be more code as a helper at this point.
    
    > > @@ -2860,6 +2867,29 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
    > >                                                        VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
    > >                                                        InvalidOffsetNumber);
    > >
    > > +     /*
    > > +      * Before marking dead items unused, check whether the page will become
    > > +      * all-visible once that change is applied.
    >
    > So the function is named _would_ but here you say will :)
    
    I thought about it more and still feel that this function name should
    contain "would". From vacuum's perspective it is "will" -- because it
    knows it will remove those dead items, but from the function's
    perspective it is hypothetical. I changed the comment though.
    
    > > +     if (heap_page_would_be_all_visible(vacrel, buffer,
    > > +                                                                        deadoffsets, num_offsets,
    > > +                                                                        &all_frozen, &visibility_cutoff_xid))
    > > +     {
    > > +             vmflags |= VISIBILITYMAP_ALL_VISIBLE;
    > > +             if (all_frozen)
    > > +             {
    > > +                     vmflags |= VISIBILITYMAP_ALL_FROZEN;
    > > +                     Assert(!TransactionIdIsValid(visibility_cutoff_xid));
    > > +             }
    > > +
    > > +             /* Take the lock on the vmbuffer before entering a critical section */
    > > +             LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
    >
    > It sure would be nice if we had documented the lock order between the heap
    > page and the corresponding VM page anywhere.  This is just doing what we did
    > before, so it's not this patch's fault, but I did get worried about it for a
    > moment.
    
    Well, the comment above the visibilitymap_set* functions says what
    expectations they have for the heap page being locked.
    
    > > +static bool
    > > +heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
    > > +                                                        OffsetNumber *deadoffsets,
    > > +                                                        int ndeadoffsets,
    > > +                                                        bool *all_frozen,
    > > +                                                        TransactionId *visibility_cutoff_xid)
    > > +{
    > >       Page            page = BufferGetPage(buf);
    
    > Hm, what about an assert checking that matched_dead_count == ndeadoffsets at
    > the end?
    
    I was going to put an Assert(ndeadoffsets <= matched_dead_count), but
    then I started wondering if there is a way we could end up with fewer
    dead items than we collected during phase I.
    
    I had thought about if we dropped an index and then did on-access
    pruning -- but we don't allow setting LP_DEAD items LP_UNUSED in
    on-access pruning. So, maybe this is safe... I can do a follow-on
    commit to add the assert. But I'm just not 100% sure I've thought of
    all the cases where we might end up with fewer dead items.
    
    > > During vacuum's first and third phases, we examine tuples' visibility
    > > to determine if we can set the page all-visible in the visibility map.
    > >
    > > Previously, this check compared tuple xmins against a single XID chosen at
    > > the start of vacuum (OldestXmin). We now use GlobalVisState, which also
    > > enables future work to set the VM during on-access pruning, since ordinary
    > > queries have access to GlobalVisState but not OldestXmin.
    > >
    > > This also benefits vacuum directly: GlobalVisState may advance
    > > during a vacuum, allowing more pages to become considered all-visible.
    > > In the rare case that it moves backward, VACUUM falls back to OldestXmin
    > > to ensure we don’t attempt to freeze a dead tuple that wasn’t yet
    > > prunable according to the GlobalVisState.
    >
    > It could, but it currently won't advance in vacuum, right?
    
    I thought it was possible for it to advance when calling
    heap_prune_satisfies_vacuum() ->
    GlobalVisTestIsRemovableXid()->...GlobalVisUpdate(). This case isn't
    going to be common, but some things can cause us to update it.
    
    We have talked about explicitly updating GlobalVisState more often
    during vacuums of large tables. But I was under the impression that it
    was at least possible for it to advance during vacuum now.
    
    - Melanie
    
  47. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Kirill Reshke <reshkekirill@gmail.com> — 2025-10-29T11:03:14Z

    On Wed, 15 Oct 2025 at 04:27, Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    >
    > Thanks so much for the review! I've addressed all your feedback except
    > what is commented on inline below.
    > I've gone ahead and committed the preliminary patches that you thought
    > were ready to commit.
    >
    > Attached v18 is what remains.
    >
    > 0001 - 0003: refactoring
    > 0004 - 0006: finish eliminating XLOG_HEAP2_VISIBLE
    > 0007 - 0009: refactoring
    > 0010: Set VM on-access
    > 0011: Set prune xid on insert
    > 0012: Some refactoring for discussion
    >
    > For 0001, I got feedback heap_page_prune_and_freeze() has too many
    > arguments, so I tried to address that. I'm interested to know if folks
    > like this more.
    >
    > 0011 still needs a bit of investigation to understand fully if
    > anything else in the index-killtuples test needs to be changed to make
    > sure we have the same coverage.
    >
    > 0012 is sort of WIP. I got feedback heap_page_prune_and_freeze() was
    > too long and should be split up into helpers. I want to know if this
    > split makes sense. I can pull it down the patch stack if so.
    >
    > Only 0001 and 0012 are optional amongst the refactoring patches. The
    > others are required to make on-access VM-setting possible or viable.
    >
    > On Thu, Oct 9, 2025 at 2:18 PM Andres Freund <andres@anarazel.de> wrote:
    > >
    > > > @@ -71,12 +84,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
    > > >       }
    > > > -     action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
    > > > -                                                                                (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
    > > > -                                                                                &buffer);
    > > > -     if (action == BLK_NEEDS_REDO)
    > > > +     if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
    > > > +                                                                       (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
    > > > +                                                                       &buffer) == BLK_NEEDS_REDO)
    > > >       {
    > > >               Page            page = BufferGetPage(buffer);
    > > >               OffsetNumber *redirected;
    > >
    > > Why move it around this way?
    >
    > Because there will be an action for the visibility map
    > XLogReadBufferForRedoExtended(). I could have renamed it heap_action,
    > but it is being used only in one place, so I preferred to just cut it
    > to avoid any confusion.
    >
    > > > Advance the page LSN
    > > > +              * only if the record could include an FPI, since recovery skips
    > > > +              * records <= the stamped LSN. Otherwise it might skip an earlier FPI
    > > > +              * needed to repair a torn page.
    > > > +              */
    > >
    > > This is confusing, should probably just reference the stuff we did in the
    > > !recovery case.
    >
    > I fixed this and addressed all your feedback related to this before committing.
    >
    > > > +             if (do_prune || nplans > 0 ||
    > > > +                     ((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
    > > > +                     PageSetLSN(page, lsn);
    > > > +
    > > >               /*
    > > >                * Note: we don't worry about updating the page's prunability hints.
    > > >                * At worst this will cause an extra prune cycle to occur soon.
    > > >                */
    > >
    > > Not your fault, but that seems odd? Why aren't we just doing the right thing?
    >
    > The comment dates back to 6f10eb2. I imagine no one ever bothered to
    > fuss with extracting the XID. You could change
    > heap_page_prune_execute() to return the right value -- though that's a
    > bit ugly since it is used in normal operation as well as recovery.
    >
    > > I wonder if the VM specific redo portion should be in a common helper? Might
    > > not be enough code to worry though...
    >
    > I think it might be more code as a helper at this point.
    >
    > > > @@ -2860,6 +2867,29 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
    > > >                                                        VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
    > > >                                                        InvalidOffsetNumber);
    > > >
    > > > +     /*
    > > > +      * Before marking dead items unused, check whether the page will become
    > > > +      * all-visible once that change is applied.
    > >
    > > So the function is named _would_ but here you say will :)
    >
    > I thought about it more and still feel that this function name should
    > contain "would". From vacuum's perspective it is "will" -- because it
    > knows it will remove those dead items, but from the function's
    > perspective it is hypothetical. I changed the comment though.
    >
    > > > +     if (heap_page_would_be_all_visible(vacrel, buffer,
    > > > +                                                                        deadoffsets, num_offsets,
    > > > +                                                                        &all_frozen, &visibility_cutoff_xid))
    > > > +     {
    > > > +             vmflags |= VISIBILITYMAP_ALL_VISIBLE;
    > > > +             if (all_frozen)
    > > > +             {
    > > > +                     vmflags |= VISIBILITYMAP_ALL_FROZEN;
    > > > +                     Assert(!TransactionIdIsValid(visibility_cutoff_xid));
    > > > +             }
    > > > +
    > > > +             /* Take the lock on the vmbuffer before entering a critical section */
    > > > +             LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
    > >
    > > It sure would be nice if we had documented the lock order between the heap
    > > page and the corresponding VM page anywhere.  This is just doing what we did
    > > before, so it's not this patch's fault, but I did get worried about it for a
    > > moment.
    >
    > Well, the comment above the visibilitymap_set* functions says what
    > expectations they have for the heap page being locked.
    >
    > > > +static bool
    > > > +heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
    > > > +                                                        OffsetNumber *deadoffsets,
    > > > +                                                        int ndeadoffsets,
    > > > +                                                        bool *all_frozen,
    > > > +                                                        TransactionId *visibility_cutoff_xid)
    > > > +{
    > > >       Page            page = BufferGetPage(buf);
    >
    > > Hm, what about an assert checking that matched_dead_count == ndeadoffsets at
    > > the end?
    >
    > I was going to put an Assert(ndeadoffsets <= matched_dead_count), but
    > then I started wondering if there is a way we could end up with fewer
    > dead items than we collected during phase I.
    >
    > I had thought about if we dropped an index and then did on-access
    > pruning -- but we don't allow setting LP_DEAD items LP_UNUSED in
    > on-access pruning. So, maybe this is safe... I can do a follow-on
    > commit to add the assert. But I'm just not 100% sure I've thought of
    > all the cases where we might end up with fewer dead items.
    >
    > > > During vacuum's first and third phases, we examine tuples' visibility
    > > > to determine if we can set the page all-visible in the visibility map.
    > > >
    > > > Previously, this check compared tuple xmins against a single XID chosen at
    > > > the start of vacuum (OldestXmin). We now use GlobalVisState, which also
    > > > enables future work to set the VM during on-access pruning, since ordinary
    > > > queries have access to GlobalVisState but not OldestXmin.
    > > >
    > > > This also benefits vacuum directly: GlobalVisState may advance
    > > > during a vacuum, allowing more pages to become considered all-visible.
    > > > In the rare case that it moves backward, VACUUM falls back to OldestXmin
    > > > to ensure we don’t attempt to freeze a dead tuple that wasn’t yet
    > > > prunable according to the GlobalVisState.
    > >
    > > It could, but it currently won't advance in vacuum, right?
    >
    > I thought it was possible for it to advance when calling
    > heap_prune_satisfies_vacuum() ->
    > GlobalVisTestIsRemovableXid()->...GlobalVisUpdate(). This case isn't
    > going to be common, but some things can cause us to update it.
    >
    > We have talked about explicitly updating GlobalVisState more often
    > during vacuums of large tables. But I was under the impression that it
    > was at least possible for it to advance during vacuum now.
    >
    > - Melanie
    
    
    Hi!
    
    First of all, I rechecked the v18 patches, and they still reduce WAL
    bytes. In a no-index vacuum case I see a 39% reduction in WAL bytes,
    almost like in your first message.
    
    Here are my comments on the code. I may be very nitpicky about minor
    details, sorry for that.
    
    In 0003:
    
    The logic of the get_conflict_xid function looks a bit strange to me:
    it assigns conflict_xid some value, but at the very end we have
    
    > + /*
    >+ * We can omit the snapshot conflict horizon if we are not pruning or
    >+ * freezing any tuples and are setting an already all-visible page
    >+ * all-frozen in the VM. In this case, all of the tuples on the page must
    >+ * already be visible to all MVCC snapshots on the standby.
    >+ */
    >+ if (!do_prune && !do_freeze &&
    >+ do_set_vm && blk_already_av && set_blk_all_frozen)
    > + conflict_xid = InvalidTransactionId;
    
    I feel like we should move this check to the beginning of the
    function and just return InvalidTransactionId in that if condition.
    
    in 0004:
    
    > + if (old_vmbits == new_vmbits)
    > + {
    > + LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
    > + /* Unset so we don't emit WAL since no change occurred */
    > + do_set_vm = false;
    > + }
    
    and then
    
    >  END_CRIT_SECTION();
    > + if (do_set_vm)
    > + LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
    > +
    
    So, in the heap_page_prune_and_freeze() function we release the buffer
    lock both inside and outside the critical section. As I understand it,
    this is actually safe. I also looked at the xlog coding practices of
    other access methods (GiST, GIN, ...), and some of them release buffers
    before leaving the critical section and some after. But I still suggest
    staying in sync with the 'Write-Ahead Log Coding' section of
    src/backend/access/transam/README, which says:
    
    6. END_CRIT_SECTION()
    
    7. Unlock and unpin the buffer(s).
    
    Let's be consistent about this, at least within this single function.
    
    
    In 0010:
    
    I'm not terribly convinced that adding SO_ALLOW_VM_SET to the TAM
    ScanOptions is the right thing to do. VM bits look like something
    that makes sense for the heap AM but not for any TAM. So, don't we
    break a layer of abstraction here? Would it be better for the heap AM
    to set some flags in heap_beginscan?
    
    
    Overall, 0001-0003 look mostly fine to me, and 0004-0006 are the right
    thing to do IMHO, but maybe they need some more review from hackers.
    I did not review the other patches in great detail; I will return to
    them later.
    
    
    
    -- 
    Best regards,
    Kirill Reshke
    
    
    
    
  48. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-11-04T16:48:15Z

    Thanks for the review!
    
    On Wed, Oct 29, 2025 at 7:03 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
    >
    > The logic of the get_conflict_xid function looks a bit strange to me:
    > it assigns conflict_xid some value, but at the very end we have
    >
    > > + /*
    > >+ * We can omit the snapshot conflict horizon if we are not pruning or
    > >+ * freezing any tuples and are setting an already all-visible page
    > >+ * all-frozen in the VM. In this case, all of the tuples on the page must
    > >+ * already be visible to all MVCC snapshots on the standby.
    > >+ */
    > >+ if (!do_prune && !do_freeze &&
    > >+ do_set_vm && blk_already_av && set_blk_all_frozen)
    > > + conflict_xid = InvalidTransactionId;
    >
    > I feel like we should move this check to the beginning of the
    > function and just return InvalidTransactionId in that if condition.
    
    You're right. I've changed it as you suggest in attached v19.
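    
    The reordering discussed above can be illustrated with a minimal,
    self-contained sketch. It is not the code from the patch: the function
    name get_conflict_xid_sketch(), its parameter list, and the
    computed_horizon argument (standing in for whatever horizon the real
    function derives from pruning and freezing) are all assumptions made
    for illustration, and TransactionId is a simplified stand-in.
    
    ```c
    #include <stdbool.h>
    #include <stdint.h>
    
    /* Simplified stand-ins for the PostgreSQL definitions. */
    typedef uint32_t TransactionId;
    #define InvalidTransactionId ((TransactionId) 0)
    
    /*
     * Hypothetical model of the early-return form.  Only the condition is
     * taken from the quoted hunk; everything else is assumed.
     */
    static inline TransactionId
    get_conflict_xid_sketch(bool do_prune, bool do_freeze, bool do_set_vm,
                            bool blk_already_av, bool set_blk_all_frozen,
                            TransactionId computed_horizon)
    {
        /*
         * Not pruning or freezing, and only marking an already all-visible
         * page all-frozen: no snapshot conflict horizon is needed, so bail
         * out before doing any other work.
         */
        if (!do_prune && !do_freeze &&
            do_set_vm && blk_already_av && set_blk_all_frozen)
            return InvalidTransactionId;
    
        /* Otherwise fall through to the horizon computed by the caller. */
        return computed_horizon;
    }
    ```
    
    Returning early keeps the special case next to its explanation and
    avoids assigning conflict_xid a value that is later thrown away.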
    
    > > + if (old_vmbits == new_vmbits)
    > > + {
    > > + LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
    > > + /* Unset so we don't emit WAL since no change occurred */
    > > + do_set_vm = false;
    > > + }
    >
    > and then
    >
    > >  END_CRIT_SECTION();
    > > + if (do_set_vm)
    > > + LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
    > > +
    >
    > So, in the heap_page_prune_and_freeze() function we release the buffer
    > lock both inside and outside the critical section. As I understand it,
    > this is actually safe. I also looked at the xlog coding practices of
    > other access methods (GiST, GIN, ...), and some of them release buffers
    > before leaving the critical section and some after. But I still suggest
    > staying in sync with the 'Write-Ahead Log Coding' section of
    > src/backend/access/transam/README, which says:
    >
    > 6. END_CRIT_SECTION()
    >
    > 7. Unlock and unpin the buffer(s).
    >
    > Let's be consistent about this, at least within this single function.
    
    I see what you are saying. However, I don't see a good way to
    determine whether or not we need to unlock the VM without introducing
    another local variable in the outermost scope -- like "unlock_vm".
    This function already has a lot of local variables, so I'm loath to do
    that. And we want do_set_vm to reflect whether or not we actually set
    it in case it gets used in the future.
    
    This function doesn't lock or unlock the heap buffer so it doesn't
    seem as urgent to me to follow the letter of the law in this case.
    
    Attached patch doesn't have this change, but this is what it would look like:
    
        /* Lock vmbuffer before entering a critical section */
    +   unlock_vm = do_set_vm;
        if (do_set_vm)
            LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
    
    @@ -1112,12 +1114,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
                old_vmbits = visibilitymap_set(blockno,
                                               vmbuffer, new_vmbits,
                                               params->relation->rd_locator);
    -           if (old_vmbits == new_vmbits)
    -           {
    -               LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
    -               /* Unset so we don't emit WAL since no change occurred */
    -               do_set_vm = false;
    -           }
    +
    +           /* Unset so we don't emit WAL since no change occurred */
    +           do_set_vm = old_vmbits != new_vmbits;
            }
    
            /*
    @@ -1145,7 +1144,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
    
        END_CRIT_SECTION();
    
    -   if (do_set_vm)
    +   if (unlock_vm)
            LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
    
    > In 0010:
    >
    > I'm not terribly convinced that adding SO_ALLOW_VM_SET to the TAM
    > ScanOptions is the right thing to do. VM bits look like something
    > that makes sense for the heap AM but not for any TAM. So, don't we
    > break a layer of abstraction here? Would it be better for the heap AM
    > to set some flags in heap_beginscan?
    
    I don't see another good way of doing it.
    
    The information about whether or not the relation is modified in the
    query is gathered during planning and saved in the plan. We need to
    get that information to the scan descriptor, which is all we have when
    we call heap_page_prune_opt() during the scan. The scan descriptor is
    created by the table AM implementations of scan_begin(). The table AM
    callbacks don't pass down the plan -- which makes sense; the scan
    shouldn't know about the plan. They do pass down flags, so I thought
    it made the most sense to add a flag. Note that I was able to avoid
    modifying the actual table and index AM callbacks (scan_begin() and
    ambeginscan()). I only made new wrappers that took "modifies_rel".
    
    Now, it is true that referring to the VM is somewhat of a layering
    violation. Though, other table AMs may use the information about if
    the query modifies the relation -- which is really what this flag
    represents. The ScanOptions are usually either a type or a call to
    action. Which is why I felt a bit uncomfortable calling it something
    like SO_MODIFIES_REL -- which is less of an option and more a piece of
    information. And it makes it sound like the scan modifies the rel,
    which is not the case. I wonder if there is another solution. Or maybe
    we call it SO_QUERY_MODIFIES_REL?
    
    - Melanie
    
  49. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-11-17T23:07:27Z

    Attached v20 has general cleanup, changes to the table/index AM
    callbacks detailed below, and it moves the
    heap_page_prune_and_freeze() refactoring commit down the stack to
    0004.
    
    0001 - 0003 are fairly trivial cleanup patches. I think they are ready
    to commit, so if I don't hear any objections in the next few days,
    I'll go ahead and commit them.
    
    On Tue, Nov 4, 2025 at 11:48 AM Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    >
    > On Wed, Oct 29, 2025 at 7:03 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
    > >
    > > In 0010:
    > >
    > > I'm not terribly convinced that adding SO_ALLOW_VM_SET to the TAM
    > > ScanOptions is the right thing to do. VM bits look like something
    > > that makes sense for the heap AM but not for any TAM. So, don't we
    > > break a layer of abstraction here? Would it be better for the heap AM
    > > to set some flags in heap_beginscan?
    >
    > I don't see another good way of doing it.
    >
    > The information about whether or not the relation is modified in the
    > query is gathered during planning and saved in the plan. We need to
    > get that information to the scan descriptor, which is all we have when
    > we call heap_page_prune_opt() during the scan. The scan descriptor is
    > created by the table AM implementations of scan_begin(). The table AM
    > callbacks don't pass down the plan -- which makes sense; the scan
    > shouldn't know about the plan. They do pass down flags, so I thought
    > it made the most sense to add a flag. Note that I was able to avoid
    > modifying the actual table and index AM callbacks (scan_begin() and
    > ambeginscan()). I only made new wrappers that took "modifies_rel".
    >
    > Now, it is true that referring to the VM is somewhat of a layering
    > violation. Though, other table AMs may use the information about if
    > the query modifies the relation -- which is really what this flag
    > represents. The ScanOptions are usually either a type or a call to
    > action. Which is why I felt a bit uncomfortable calling it something
    > like SO_MODIFIES_REL -- which is less of an option and more a piece of
    > information. And it makes it sound like the scan modifies the rel,
    > which is not the case. I wonder if there is another solution. Or maybe
    > we call it SO_QUERY_MODIFIES_REL?
    
    Attached v20 changes the ScanOption name to SO_HINT_REL_READ_ONLY and
    removes the new helper functions which took modifies_rel as a
    parameter. Instead it modifies the existing
    table_beginscan()/index_beginscan() helpers and the relevant callbacks
    they invoke to have a new flags parameter. These are additional
    caller-provided flags.
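    
    As a rough illustration of the idea of passing caller-provided flags
    through a begin-scan helper, here is a small standalone sketch. Every
    name in it (the Sketch* types, the SKETCH_SO_* values,
    sketch_beginscan(), sketch_may_set_vm()) is hypothetical; the real
    ScanOptions bits, scan descriptor layout, and callback signatures are
    not reproduced here.
    
    ```c
    #include <stdbool.h>
    #include <stdint.h>
    
    /* Hypothetical flag bits; the real ScanOptions values differ. */
    enum SketchScanOptions
    {
        SKETCH_SO_TYPE_SEQSCAN = 1 << 0,       /* set by the helper itself */
        SKETCH_SO_HINT_REL_READ_ONLY = 1 << 1  /* supplied by the caller */
    };
    
    /* Hypothetical scan descriptor carrying the combined flags. */
    typedef struct SketchScanDesc
    {
        uint32_t    flags;
    } SketchScanDesc;
    
    /* The helper ORs its own flags with whatever the caller provides. */
    static inline SketchScanDesc
    sketch_beginscan(uint32_t caller_flags)
    {
        SketchScanDesc scan = {
            .flags = SKETCH_SO_TYPE_SEQSCAN | caller_flags,
        };
    
        return scan;
    }
    
    /*
     * The AM sees only the descriptor, not the plan, so it consults the
     * hint when deciding whether on-access pruning may also set the VM.
     */
    static inline bool
    sketch_may_set_vm(const SketchScanDesc *scan)
    {
        return (scan->flags & SKETCH_SO_HINT_REL_READ_ONLY) != 0;
    }
    ```
    
    The point of the sketch is only the data flow: the executor knows
    whether the query modifies the relation, and a flag is the channel
    that already exists between it and the scan descriptor.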
    
    In master, the IndexScan structures and helpers don't use ScanOptions,
    but since I'm using them for properties of the base relation, I think
    it is fine. I'm not sure if I should name the parameter base_rel_flags
    instead of flags for the index-related callbacks and helpers or if
    leaving it more generic is better, though.
    
    - Melanie
    
  50. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Kirill Reshke <reshkekirill@gmail.com> — 2025-11-19T09:35:39Z

    On Tue, 18 Nov 2025 at 04:07, Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    >
    > Attached v20 has general cleanup, changes to the table/index AM
    > callbacks detailed below, and it moves the
    > heap_page_prune_and_freeze() refactoring commit down the stack to
    > 0004.
    >
    > 0001 - 0003 are fairly trivial cleanup patches. I think they are ready
    > to commit, so if I don't hear any objections in the next few days,
    > I'll go ahead and commit them.
    >
    
    
    Hi! I looked at the 0002-0003 patches once again, LGTM. In
    particular, I think 0002 & 0003 make VM bit management simpler.
    My only review comment is about 0003:
    Should we make frz_conflict_horizon another field of the PruneState
    struct rather than an argument of heap_page_will_freeze()? If I'm
    not mistaken, 'frz_conflict_horizon' fits well as part of the pruning
    state.
    
    
    -- 
    Best regards,
    Kirill Reshke
    
    
    
    
  51. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-11-19T23:13:30Z

    On Wed, Nov 19, 2025 at 4:35 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
    >
    > Hi! I looked at the 0002-0003 patches once again, LGTM. In
    > particular, I think 0002 & 0003 make VM bit management simpler.
    
    Thanks for the review!
    
    > My only review comment is about 0003:
    > Should we make frz_conflict_horizon another field of the PruneState
    > struct rather than an argument of heap_page_will_freeze()? If I'm
    > not mistaken, 'frz_conflict_horizon' fits well as part of the pruning
    > state.
    
    Since it is passed into one of the helpers, I think I agree. Attached
    v21 has this change.
    
    - Melanie
    
  52. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-11-20T17:19:58Z

    On Wed, Nov 19, 2025 at 6:13 PM Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    >
    > Since it is passed into one of the helpers, I think I agree. Attached
    > v21 has this change.
    
    I've committed the first three patches. Attached v22 is the remaining
    patches which set the VM in heap_page_prune_and_freeze() for vacuum
    and then allow on-access pruning to also set the VM.
    
    - Melanie
    
  53. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> — 2025-11-20T17:55:05Z

    Melanie Plageman <melanieplageman@gmail.com> writes:
    
    > +			PruneFreezeParams params = {.relation = relation,.buffer = buffer,
    > +				.reason = PRUNE_ON_ACCESS,.options = 0,
    > +				.vistest = vistest,.cutoffs = NULL
    > +			};
    
    I didn't pay much attention to this thread, so I didn't notice this
    until it got committed, but I'd like to lodge an objection to this
    formatting, especially the lack of spaces before the field names. This
    would be much more readable with one struct field per line, i.e.
    
    	PruneFreezeParams params = {
    		.relation = rel,
                    .buffer = buf,
    		.reason = PRUNE_VACUUM_SCAN,
    		.options = HEAP_PAGE_PRUNE_FREEZE,
    		.vistest = vacrel->vistest,
    		.cutoffs = &vacrel->cutoffs,
    	};
    
    or at a pinch, if we're really being stingy with the vertical space:
    
    	PruneFreezeParams params = {
    		.relation = rel, .buffer = buf,
                    .reason = PRUNE_VACUUM_SCAN, .options = HEAP_PAGE_PRUNE_FREEZE,
    		.vistest = vacrel->vistest, .cutoffs = &vacrel->cutoffs,
    	};
    
    I had a quick grep, and every other designated struct initialiser I
    could find uses the one-field-per-line form, but they're not consistent
    about the comma after the last field.  I personally prefer having it, so
    that one can add more fields later without having to modify the
    unrelated line.
    
    - ilmari
    
    
    
    
  54. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> — 2025-11-20T18:02:20Z

    Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> writes:
    
    > Melanie Plageman <melanieplageman@gmail.com> writes:
    >
    >> +			PruneFreezeParams params = {.relation = relation,.buffer = buffer,
    >> +				.reason = PRUNE_ON_ACCESS,.options = 0,
    >> +				.vistest = vistest,.cutoffs = NULL
    >> +			};
    >
    > I didn't pay much attention to this thread, so I didn't notice this
    > until it got committed, but I'd like to lodge an objection to this
    > formatting, especially the lack of spaces before the field names. This
    > would be much more readable with one struct field per line, i.e.
    >
    > 	PruneFreezeParams params = {
    > 		.relation = rel,
    >                 .buffer = buf,
    > 		.reason = PRUNE_VACUUM_SCAN,
    > 		.options = HEAP_PAGE_PRUNE_FREEZE,
    > 		.vistest = vacrel->vistest,
    > 		.cutoffs = &vacrel->cutoffs,
    > 	};
    
    D'oh, my mail client untabified the .buffer line while I was editing it,
    that should of course be:
    
    	PruneFreezeParams params = {
    		.relation = rel,
    		.buffer = buf,
    		.reason = PRUNE_VACUUM_SCAN,
    		.options = HEAP_PAGE_PRUNE_FREEZE,
    		.vistest = vacrel->vistest,
    		.cutoffs = &vacrel->cutoffs,
    	};
    
    - ilmari
    
    
    
    
  55. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-11-20T22:23:05Z

    On Thu, Nov 20, 2025 at 12:55 PM Dagfinn Ilmari Mannsåker
    <ilmari@ilmari.org> wrote:
    >
    > I didn't pay much attention to this thread, so I didn't notice this
    > until it got committed, but I'd like to lodge an objection to this
    > formatting, especially the lack of spaces before the field names. This
    > would be much more readable with one struct field per line, i.e.
    >
    >         PruneFreezeParams params = {
    >                 .relation = rel,
    >                 .buffer = buf,
    >                 .reason = PRUNE_VACUUM_SCAN,
    >                 .options = HEAP_PAGE_PRUNE_FREEZE,
    >                 .vistest = vacrel->vistest,
    >                 .cutoffs = &vacrel->cutoffs,
    >         };
    >
    > or at a pinch, if we're really being stingy with the vertical space:
    >
    >         PruneFreezeParams params = {
    >                 .relation = rel, .buffer = buf,
    >                 .reason = PRUNE_VACUUM_SCAN, .options = HEAP_PAGE_PRUNE_FREEZE,
    >                 .vistest = vacrel->vistest, .cutoffs = &vacrel->cutoffs,
    >         };
    >
    > I had a quick grep, and every other designated struct initialiser I
    > could find uses the one-field-per-line form, but they're not consistent
    > about the comma after the last field.  I personally prefer having it, so
    > that one can add more fields later without having to modify the
    > unrelated line.
    
    pgindent doesn't allow for a space after the comma before the period.
    One reason I used struct initialization was to save space, so I'm a
    bit loath to put every member on its own line. However, I don't want
    to make the code less readable to others. So, I will commit an update
    as you request.
    
    - Melanie
    
    
    
    
  56. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Chao Li <li.evan.chao@gmail.com> — 2025-11-21T01:09:21Z

    
    > On Nov 21, 2025, at 01:19, Melanie Plageman <melanieplageman@gmail.com> wrote:
    > 
    > On Wed, Nov 19, 2025 at 6:13 PM Melanie Plageman
    > <melanieplageman@gmail.com> wrote:
    >> 
    >> Since it is passed into one of the helpers, I think I agree. Attached
    >> v21 has this change.
    > 
    > I've committed the first three patches. Attached v22 is the remaining
    > patches which set the VM in heap_page_prune_and_freeze() for vacuum
    > and then allow on-access pruning to also set the VM.
    > 
    
    I just started reviewing 0001 yesterday and had a few comments. However, it was late and I didn’t have enough time to wrap up, so I decided to review a few more patches today and send the comments together. Although you have pushed 0001-0003, I’d still like to raise my comments on them now, and I will review the rest of the commits next week.
    
    1 - pushed 0001
    ```
     			/*
     			 * Report the number of tuples reclaimed to pgstats.  This is
    @@ -419,60 +425,44 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
      * also need to account for a reduction in the length of the line pointer
      * array following array truncation by us.
      *
    - * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
    - * required in order to advance relfrozenxid / relminmxid, or if it's
    - * considered advantageous for overall system performance to do so now.  The
    - * 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
    - * are required when freezing.  When HEAP_PRUNE_FREEZE option is set, we also
    - * set presult->all_visible and presult->all_frozen on exit, to indicate if
    - * the VM bits can be set.  They are always set to false when the
    - * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
    - * that also freeze need that information.
    - *
    - * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
    - * (see heap_prune_satisfies_vacuum).
    - *
    - * options:
    - *   MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
    - *   pruning.
    + * params contains the input parameters used to control freezing and pruning
    + * behavior. See the definition of PruneFreezeParams for more on what each
    + * parameter does.
      *
    - *   FREEZE indicates that we will also freeze tuples, and will return
    - *   'all_visible', 'all_frozen' flags to the caller.
    - *
    - * cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
    - * of vacuuming the relation.  Required if HEAP_PRUNE_FREEZE option is set.
    - * cutoffs->OldestXmin is also used to determine if dead tuples are
    - * HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
    + * If the HEAP_PAGE_PRUNE_FREEZE option is set in params, we will freeze
    + * tuples if it's required in order to advance relfrozenxid / relminmxid, or
    + * if it's considered advantageous for overall system performance to do so
    + * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
    + * 'new_relmin_mxid' arguments are required when freezing.  When
    + * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
    + * and presult->all_frozen on exit, to indicate if the VM bits can be set.
    + * They are always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not
    + * passed, because at the moment only callers that also freeze need that
    + * information.
      *
      * presult contains output parameters needed by callers, such as the number of
      * tuples removed and the offsets of dead items on the page after pruning.
      * heap_page_prune_and_freeze() is responsible for initializing it.  Required
      * by all callers.
      *
    - * reason indicates why the pruning is performed.  It is included in the WAL
    - * record for debugging and analysis purposes, but otherwise has no effect.
    - *
      * off_loc is the offset location required by the caller to use in error
      * callback.
      *
      * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
    - * HEAP_PRUNE_FREEZE option is set.  On entry, they contain the oldest XID and
    - * multi-XID seen on the relation so far.  They will be updated with oldest
    - * values present on the page after pruning.  After processing the whole
    - * relation, VACUUM can use these values as the new relfrozenxid/relminmxid
    - * for the relation.
    + * HEAP_PAGE_PRUNE_FREEZE option is set in params.  On entry, they contain the
    + * oldest XID and multi-XID seen on the relation so far.  They will be updated
    + * with oldest values present on the page after pruning.  After processing the
    + * whole relation, VACUUM can use these values as the new
    + * relfrozenxid/relminmxid for the relation.
      */
     void
    -heap_page_prune_and_freeze(Relation relation, Buffer buffer,
    -						   GlobalVisState *vistest,
    -						   int options,
    -						   struct VacuumCutoffs *cutoffs,
    +heap_page_prune_and_freeze(PruneFreezeParams *params,
     						   PruneFreezeResult *presult,
    -						   PruneReason reason,
     						   OffsetNumber *off_loc,
     						   TransactionId *new_relfrozen_xid,
     						   MultiXactId *new_relmin_mxid)
     {
    ```
    
    I have a concern about this function interface change. The old function comment says “cutoffs contains the freeze cutoffs …. Required if HEAP_PRUNE_FREEZE option is set.”, meaning that cutoffs is only useful, and must be set, when HEAP_PRUNE_FREEZE is set. But the new comment seems to have lost this indication.
    
    And in the old function interface, cutoffs sat right next to options, so readers could easily notice:
    
    * when options is 0, cutoffs is null
    ```
    			heap_page_prune_and_freeze(relation, buffer, vistest, 0,
    									   NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
    ```
    
    * when options has HEAP_PAGE_PRUNE_FREEZE, cutoffs is passed in
    ```
    	prune_options = HEAP_PAGE_PRUNE_FREEZE;
    	if (vacrel->nindexes == 0)
    		prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
    
    	heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
    							   &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
    							   &vacrel->offnum,
    							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
    ```
    
    So, the change doesn’t break anything, but it makes the code a little bit harder to read. My suggestion is to add an assert in heap_page_prune_and_freeze, something like:
    
    ```
    Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs != NULL);
    ```
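    
    To show how such a check behaves for the two call patterns listed above, here is a small standalone sketch. It uses the standard assert() in place of PostgreSQL's Assert(), and the Sketch* types, the SKETCH_PRUNE_FREEZE value, and both function names are hypothetical stand-ins rather than the real PruneFreezeParams definitions.
    
    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    
    /* Hypothetical option bit standing in for HEAP_PAGE_PRUNE_FREEZE. */
    #define SKETCH_PRUNE_FREEZE (1 << 1)
    
    /* Hypothetical stand-ins for VacuumCutoffs and PruneFreezeParams. */
    struct SketchCutoffs
    {
        unsigned    oldest_xmin;
    };
    
    typedef struct SketchParams
    {
        int         options;
        const struct SketchCutoffs *cutoffs;
    } SketchParams;
    
    /* True when the invariant holds: freezing implies cutoffs are given. */
    static inline bool
    sketch_params_valid(const SketchParams *params)
    {
        return !(params->options & SKETCH_PRUNE_FREEZE) ||
            params->cutoffs != NULL;
    }
    
    /* Entry point that checks the invariant before doing any work. */
    static inline bool
    sketch_prune_and_freeze(const SketchParams *params)
    {
        assert(sketch_params_valid(params));
    
        /* Report whether freezing was requested; real work is omitted. */
        return (params->options & SKETCH_PRUNE_FREEZE) != 0;
    }
    ```
    
    With this shape, the on-access pattern (options 0, cutoffs NULL) and the vacuum pattern (freeze option plus cutoffs) both pass, while requesting freezing without cutoffs trips the assertion in an assert-enabled build.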
    
    2 - pushed 0001
    ```
    +	PruneFreezeParams params = {.relation = rel,.buffer = buf,
    +		.reason = PRUNE_VACUUM_SCAN,.options = HEAP_PAGE_PRUNE_FREEZE,
    +		.cutoffs = &vacrel->cutoffs,.vistest = vacrel->vistest
    +	};
    ```
    
    Using a designated initializer is not wrong, but it makes future maintenance harder, because when a new field is added, this initializer will leave the new field uninitialized. As far as I remember, I have never seen a designated initializer in PG code. I only remember seeing 3 approaches:
    
    * use an initialization function to set every field individually
    * palloc0 to set all 0, then set non-zero fields individually
    * {0} to set all 0, then set non-zero fields individually
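    
    As a point of reference for the maintenance concern above: in standard C (C99 and later), members that a designated initializer does not name are implicitly zero-initialized, the same as with {0}, so a field added later starts out as 0/NULL/false rather than holding garbage. A minimal sketch, using a hypothetical struct in place of PruneFreezeParams:
    
    ```c
    #include <stdbool.h>
    #include <stddef.h>
    
    /* Hypothetical struct; the last two members model fields added later. */
    typedef struct SketchParams
    {
        int         options;
        const void *cutoffs;
        bool        added_later_flag;
        int         added_later_count;
    } SketchParams;
    
    /* Only .options is named; every other member is implicitly zeroed. */
    static inline SketchParams
    sketch_make_params(int options)
    {
        SketchParams params = {
            .options = options,
        };
    
        return params;
    }
    ```
    
    The remaining risk is a different one: a newly added field silently gets the zero value at every existing call site, which may or may not be the intended default.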
    
    3 - pushed 0002
    ```
     					prstate->all_visible = false;
    +					prstate->all_frozen = false;
    ```
    
    Nit: Setting both fields to false is now repeated in 6 places. Adding a static inline function, say PruneClearVisibilityFlags(), may improve maintainability.
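    
    A minimal sketch of what such a helper could look like, assuming a hypothetical SketchPruneState with only the two fields in question (the real PruneState has many more members, and the helper name below is illustrative):
    
    ```c
    #include <stdbool.h>
    
    /* Hypothetical cut-down stand-in for PruneState. */
    typedef struct SketchPruneState
    {
        bool        all_visible;
        bool        all_frozen;
    } SketchPruneState;
    
    /*
     * Clear both flags together.  A page that is not all-visible cannot be
     * all-frozen, so the two assignments always travel as a pair.
     */
    static inline void
    sketch_clear_visibility_flags(SketchPruneState *prstate)
    {
        prstate->all_visible = false;
        prstate->all_frozen = false;
    }
    ```
    
    Each of the repeated two-line assignments would then become a single call, which also keeps a future caller from clearing one flag and forgetting the other.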
    
    4 - pushed 0003
    ```
    + * opporunistically freeze, to indicate if the VM bits can be set.  They are
    ```
    
    Typo: “opporunistically” is missing a “t”.
    
    I’ll stop here today and continue reviewing the rest of the commits next week.
    
    Best regards,
    --
    Chao Li (Evan)
    HighGo Software Co., Ltd.
    https://www.highgo.com/
    
    
    
    
    
    
    
    
  57. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Chao Li <li.evan.chao@gmail.com> — 2025-11-24T08:07:59Z

    
    > On Nov 21, 2025, at 09:09, Chao Li <li.evan.chao@gmail.com> wrote:
    > 
    > I’ll stop here today and continue reviewing the rest of the commits next week.
    
    I continued reviewing today.
    
    0004: This is a pure refactoring. It splits heap_page_prune_and_freeze() into multiple small functions. LGTM, no comment.
    
    0005: Overall good, a few nit comments below.
    
    0006, 0007 look good, no comment.
    
    5 - 0005 - heapam.h
    ```
    +	/*
    +	 *
    +	 * vmbuffer is the buffer that must already contain contain the required
    +	 * block of the visibility map if we are to update it. blk_known_av is the
    ```
    
    Nit: 
    
    * an unnecessary empty comment line.
    * “contain contain” => “contain”
    
    6 - 0005 heapam_xlog.c
    ```
    +		 * The critical integrity requirement here is that we must never end
    +		 * up with with the visibility map bit set and the page-level
    ```
    
    Nit: “with with” => “with”
    
    I will continue reviewing 0008 and the rest tomorrow.
    
    Best regards,
    --
    Chao Li (Evan)
    HighGo Software Co., Ltd.
    https://www.highgo.com/
    
    
    
    
    
    
    
    
  58. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Chao Li <li.evan.chao@gmail.com> — 2025-11-24T09:31:31Z

    
    > On Nov 24, 2025, at 16:07, Chao Li <li.evan.chao@gmail.com> wrote:
    > 
    > 0006, 0007 look good, no comment.
    
    I missed a nit comment in 0007:
    
    7 - 0007
    ```
    + * To handle recovery conflict during logical decoding on standby, we must know
    + * if the table is a catalog table. Note that in visibilitymapdefs.h
    + * VISIBLITYMAP_XLOG_CATALOG_REL is also defined as (1 << 2). xl_heap_prune
    + * records should use XLHP_IS_CATALOG_REL, not VISIBILIYTMAP_XLOG_CATALOG_REL --
    + * even if they only contain updates to the VM.
    ```
    
    VISIBLITYMAP_XLOG_CATALOG_REL is missing an “I” after the “B”.
    
    Best regards,
    --
    Chao Li (Evan)
    HighGo Software Co., Ltd.
    https://www.highgo.com/
    
    
    
    
    
    
    
    
  59. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Andres Freund <andres@anarazel.de> — 2025-11-24T22:24:46Z

    Hi,
    
    On 2025-11-20 12:19:58 -0500, Melanie Plageman wrote:
    > From 363f0e4ac9ac7699a6d9c2a267a2ad60825407c8 Mon Sep 17 00:00:00 2001
    > From: Melanie Plageman <melanieplageman@gmail.com>
    > Date: Mon, 17 Nov 2025 15:11:27 -0500
    > Subject: [PATCH v22 1/9] Split heap_page_prune_and_freeze() into helpers
    >
    > Refactor the setup and planning phases of pruning and freezing into
    > helpers. This streamlines heap_page_prune_and_freeze() and makes it more
    > clear when the examination of tuples ends and page modifications begin.
    
    I think this is a considerable improvement.
    
    I didn't review this with a lot of detail, given that it's mostly moving
    code.
    
    One minor thing: It's slightly odd that prune_freeze_plan() gets an oid
    argument, prune_freeze_setup() gets the entire prstate,
    heap_page_will_freeze() gets the Relation. It's what they need, but still a
    bit odd.
    
    
    FWIW, I found the diff generated by
      git show --diff-algorithm=minimal --color-moved-ws=allow-indentation-change
    
    useful for viewing this diff, showed much more clearly how little the code
    actually changed.
    
    
    
    > From 8ebaf434af5afaebcf71550116c59355b3bf15c1 Mon Sep 17 00:00:00 2001
    > From: Melanie Plageman <melanieplageman@gmail.com>
    > Date: Wed, 8 Oct 2025 15:39:01 -0400
    > Subject: [PATCH v22 2/9] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
    >  prune/freeze
    >
    > Vacuum no longer emits a separate WAL record for each page set
    > all-visible or all-frozen during phase I. Instead, visibility map
    > updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
    > is already emitted for pruning and freezing.
    >
    > Previously, heap_page_prune_and_freeze() determined whether a page was
    > all-visible, but the corresponding VM bits were only set later in
    > lazy_scan_prune(). Now the VM is updated immediately in
    > heap_page_prune_and_freeze(), at the same time as the heap
    > modifications.
    >
    > This change applies only to vacuum phase I, not to pruning performed
    > during normal page access.
    
    Hm. This change makes sense, but unfortunately I find it somewhat hard to
    review. There are a lot of changes that don't obviously work towards one
    goal in this commit.
    
    >@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
    >     Relation    relation;       /* relation containing buffer to be pruned */
    >     Buffer      buffer;         /* buffer to be pruned */
    > 
    >+    /*
    >+     *
    >+     * vmbuffer is the buffer that must already contain contain the required
    >+     * block of the visibility map if we are to update it. blk_known_av is the
    >+     * visibility status of the heap block as of the last call to
    >+     * find_next_unskippable_block().
    >+     */
    >+    Buffer      vmbuffer;
    >+    bool        blk_known_av;
    >+
    >     /*
    >      * The reason pruning was performed.  It is used to set the WAL record
    >      * opcode which is used for debugging and analysis purposes.
    
    What is blk_known_av set to if the block is known to not be all visible?
    Compared to the case where we did not yet determine the visibility status of
    the block?
    
    
    >@@ -250,8 +261,10 @@ typedef struct PruneFreezeParams
    >      * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
    >      * LP_UNUSED during pruning.
    >      *
    >-     * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
    >-     * will return 'all_visible', 'all_frozen' flags to the caller.
    >+     * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
    >+     *
    >+     * HEAP_PAGE_PRUNE_UPDATE_VIS indicates that we will set the page's status
    >+     * in the VM.
    >      */
    >     int         options;
    
    nit^2: The previous version and the other paragraphs end in a .
    
    
    > @@ -157,17 +159,36 @@ heap_xlog_prune_freeze(XLogReaderState *record)
    >  		/* There should be no more data */
    >  		Assert((char *) frz_offsets == dataptr + datalen);
    >
    > -		if (vmflags & VISIBILITYMAP_VALID_BITS)
    > -			PageSetAllVisible(page);
    > -
    > -		MarkBufferDirty(buffer);
    > +		if (do_prune || nplans > 0)
    > +			mark_buffer_dirty = set_lsn = true;
    >
    >  		/*
    > -		 * See log_heap_prune_and_freeze() for commentary on when we set the
    > -		 * heap page LSN.
    > +		 * The critical integrity requirement here is that we must never end
    > +		 * up with with the visibility map bit set and the page-level
    > +		 * PD_ALL_VISIBLE bit clear.  If that were to occur, a subsequent page
    
    s/clear/unset/ would be a tad clearer.
    
    
    > +		 * modification would fail to clear the visibility map bit.
    > +		 *
    > +		 * vmflags may be nonzero with PD_ALL_VISIBLE already set (e.g. when
    > +		 * marking an all-visible page all-frozen). If only the VM is updated,
    > +		 * the heap page need not be dirtied.
    >  		 */
    > -		if (do_prune || nplans > 0 ||
    > -			((vmflags & VISIBILITYMAP_VALID_BITS) && XLogHintBitIsNeeded()))
    > +		if ((vmflags & VISIBILITYMAP_VALID_BITS) && !PageIsAllVisible(page))
    > +		{
    > +			PageSetAllVisible(page);
    > +			mark_buffer_dirty = true;
    > +
    > +			/*
    > +			 * See log_heap_prune_and_freeze() for commentary on when we set
    > +			 * the heap page LSN.
    > +			 */
    > +			if (XLogHintBitIsNeeded())
    > +				set_lsn = true;
    > +		}
    
    Maybe worth adding something like Assert(!set_lsn || mark_buffer_dirty)?
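
The flag logic in the hunk above, with the suggested invariant, can be sketched in isolation as follows. This is only an illustrative model, not the redo routine itself: `RedoFlags`, `decide_redo_flags()` and `VM_VALID_BITS` are hypothetical names, and the page state and `XLogHintBitIsNeeded()` result are passed in as plain booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for VISIBILITYMAP_VALID_BITS. */
#define VM_VALID_BITS 0x03

/* Hypothetical holder for the two decisions made during replay. */
typedef struct RedoFlags
{
    bool        mark_buffer_dirty;
    bool        set_lsn;
} RedoFlags;

static RedoFlags
decide_redo_flags(bool do_prune, int nplans, uint8_t vmflags,
                  bool page_all_visible, bool hint_bits_need_wal)
{
    RedoFlags   f = {.mark_buffer_dirty = false, .set_lsn = false};

    /* Pruning or freezing always modifies the heap page. */
    if (do_prune || nplans > 0)
        f.mark_buffer_dirty = f.set_lsn = true;

    /* Setting PD_ALL_VISIBLE dirties the page only if it was not yet set. */
    if ((vmflags & VM_VALID_BITS) && !page_all_visible)
    {
        f.mark_buffer_dirty = true;
        if (hint_bits_need_wal)
            f.set_lsn = true;
    }

    /* The suggested invariant: never stamp an LSN on a clean buffer. */
    assert(!f.set_lsn || f.mark_buffer_dirty);
    return f;
}
```

In this model, a record that only marks an already all-visible page all-frozen leaves both flags false, so the heap page is neither dirtied nor stamped.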
    
    
    > +/*
    > + * Decide whether to set the visibility map bits for heap_blk, using
    > + * information from PruneState and blk_known_av. Some callers may already
    > + * have examined this page’s VM bits (e.g., VACUUM in the previous
    > + * heap_vac_scan_next_block() call) and can pass that along.
    
    That's not entirely trivial to follow, tbh. As mentioned above, it's not clear
    to me how the state of a block where we did determine that the block is *not*
    all-visible is represented.
    
    
    > + * Returns true if one or both VM bits should be set, along with the desired
    > + * flags in *vmflags. Also indicates via do_set_pd_vis whether PD_ALL_VISIBLE
    > + * should be set on the heap page.
    > + */
    > +static bool
    > +heap_page_will_set_vis(Relation relation,
    > +					   BlockNumber heap_blk,
    > +					   Buffer heap_buf,
    > +					   Buffer vmbuffer,
    > +					   bool blk_known_av,
    > +					   const PruneState *prstate,
    > +					   uint8 *vmflags,
    > +					   bool *do_set_pd_vis)
    > +{
    > +	Page		heap_page = BufferGetPage(heap_buf);
    > +	bool		do_set_vm = false;
    > +
    > +	*do_set_pd_vis = false;
    > +
    > +
    > +	/*
    > +	 * Now handle two potential corruption cases:
    > +	 *
    > +	 * These do not need to happen in a critical section and are not
    > +	 * WAL-logged.
    > +	 *
    > +	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
    > +	 * page-level bit is clear.  However, it's possible that in vacuum the bit
    > +	 * got cleared after heap_vac_scan_next_block() was called, so we must
    > +	 * recheck with buffer lock before concluding that the VM is corrupt.
    > +	 */
    > +	else if (blk_known_av && !PageIsAllVisible(heap_page) &&
    > +			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
    > +	{
    > +		ereport(WARNING,
    > +				(errcode(ERRCODE_DATA_CORRUPTED),
    > +				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
    > +						RelationGetRelationName(relation), heap_blk)));
    > +
    > +		visibilitymap_clear(relation, heap_blk, vmbuffer,
    > +							VISIBILITYMAP_VALID_BITS);
    
    Wait, why is it ok to perform this check iff blk_known_av is set?
    
    
    > +			old_vmbits = visibilitymap_set_vmbits(blockno,
    > +												  vmbuffer, new_vmbits,
    > +												  params->relation->rd_locator);
    > +			if (old_vmbits == new_vmbits)
    > +			{
    > +				LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
    > +				/* Unset so we don't emit WAL since no change occurred */
    > +				do_set_vm = false;
    > +			}
    > +		}
    
    What can lead to this path being reached? Doesn't this mean that something
    changed the state of the VM while we were holding an exclusive lock on the
    heap buffer?
    
    
    > +		/*
    > +		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
    > +		 * only updating the VM and it turns out it was already set, we will
    > +		 * have unset do_set_vm earlier. As such, check it again before
    > +		 * emitting the record.
    > +		 */
    > +		if (RelationNeedsWAL(params->relation) &&
    > +			(do_prune || do_freeze || do_set_vm))
    > +		{
    >  			log_heap_prune_and_freeze(params->relation, buffer,
    > -									  InvalidBuffer,	/* vmbuffer */
    > -									  0,	/* vmflags */
    > +									  do_set_vm ? vmbuffer : InvalidBuffer,
    > +									  do_set_vm ? new_vmbits : 0,
    >  									  conflict_xid,
    > -									  true, params->reason,
    > +									  true, /* cleanup lock */
    > +									  do_set_pd_vis,
    > +									  params->reason,
    >  									  prstate.frozen, prstate.nfrozen,
    >  									  prstate.redirected, prstate.nredirected,
    >  									  prstate.nowdead, prstate.ndead,
    
    This function is now taking 16 parameters :/
    
    
    > @@ -959,28 +1148,47 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
    >
    >  	END_CRIT_SECTION();
    >
    > +	if (do_set_vm)
    > +		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
    > +
    > +	/*
    > +	 * During its second pass over the heap, VACUUM calls
    > +	 * heap_page_would_be_all_visible() to determine whether a page is
    > +	 * all-visible and all-frozen. The logic here is similar. After completing
    > +	 * pruning and freezing, use an assertion to verify that our results
    > +	 * remain consistent with heap_page_would_be_all_visible().
    > +	 */
    > +#ifdef USE_ASSERT_CHECKING
    > +	if (prstate.all_visible)
    > +	{
    > +		TransactionId debug_cutoff;
    > +		bool		debug_all_frozen;
    > +
    > +		Assert(prstate.lpdead_items == 0);
    > +		Assert(prstate.cutoffs);
    > +
    > +		if (!heap_page_is_all_visible(params->relation, buffer,
    > +									  prstate.cutoffs->OldestXmin,
    > +									  &debug_all_frozen,
    > +									  &debug_cutoff, off_loc))
    > +			Assert(false);
    
    I don't love Assert(false), because the message for the assert failure is
    pretty much meaningless. Sometimes it's hard to avoid, but here you have an if
    () that has no body other than Assert(false)? Just Assert the expression
    directly.
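
A small sketch of why the direct form reports more. Assertion macros stringify their argument, so the failure text is whatever was written inside the parentheses. `SKETCH_ASSERT`, `last_failed_check` and `page_is_all_visible()` below are hypothetical stand-ins; the macro records the text instead of aborting so the difference can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: text of the most recent failed check, or NULL if none. */
static const char *last_failed_check = NULL;

/*
 * Hypothetical non-aborting stand-in for an assertion macro.  Like a real
 * one, it stringifies its argument; that string is the failure message.
 */
#define SKETCH_ASSERT(expr) \
    do { \
        if (!(expr)) \
            last_failed_check = #expr; \
    } while (0)

/* Hypothetical check: a page with no dead items counts as all-visible. */
static bool
page_is_all_visible(int dead_items)
{
    return dead_items == 0;
}

/* The pattern from the patch: the message can only ever say "false". */
static void
check_indirect(int dead_items)
{
    if (!page_is_all_visible(dead_items))
        SKETCH_ASSERT(false);
}

/* The suggested pattern: the message names the condition that failed. */
static void
check_direct(int dead_items)
{
    SKETCH_ASSERT(page_is_all_visible(dead_items));
}
```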
    
    
    > From 34f0009570e117d7d48b560cd097ee25c6cdcc7c Mon Sep 17 00:00:00 2001
    > From: Melanie Plageman <melanieplageman@gmail.com>
    > Date: Sat, 27 Sep 2025 11:55:21 -0400
    > Subject: [PATCH v22 3/9] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
    >
    > As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
    > marks empty pages all-visible in a XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
    
    This whole business of treating empty pages as all-visible continues to not
    make any sense to me. Particularly in combination with a not crashsafe FSM it
    just seems ... unhelpful. It also means that there's a decent chance of
    extra WAL when bulk extending. But that's not the fault of this change.
    
    
    > From 0d6a06d4533cfe153440d301c3d20915ba07892f Mon Sep 17 00:00:00 2001
    > From: Melanie Plageman <melanieplageman@gmail.com>
    > Date: Sat, 27 Sep 2025 11:55:36 -0400
    > Subject: [PATCH v22 4/9] Remove XLOG_HEAP2_VISIBLE entirely
    >
    > As no remaining users emit XLOG_HEAP2_VISIBLE records.
    > This includes deleting the xl_heap_visible struct and all functions
    > responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
    
    Probably worth mentioning that this changes the VM API.
    
    
    > @@ -2396,14 +2396,18 @@ get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
    >   *
    >   * This is used for several different page maintenance operations:
    >   *
    > - * - Page pruning, in VACUUM's 1st pass or on access: Some items are
    > + * - Page pruning, in vacuum phase I or on-access: Some items are
    >   *   redirected, some marked dead, and some removed altogether.
    >   *
    > - * - Freezing: Items are marked as 'frozen'.
    > + * - Freezing: During vacuum phase I, items are marked as 'frozen'
    >   *
    > - * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
    > + * - Reaping: During vacuum phase III, items that are already LP_DEAD are
    > + *   marked as unused.
    >   *
    > - * They have enough commonalities that we use a single WAL record for them
    > + * - VM updates: After vacuum phases I and III, the heap page may be marked
    > + *   all-visible and all-frozen.
    > + *
    > + * These changes all happen together, so we use a single WAL record for them
    >   * all.
    >   *
    >   * If replaying the record requires a cleanup lock, pass cleanup_lock =
    >   true.
    
    How's that related to the commit's subject?
    
    
    > From fd0455230968fd919999a5c035f3830d310f0e49 Mon Sep 17 00:00:00 2001
    > From: Melanie Plageman <melanieplageman@gmail.com>
    > Date: Fri, 18 Jul 2025 16:30:04 -0400
    > Subject: [PATCH v22 5/9] Rename GlobalVisTestIsRemovableXid() to
    >  GlobalVisXidVisibleToAll()
    > MIME-Version: 1.0
    > Content-Type: text/plain; charset=UTF-8
    > Content-Transfer-Encoding: 8bit
    >
    > The function is currently only used to check whether a tuple’s xmax is
    > visible to all transactions (and thus removable). Upcoming changes will
    > also use it to test whether a tuple’s xmin is visible to all to
    > decide if a page can be marked all-visible in the visibility map.
    >
    > The new name, GlobalVisXidVisibleToAll(), better reflects this broader
    > purpose.
    
    If we want this - and I'm not convinced we do - I think it needs to go further
    and change the other uses of removable in
    procarray.c. ComputeXidHorizonsResult has a lot of related fields.
    
    There's also GetOldestNonRemovableTransactionId(),
    GlobalVisCheckRemovableXid(), GlobalVisCheckRemovableFullXid() that weren't
    included in the renaming.
    
    
    > From 565014e31aa117fb43993ee2e64da38ffb74f372 Mon Sep 17 00:00:00 2001
    > From: Melanie Plageman <melanieplageman@gmail.com>
    > Date: Tue, 29 Jul 2025 14:38:24 -0400
    > Subject: [PATCH v22 6/9] Use GlobalVisState in vacuum to determine page level
    >  visibility
    > MIME-Version: 1.0
    > Content-Type: text/plain; charset=UTF-8
    > Content-Transfer-Encoding: 8bit
    >
    > During vacuum's first and third phases, we examine tuples' visibility
    > to determine if we can set the page all-visible in the visibility map.
    >
    > Previously, this check compared tuple xmins against a single XID chosen at
    > the start of vacuum (OldestXmin). We now use GlobalVisState, which also
    > enables future work to set the VM during on-access pruning, since ordinary
    > queries have access to GlobalVisState but not OldestXmin.
    >
    > This also benefits vacuum directly: in some cases, GlobalVisState may
    > advance during a vacuum, allowing more pages to become considered
    > all-visible. And, in the future, we could easily add a heuristic to
    > update GlobalVisState more frequently during vacuums of large tables. In
    > the rare case that the GlobalVisState moves backward, vacuum falls back
    > to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
    > wasn’t yet prunable according to the GlobalVisState.
    
    I think it may be better to make sure that the GlobalVisState can't move
    backward.
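
One possible reading of "can't move backward", sketched under assumptions: a horizon is only replaced by a candidate that is newer under wraparound-aware comparison. `SketchXid`, `sketch_xid_follows()` and `sketch_advance_horizon()` are hypothetical names, the sketch ignores the reserved special XIDs, and nothing here is taken from the patch or from procarray.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit transaction ID type. */
typedef uint32_t SketchXid;

/*
 * Wraparound-aware "a is newer than b": compare the modulo-2^32 difference
 * as a signed value.  Assumes only normal XIDs are passed in.
 */
static bool
sketch_xid_follows(SketchXid a, SketchXid b)
{
    return (int32_t) (a - b) > 0;
}

/* Accept a candidate horizon only if it is newer than the current one. */
static SketchXid
sketch_advance_horizon(SketchXid current, SketchXid candidate)
{
    return sketch_xid_follows(candidate, current) ? candidate : current;
}
```

With a clamp of this shape, a recomputed horizon that is older than the cached one would simply be ignored, so callers would not need a separate fallback for the backward case.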
    
    
    > From bced81f6df3d303679fac2a1414d42f0db401232 Mon Sep 17 00:00:00 2001
    > From: Melanie Plageman <melanieplageman@gmail.com>
    > Date: Tue, 29 Jul 2025 14:34:30 -0400
    > Subject: [PATCH v22 8/9] Allow on-access pruning to set pages all-visible
    >
    > Many queries do not modify the underlying relation. For such queries, if
    > on-access pruning occurs during the scan, we can check whether the page
    > has become all-visible and update the visibility map accordingly.
    > Previously, only vacuum and COPY FREEZE marked pages as all-visible or
    > all-frozen.
    
    > Supporting this requires passing information about whether the relation
    > is modified from the executor down to the scan descriptor.
    
    I think it'd be good to split this part into a separate commit. The set of
    folks who should review that is distinct from (and broader than) the ones
    looking at heapam internals.
    
    
    Greetings,
    
    Andres Freund
    
    
    
    
  60. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-11-25T21:43:58Z

    Thanks for the review!
    
    On Thu, Nov 20, 2025 at 8:10 PM Chao Li <li.evan.chao@gmail.com> wrote:
    >
    >   * new_relfrozen_xid and new_relmin_mxid must provided by the caller if the
    > - * HEAP_PRUNE_FREEZE option is set.  On entry, they contain the oldest XID and
    > - * multi-XID seen on the relation so far.  They will be updated with oldest
    > - * values present on the page after pruning.  After processing the whole
    > - * relation, VACUUM can use these values as the new relfrozenxid/relminmxid
    > - * for the relation.
    > + * HEAP_PAGE_PRUNE_FREEZE option is set in params.  On entry, they contain the
    > + * oldest XID and multi-XID seen on the relation so far.  They will be updated
    > + * with oldest values present on the page after pruning.  After processing the
    > + * whole relation, VACUUM can use these values as the new
    > + * relfrozenxid/relminmxid for the relation.
    >   */
    >  void
    > -heap_page_prune_and_freeze(Relation relation, Buffer buffer,
    > -                                                  GlobalVisState *vistest,
    > -                                                  int options,
    > -                                                  struct VacuumCutoffs *cutoffs,
    > +heap_page_prune_and_freeze(PruneFreezeParams *params,
    >                                                    PruneFreezeResult *presult,
    > -                                                  PruneReason reason,
    >                                                    OffsetNumber *off_loc,
    >                                                    TransactionId *new_relfrozen_xid,
    >                                                    MultiXactId *new_relmin_mxid)
    >  {
    > ```
    >
    > For this function interface change, I got a concern. The old function comment says "cutoffs contains the freeze cutoffs …. Required if HEAP_PRUNE_FREEZE option is set.”, meaning that cutoffs is only useful and must be set when HEAP_PRUNE_FREEZE is set. But the new comment seems to have lost this indication.
    
    I did move that comment into the PruneFreezeParams struct definition.
    
    > And in the old function interface, cutoffs sat right next to options, readers are easy to notice:
    >
    > * when options is 0, cutoffs is null
    > ```
    >                         heap_page_prune_and_freeze(relation, buffer, vistest, 0,
    >                                                                            NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
    > ```
    >
    > * when options has HEAP_PAGE_PRUNE_FREEZE, cutoffs is passed in
    > ```
    >         prune_options = HEAP_PAGE_PRUNE_FREEZE;
    >         if (vacrel->nindexes == 0)
    >                 prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
    >
    >         heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
    >                                                            &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
    >                                                            &vacrel->offnum,
    >                                                            &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
    > ```
    >
    > So, the change doesn’t break anything, but makes code a little bit harder to read. So, my suggestion is to add an assert in heap_page_prune_and_freeze, something like:
    >
    > Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs != NULL);
    
    That's fair. I've gone ahead and pushed a commit with your suggested assert.
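
The rule that assert enforces can be sketched on its own as below. The types and the flag value are hypothetical stand-ins for PruneFreezeParams, VacuumCutoffs and HEAP_PAGE_PRUNE_FREEZE; only the shape of the condition is taken from the suggestion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value standing in for HEAP_PAGE_PRUNE_FREEZE. */
#define SKETCH_PRUNE_FREEZE (1 << 1)

/* Hypothetical stand-in for the freeze cutoffs. */
typedef struct SketchCutoffs
{
    unsigned    oldest_xmin;
} SketchCutoffs;

/* Hypothetical stand-in for the parameter struct. */
typedef struct SketchParams
{
    int         options;
    const SketchCutoffs *cutoffs;
} SketchParams;

/* "If freezing is requested, cutoffs must be provided", as an implication. */
static bool
sketch_params_ok(const SketchParams *params)
{
    return !(params->options & SKETCH_PRUNE_FREEZE) ||
        params->cutoffs != NULL;
}
```

Written as `!A || B`, the check passes for on-access pruning (no freeze flag, NULL cutoffs) and only fails when freezing is requested without cutoffs.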
    
    > 2 - pushed 0001
    > ```
    > +       PruneFreezeParams params = {.relation = rel,.buffer = buf,
    > +               .reason = PRUNE_VACUUM_SCAN,.options = HEAP_PAGE_PRUNE_FREEZE,
    > +               .cutoffs = &vacrel->cutoffs,.vistest = vacrel->vistest
    > +       };
    > ```
    >
    > Using a designated initializer is not wrong, but makes future maintenance harder, because when a new field is added, this initializer will leave the new field uninitiated. From my impression, I don’t remember I ever see a designated initializer in PG code. I only remember 3 ways I have seen:
    >
    > * use an initialize function to set every fields individually
    > * palloc0 to set all 0, then set non-zero fields individually
    > * {0} to set all 0, then set non-zero fields individually
    
    Well, the main reason you don't see them much in the code is that a
    lot of the code is old and we didn't require a C99-compliant compiler
    until fairly recently (okay like 2018/2019) -- and thus couldn't use
    designated initializers.
    
    I agree that they are rare for structs (they are quite commonly used
    with arrays), but they are there -- for example these bufmgr init
    macros
    
    #define BMR_REL(p_rel) \
        ((BufferManagerRelation){.rel = p_rel})
    #define BMR_SMGR(p_smgr, p_relpersistence) \
        ((BufferManagerRelation){.smgr = p_smgr, .relpersistence = p_relpersistence})
    #define BMR_GET_SMGR(bmr) \
        (RelationIsValid((bmr).rel) ? RelationGetSmgr((bmr).rel) : (bmr).smgr)
    
    I don't see how it would be harder to remember to initialize a field
    with a designated initializer vs if you have to just remember to add a
    line initializing that field in the code. And using a designated
    initializer ensures all unspecified fields will be zeroed out.
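
A minimal sketch of the zeroing behaviour, using a hypothetical struct (`SketchInitParams` and its fields are made up for illustration). In C99, members not named in a designated initializer are initialized as if they had static storage duration, i.e. to zero or NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameter struct; added_later models a newly added field. */
typedef struct SketchInitParams
{
    int         reason;
    int         options;
    const void *cutoffs;
    int         added_later;
} SketchInitParams;

static SketchInitParams
sketch_make_params(int reason, int options)
{
    /* Only two members are named; the rest are implicitly zeroed. */
    SketchInitParams params = {
        .reason = reason,
        .options = options,
    };

    return params;
}
```

So a call site that predates `added_later` still gets a well-defined value of 0 for it, which is the same result as the `{0}`-then-assign style.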
    
    In general, I have seen threads [1] encouraging the use of designated
    initializers, so I'm inclined to leave it as is since it is committed,
    and I haven't heard other pushback.
    
    > 3 - pushed 0002
    > ```
    >                                         prstate->all_visible = false;
    > +                                       prstate->all_frozen = false;
    > ```
    >
    > Nit: Now setting the both fields to false repeat in 6 places. Maybe add a static inline function, say PruneClearVisibilityFlags(), may improve maintainability.
    
    I see your point. However, I don't think it would necessarily be an
    improvement. This function already has a lot of helpers you have to
    jump to to understand what's going on. And in the place where they are
    cleared most often, heap_prune_record_unchanged_lp_normal(), we set
    other fields of the prstate directly, so it is nice visual symmetry in
    my opinion to set them inline.
    
    I did want to use chained assignment (all_visible = all_frozen =
    false), but I have had people complain about that in my code before
    more than once, so I refrained.
    
    > 4 - pushed 0003
    > ```
    > + * opporunistically freeze, to indicate if the VM bits can be set.  They are
    > ```
    >
    > Typo: opporunistically, missed a “t”.
    
    Fixed in same commit that added the assert.
    
    - Melanie
    
    [1] https://www.postgresql.org/message-id/flat/5B873BED.9080501%40anastigmatix.net#4a067c1314783f0e171b4e1be76f7574
    
    
    
    
  61. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-12-03T23:07:38Z

    Thanks for the review! All the small changes you suggested I made in
    attached v23 unless otherwise noted below.
    
    On Mon, Nov 24, 2025 at 5:24 PM Andres Freund <andres@anarazel.de> wrote:
    >
    > On 2025-11-20 12:19:58 -0500, Melanie Plageman wrote:
    > > Subject: [PATCH v22 1/9] Split heap_page_prune_and_freeze() into helpers
    >
    > One minor thing: It's slightly odd that prune_freeze_plan() gets an oid
    > argument, prune_freeze_setup() gets the entire prstate,
    > heap_page_will_freeze() gets the Relation. It's what they need, but still a
    > bit odd.
    
    They all get the PruneState actually.
    
    I've committed this patch (but I actually have to do a follow-on commit
    to silence Coverity; I will do that next).
    
    > > Subject: [PATCH v22 2/9] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
    > >  prune/freeze
    >
    >
    > Hm. This change makes sense, but unfortunately I find it somewhat hard to
    > review. There are a lot of changes that don't obviously work towards one
    > goal in this commit.
    
    I've split up the first commit into 4 patches in attached v23
    (0002-0005). They are not meant to be committed separately but are
    separate only for ease of review. They comprise the logical steps for
    getting to the final code state. I originally had it split up but got
    feedback it was more work to review them each. So, let's see how this
    goes.
    
    > >@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
    >
    > >+     * vmbuffer is the buffer that must already contain contain the required
    > >+     * block of the visibility map if we are to update it. blk_known_av is the
    > >+     * visibility status of the heap block as of the last call to
    > >+     * find_next_unskippable_block().
    > >+     */
    > >+    Buffer      vmbuffer;
    > >+    bool        blk_known_av;
    >
    > What is blk_known_av set to if the block is known to not be all visible?
    > Compared to the case where we did not yet determine the visibility status of
    > the block?
    
    blk_known_av should always be set to false if the caller doesn't know.
    It is used as an optimization. I've added to the comment in this
    struct to clarify that. More on this further down in my mail.
    
    > > + * Decide whether to set the visibility map bits for heap_blk, using
    > > + * information from PruneState and blk_known_av. Some callers may already
    > > + * have examined this page’s VM bits (e.g., VACUUM in the previous
    > > + * heap_vac_scan_next_block() call) and can pass that along.
    >
    > That's not entirely trivial to follow, tbh. As mentioned above, it's not clear
    > to me how the state of a block where did determine that the block is *not*
    > all-visible is represented.
    
    There is no need to distinguish between knowing it is not all-visible
    and not knowing if it is all-visible. That is, "not known" and "known
    not" are the same for our purposes. This is only an optimization and
    not needed for correctness. I've tried to add comments to this effect
    in various places where blk_known_av is used.
    
    > > +     else if (blk_known_av && !PageIsAllVisible(heap_page) &&
    > > +                      visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
    > > +     {
    > > +             ereport(WARNING,
    > > +                             (errcode(ERRCODE_DATA_CORRUPTED),
    > > +                              errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
    > > +                                             RelationGetRelationName(relation), heap_blk)));
    > > +
    > > +             visibilitymap_clear(relation, heap_blk, vmbuffer,
    > > +                                                     VISIBILITYMAP_VALID_BITS);
    >
    > Wait, why is it ok to perform this check iff blk_known_av is set?
    
    This is existing logic in vacuum. It would be okay to perform the
    check even if blk_known_av is false but might be too expensive for the
    common case where the page is not all-visible (especially on-access).
    The next vacuum should be able to enter this code path and fix it. Or
    do you think it will be cheap enough because the caller will have read
    in and pinned the VM page?
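
The trade-off can be sketched as follows, assuming a hypothetical one-byte-per-block map and a lookup counter (`sketch_vm_get_status()`, `sketch_vm_lookups` and `sketch_vm_looks_corrupt()` are illustrative names only). Because of short-circuit evaluation, the VM is consulted only when the caller's hint says the block was all-visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counter of how often the VM was consulted. */
static int  sketch_vm_lookups = 0;

/* Hypothetical VM lookup over a one-byte-per-block map. */
static uint8_t
sketch_vm_get_status(const uint8_t *vm, unsigned blkno)
{
    sketch_vm_lookups++;
    return vm[blkno];
}

/*
 * Same shape as the quoted condition: the hint gates the check, so a false
 * hint ("not known" or "known not") costs no VM lookup at all.
 */
static bool
sketch_vm_looks_corrupt(bool blk_known_av, bool page_all_visible,
                        const uint8_t *vm, unsigned blkno)
{
    return blk_known_av && !page_all_visible &&
        sketch_vm_get_status(vm, blkno) != 0;
}
```

The flip side shows up in the same sketch: with a false hint, a block whose VM bit is set while the page-level bit is clear goes unreported until some later caller passes a true hint.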
    
    > > +                     old_vmbits = visibilitymap_set_vmbits(blockno,
    > > +                                                                                               vmbuffer, new_vmbits,
    > > +                                                                                               params->relation->rd_locator);
    > > +                     if (old_vmbits == new_vmbits)
    > > +                     {
    > > +                             LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
    > > +                             /* Unset so we don't emit WAL since no change occurred */
    > > +                             do_set_vm = false;
    > > +                     }
    > > +             }
    >
    > What can lead to this path being reached? Doesn't this mean that something
    > changed the state of the VM while we were holding an exclusive lock on the
    > heap buffer?
    
    This shouldn't be in this commit (I've fixed that). However, it is
    needed once we have on-access VM setting because we could have set the
    page all-visible in the VM on-access in between when
    find_next_unskippable_block() first checks the VM and sets
    all_visible_according_to_vm/blk_known_av and when we take the lock and
    prune/freeze the page.
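
A sketch of that "already set, so skip the WAL" decision, assuming a hypothetical one-byte-per-block map. `sketch_vm_set_bits()` mirrors only the return-the-old-bits behaviour visible in the quoted hunk; it is not the real visibilitymap_set_vmbits().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical setter: ORs in the new bits and returns the previous ones. */
static uint8_t
sketch_vm_set_bits(uint8_t *vm, unsigned blkno, uint8_t new_bits)
{
    uint8_t     old_bits = vm[blkno];

    vm[blkno] |= new_bits;
    return old_bits;
}

/*
 * Returns true if the VM actually changed and therefore needs to be covered
 * by a WAL record; false if someone else had already set the same bits.
 */
static bool
sketch_set_vm_needs_wal(uint8_t *vm, unsigned blkno, uint8_t new_bits)
{
    uint8_t     old_bits = sketch_vm_set_bits(vm, blkno, new_bits);

    return old_bits != new_bits;
}
```

In the scenario described above, the on-access setter would be the first call and vacuum the second: the second call finds the bits unchanged and has nothing to log for the VM.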
    
    > >                       log_heap_prune_and_freeze(params->relation, buffer,
    > > -                                                                       InvalidBuffer,        /* vmbuffer */
    > > -                                                                       0,    /* vmflags */
    > > +                                                                       do_set_vm ? vmbuffer : InvalidBuffer,
    > > +                                                                       do_set_vm ? new_vmbits : 0,
    > >                                                                         conflict_xid,
    > > -                                                                       true, params->reason,
    > > +                                                                       true, /* cleanup lock */
    > > +                                                                       do_set_pd_vis,
    > > +                                                                       params->reason,
    > >                                                                         prstate.frozen, prstate.nfrozen,
    > >                                                                         prstate.redirected, prstate.nredirected,
    > >                                                                         prstate.nowdead, prstate.ndead,
    >
    > This function is now taking 16 parameters :/
    
    Is this complaint about readability or performance of parameter
    passing? Because if it's the latter, I can't imagine that will be
    noticeable when compared to the overhead of emitting a WAL record.
    
    I could add a struct just for passing the parameters to the
    log_heap_prune_and_freeze(). Something like:
    
    typedef struct PruneFreezeChanges
    {
        int            nredirected;
        int            ndead;
        int            nunused;
        int            nfrozen;
        OffsetNumber *redirected;
        OffsetNumber *nowdead;
        OffsetNumber *nowunused;
        HeapTupleFreeze *frozen;
    } PruneFreezeChanges;
    
    PruneFreezeChanges c = {
            .redirected = prstate.redirected,
            .nredirected = prstate.nredirected,
            .ndead = prstate.ndead,
            .nowdead = prstate.nowdead,
            .nunused = prstate.nunused,
            .nowunused = prstate.nowunused,
            .nfrozen = prstate.nfrozen,
            .frozen = prstate.frozen,
    };
    
    log_heap_prune_and_freeze(params->relation, buffer,
                              InvalidBuffer,    /* vmbuffer */
                              0,                /* vmflags */
                              conflict_xid,
                              true, params->reason,
                              c);
    
    However, I fear it is a bit confusing to have this struct just to pass
    the parameters to log_heap_prune_and_freeze(). We can't use that
    struct inline in the PruneState because then we would need all the
    arrays to be inline in the PruneFreezeChanges struct, which would give
    vacuum phase III 4*MaxHeapTuplesPerPage more stack-allocated
    OffsetNumbers than it currently has and needs.
    
    The only other related parameters I see that could be stuck into a
    struct are vmflags and set_pd_all_vis -- maybe called VisiChanges or
    HeapPageVisiChanges. But again, I'm not sure if it is worth adding a
    new struct for this.
    
    > > +#ifdef USE_ASSERT_CHECKING
    > > +     if (prstate.all_visible)
    > > +     {
    > > +             TransactionId debug_cutoff;
    > > +             bool            debug_all_frozen;
    > > +
    > > +             Assert(prstate.lpdead_items == 0);
    > > +             Assert(prstate.cutoffs);
    > > +
    > > +             if (!heap_page_is_all_visible(params->relation, buffer,
    > > +                                                                       prstate.cutoffs->OldestXmin,
    > > +                                                                       &debug_all_frozen,
    > > +                                                                       &debug_cutoff, off_loc))
    > > +                     Assert(false);
    >
    > I don't love Assert(false), because the message for the assert failure is
    > pretty much meaningless. Sometimes it's hard to avoid, but here you have an if
    > () that has no body other than Assert(false)? Just Assert the expression
    > directly.
    
    This is existing code. I agree it's weird, but I remember Peter saying
    something about why he did it this way that I no longer remember.
    Anyway, 0001 changes the assert as you suggest.
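
    To illustrate the difference, here is a minimal standalone sketch. It
    uses the C library's assert() rather than PostgreSQL's Assert(), and
    page_is_all_visible() is a hypothetical stand-in for
    heap_page_is_all_visible(), so treat it as an illustration only:

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical stand-in for heap_page_is_all_visible(). */
    static bool
    page_is_all_visible(int lpdead_items)
    {
        return lpdead_items == 0;
    }

    /* Old style: on failure, the message names only the expression "false". */
    static void
    check_old_style(int lpdead_items)
    {
        if (!page_is_all_visible(lpdead_items))
            assert(false);
    }

    /* New style: on failure, the message names the violated condition. */
    static void
    check_new_style(int lpdead_items)
    {
        assert(page_is_all_visible(lpdead_items));
    }
    ```

    Both forms abort in the same situations; only the diagnostic text
    differs.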
    
    > > Subject: [PATCH v22 3/9] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
    > >
    > > As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
    > > marks empty pages all-visible in a XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
    >
    > This whole business of treating empty pages as all-visible continues to not
    > make any sense to me. Particularly in combination with a not crashsafe FSM it
    > just seems ... unhelpful. It also means that there's a decent chance of
    > extra WAL when bulk extending. But that's not the fault of this change.
    
    Is the argument for setting them av/af that we can skip them more
    easily in future vacuums (i.e. not have to read in the page and take a
    lock etc)?
    
    > > Subject: [PATCH v22 4/9] Remove XLOG_HEAP2_VISIBLE entirely
    > >
    > > As no remaining users emit XLOG_HEAP2_VISIBLE records.
    > > This includes deleting the xl_heap_visible struct and all functions
    > > responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
    >
    > Probably worth mentioning that this changes the VM API.
    
    I've added a mention about this in the commit.
    Are you imagining I have any comments anywhere about how
    XLOG_HEAP2_VISIBLE used to exist?
    
    I realized I need to bump XLOG_PAGE_MAGIC in this commit because the
    code to replay XLOG_HEAP2_VISIBLE records is gone now.
    
    What I'm not sure about is whether I have to bump it in some of the other commits
    that change which WAL records are emitted by a particular operation
    (e.g. not emitting a separate VM record from phase I of vacuum).
    
    > > - * - Page pruning, in VACUUM's 1st pass or on access: Some items are
    > > + * - Page pruning, in vacuum phase I or on-access: Some items are
    > >   *   redirected, some marked dead, and some removed altogether.
    > >   *
    > > - * - Freezing: Items are marked as 'frozen'.
    > > + * - Freezing: During vacuum phase I, items are marked as 'frozen'
    > >   *
    > > - * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
    > > + * - Reaping: During vacuum phase III, items that are already LP_DEAD are
    > > + *   marked as unused.
    > >   *
    > > - * They have enough commonalities that we use a single WAL record for them
    > > + * - VM updates: After vacuum phases I and III, the heap page may be marked
    > > + *   all-visible and all-frozen.
    > > + *
    > > + * These changes all happen together, so we use a single WAL record for them
    > >   * all.
    > >   *
    > >   * If replaying the record requires a cleanup lock, pass cleanup_lock =
    > >   true.
    >
    > How's that related to the commit's subject?
    
    Oops, I moved it to the relevant commit.
    
    > > Subject: [PATCH v22 5/9] Rename GlobalVisTestIsRemovableXid() to
    > >  GlobalVisXidVisibleToAll()
    > >
    > > The function is currently only used to check whether a tuple’s xmax is
    > > visible to all transactions (and thus removable). Upcoming changes will
    > > also use it to test whether a tuple’s xmin is visible to all to
    > > decide if a page can be marked all-visible in the visibility map.
    > >
    > > The new name, GlobalVisXidVisibleToAll(), better reflects this broader
    > > purpose.
    >
    > If we want this - and I'm not convinced we do - I think it needs to go further
    > and change the other uses of removable in
    > procarray.c. ComputeXidHorizonsResult has a lot of related fields.
    >
    > There's also GetOldestNonRemovableTransactionId(),
    > GlobalVisCheckRemovableXid(), GlobalVisCheckRemovableFullXid() that weren't
    > included in the renaming.
    
    Okay, I see what you are saying. When you say you're not sure if we
    want "this" -- do you mean using GlobalVisState for determining if
    xmins are visible to all (which is required to set the VM on-access)
    or do you mean renaming those functions?
    
    If we're just talking about the renaming, looking at procarray.c, it
    is full of the word "removable" because its functions were largely
    used to examine and determine if everyone can see an xmax as committed
    and thus if that tuple is removable from their perspective. But
    nothing about the code that I can see means it has to be an xmax. We
    could just as well use the functions to determine if everyone can see
    an xmin as committed.
    
    I don't see how we can leave the names as is and use it on xmins
    because that tuple is _not_ removable based on testing if everyone can
    see the xmin. So the function basically returns an incorrect result.
    
    That being said, the problem with replacing "removable" with "visible
    to all" -- which isn't _terrible_ -- is that we have to replace
    "nonremovable" with "not visible to all" -- which is terrible.
    
    I think getting rid of "removable" from procarray.c would be an
    improvement because that file feels tightly coupled to vacuum and
    removing tuples because of the names of variables and functions when
    actually its functionality isn't. So, the issue is coming up with
    something palatable.
    
    One alternative idea (that requires no renaming) is to add a wrapper
    function somewhere outside procarray.c which invokes
    GlobalVisTestIsRemovableXid() but is called something like
    XidVisibleToAll() and is documented for use with xmins/etc. It would
    avoid the messy work of coming up with a good name. What do you think?
    
    > > Subject: [PATCH v22 6/9] Use GlobalVisState in vacuum to determine page level
    > >  visibility
    > >
    > > This also benefits vacuum directly: in some cases, GlobalVisState may
    > > advance during a vacuum, allowing more pages to become considered
    > > all-visible. And, in the future, we could easily add a heuristic to
    > > update GlobalVisState more frequently during vacuums of large tables. In
    > > the rare case that the GlobalVisState moves backward, vacuum falls back
    > > to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
    > > wasn’t yet prunable according to the GlobalVisState.
    >
    > I think it may be better to make sure that the GlobalVisState can't move
    > backward.
    
    Do you mean that I shouldn't use the GlobalVisState to determine
    visibility until I make sure it can't move backwards?
    
    There is actually no functional difference in my patch set with the
    code this commit message refers to (in heap_prune_satisfies_vacuum()).
    I only mentioned it to make sure folks knew that even though I was
    widening usage of GlobalVisState, we wouldn't encounter
    synchronization issues with freezing horizons.
    
    > > Subject: [PATCH v22 8/9] Allow on-access pruning to set pages all-visible
    > >
    > > Many queries do not modify the underlying relation. For such queries, if
    > > on-access pruning occurs during the scan, we can check whether the page
    > > has become all-visible and update the visibility map accordingly.
    > > Previously, only vacuum and COPY FREEZE marked pages as all-visible or
    > > all-frozen.
    >
    > > Supporting this requires passing information about whether the relation
    > > is modified from the executor down to the scan descriptor.
    >
    > I think it'd be good to split this part into a separate commit. The set of
    > folks to review that are distinct (and broader) from the ones looking at
    > heapam internals.
    
    Good point. I've split it into 3 commits in this patch set (0011-0013).
    
    - Melanie
    
  62. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-12-03T23:08:53Z

    On Mon, Nov 24, 2025 at 3:08 AM Chao Li <li.evan.chao@gmail.com> wrote:
    >
    > > On Nov 21, 2025, at 09:09, Chao Li <li.evan.chao@gmail.com> wrote:
    > >
    > > I’d stop here today, and continue reviewing rest commits in next week.
    >
    > I continue reviewing today.
    
    I incorporated all your feedback in my recently posted v23. Thanks for
    the review!
    
    - Melanie
    
  63. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Chao Li <li.evan.chao@gmail.com> — 2025-12-04T05:10:33Z

    Hi Melanie,
    
    I revisited this patch again today. I reviewed 0001-0004, and got a few more comments:
    
    > On Dec 4, 2025, at 07:07, Melanie Plageman <melanieplageman@gmail.com> wrote:
    > 
    > <v23-0001-Simplify-vacuum-visibility-assertion.patch><v23-0002-Refactor-lazy_scan_prune-VM-set-logic-into-helpe.patch><v23-0003-Set-the-VM-in-prune-code.patch><v23-0004-Move-VM-assert-into-prune-freeze-code.patch><v23-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch><v23-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch><v23-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch><v23-0008-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch><v23-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch><v23-0010-Unset-all_visible-sooner-if-not-freezing.patch><v23-0011-Track-which-relations-are-modified-by-a-query.patch><v23-0012-Pass-down-information-on-table-modification-to-s.patch><v23-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch><v23-0014-Set-pd_prune_xid-on-insert.patch>
    
    1 - 0002
    ```
    +static bool
    +heap_page_will_set_vis(Relation relation,
    +					   BlockNumber heap_blk,
    +					   Buffer heap_buf,
    +					   Buffer vmbuffer,
    +					   bool all_visible_according_to_vm,
    +					   const PruneFreezeResult *presult,
    +					   uint8 *new_vmbits,
    +					   bool *do_set_pd_vis)
    ```
    
    Actually, I wanted to comment on the new function name in the last round of review, but I guess I missed that.
    
    I was very confused about what “set_vis” means, and finally figured out that “vis” stands for “visibility”. Here “vis” actually means “visibility map bits”. There is another “vis” in the last parameter’s name, “do_set_pd_vis”, where “vis” means the “PD_ALL_VISIBLE” bit. So the two uses of “vis” make things confusing.
    
    How about renaming the function to “heap_page_will_set_vm_bits”, and renaming the last parameter to “do_set_all_visible”?
    
    2 - 0002
    ```
    + * Decide whether to set the visibility map bits for heap_blk, using
    + * information from PruneFreezeResult and all_visible_according_to_vm. This
    + * function does not actually set the VM bit or page-level hint,
    + * PD_ALL_VISIBLE.
    + *
    + * If it finds that the page-level visibility hint or VM is corrupted, it will
    + * fix them by clearing the VM bit and page hint. This does not need to be
    + * done in a critical section.
    + *
    + * Returns true if one or both VM bits should be set, along with the desired
    + * flags in *new_vmbits. Also indicates via do_set_pd_vis whether
    + * PD_ALL_VISIBLE should be set on the heap page.
    + */
    ```
    
    This function comment mentions PD_ALL_VISIBLE twice, but never mentions ALL_FROZEN. So “Returns true if one or both VM bits should be set” feels unclear. How about rephrasing it as “Returns true if the all-visible and/or all-frozen VM bits should be set.”
    
    3 - 0002
    ```
    +	/*
    +	 * Now handle two potential corruption cases:
    +	 *
    +	 * These do not need to happen in a critical section and are not
    +	 * WAL-logged.
    +	 *
    +	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
    +	 * page-level bit is clear.  However, it's possible that the bit got
    +	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
    +	 * with buffer lock before concluding that the VM is corrupt.
    +	 */
    +	else if (all_visible_according_to_vm && !PageIsAllVisible(heap_page) &&
    +			 visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
    +	{
    +		ereport(WARNING,
    +				(errcode(ERRCODE_DATA_CORRUPTED),
    +				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
    +						RelationGetRelationName(relation), heap_blk)));
    +
    +		visibilitymap_clear(relation, heap_blk, vmbuffer,
    +							VISIBILITYMAP_VALID_BITS);
    +	}
    ```
    
    Here in the comment and error message, I guess “visibility map bit” refers to the “all-visible bit”; can we be explicit?
    
    Best regards,
    --
    Chao Li (Evan)
    HighGo Software Co., Ltd.
    https://www.highgo.com/
    
  64. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-12-09T17:48:45Z

    On Thu, Dec 4, 2025 at 12:11 AM Chao Li <li.evan.chao@gmail.com> wrote:
    >
    > I revisited this patch again today. I reviewed 0001-0004, and got a few more comments:
    
    Thanks for the review! v24 attached with updates you suggested as well
    as the bug fix described below.
    
    I realized my code didn't mark the heap buffer dirty if we were not
    modifying it (i.e. only setting the VM). This trips an assert in
    XLogRegisterBuffer() which requires that all buffers registered with
    the WAL machinery are marked dirty unless REGBUF_NO_CHANGE is passed.
    
    It wasn't possible to hit it in master because we unconditionally
    dirtied the buffer if we found the VM not set in
    find_next_unskippable_block() -- even if we made no changes to the
    heap buffer. But my refactoring only dirtied the heap buffer if we
    modified it (either pruning/freezing or setting PD_ALL_VISIBLE).
    
    In attached v24, I once again always dirty the heap buffer before
    registering it. We can't skip adding the heap buffer to the WAL chain
    even if we didn't modify it, because we use it to update the freespace
    map during recovery. We could pass REGBUF_NO_CHANGE when the heap
    buffer is completely unmodified. But the delicate special case logic
    doesn't seem worth the effort to maintain, as the only time the heap
    buffer should be unmodified is when the VM has been truncated or
    removed. I added a test to the commit doing this refactoring that
    would have caught my mistake (0003).
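
    The rule that tripped the assert can be modeled with a small standalone
    sketch. This is a toy model, not the real XLogRegisterBuffer() logic:
    DemoBuffer, DEMO_REGBUF_NO_CHANGE (including its value), and both
    functions are hypothetical names used only for illustration.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical flag standing in for REGBUF_NO_CHANGE; value is assumed. */
    #define DEMO_REGBUF_NO_CHANGE 0x20

    /* Hypothetical buffer with only the state this sketch needs. */
    typedef struct DemoBuffer
    {
        bool dirty;
    } DemoBuffer;

    /*
     * Models the rule described above: a registered buffer must be dirty
     * unless the caller explicitly declares that it made no change.
     */
    static bool
    demo_can_register(const DemoBuffer *buf, uint8_t flags)
    {
        return buf->dirty || (flags & DEMO_REGBUF_NO_CHANGE) != 0;
    }

    /* Models the v24 approach: always dirty the buffer before registering. */
    static bool
    demo_register_always_dirty(DemoBuffer *buf)
    {
        buf->dirty = true;
        return demo_can_register(buf, 0);
    }
    ```

    Always dirtying the buffer trades an occasional unnecessary write for
    not having to maintain the special case.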
    
    I also split the refactoring of the VM setting logic into more commits
    to help make it clearer (0003-0004). We could technically commit the
    refactoring commits to master. I had not originally intended to do so
    since they do not have independent value beyond clarity for the
    reviewer.
    
    In this set 0001 and 0002 are independent. 0003-0007 are all small
    steps toward the single change in 0007 which combines the VM updates
    into the same WAL record as pruning and freezing. 0008 and 0009 are
    removing the rest of XLOG_HEAP2_VISIBLE. 0010 - 0012 are refactoring
    needed to set the VM during on-access pruning. 0013 - 0015 are small
    steps toward setting the VM on-access. And 0016 sets the prune xid on
    insert so we may set the VM on-access for pages that have only new
    data.
    
    > +static bool
    > +heap_page_will_set_vis(Relation relation,
    >
    > Actually, I wanted to comment on the new function name in last round of review, but I guess I missed that.
    >
    > I was very confused what “set_vis” means, and finally figured out “vis” should stand for “visibility”. Here “vis” actually means “visibility map bits”. There is the other “vis” in the last parameter’s name “do_set_pd_vis” where the “vis” should be mean “PD_ALL_VISIBLE” bit. So the two “vis” feels making things confusing.
    >
    > How about rename the function to “heap_page_will_set_vm_bits”, and rename the last parameter to “do_set_all_visible”?
    
    I named it that way because it was responsible for telling us what we
    should set the VM to _and_ if we should set PD_ALL_VISIBLE. However,
    once I corrected the bug mentioned above, we always set PD_ALL_VISIBLE
    if setting the VM, so I was able to remove this ambiguity. As such
    I've renamed the function heap_page_will_set_vm() (and removed the
    last parameter).
    
    > + * Decide whether to set the visibility map bits for heap_blk, using
    > + * information from PruneFreezeResult and all_visible_according_to_vm. This
    > + * function does not actually set the VM bit or page-level hint,
    > + * PD_ALL_VISIBLE.
    > + *
    > + * If it finds that the page-level visibility hint or VM is corrupted, it will
    > + * fix them by clearing the VM bit and page hint. This does not need to be
    > + * done in a critical section.
    > + *
    > + * Returns true if one or both VM bits should be set, along with the desired
    > + * flags in *new_vmbits. Also indicates via do_set_pd_vis whether
    > + * PD_ALL_VISIBLE should be set on the heap page.
    > + */
    > ```
    >
    > This function comment mentions PD_ALL_VISIBLE twice, but never mentions ALL_FROZEN. So “Returns true if one or both VM bits should be set” fells unclear. How about rephrase like "Returns true if the all-visible and/or all-frozen VM bits should be set.”
    
    PD_ALL_VISIBLE is the page-level visibility hint (not the VM bit) and
    there is no page level frozen hint. It doesn't mention that the VM
    bits are all-visible and all-frozen, though, so I have modified the
    comment a bit to make sure the all-frozen bit of the VM is mentioned.
    
    > +        * Now handle two potential corruption cases:
    > +        *
    > +        * These do not need to happen in a critical section and are not
    > +        * WAL-logged.
    > +        *
    > +        * As of PostgreSQL 9.2, the visibility map bit should never be set if the
    > +        * page-level bit is clear.  However, it's possible that the bit got
    > +        * cleared after heap_vac_scan_next_block() was called, so we must recheck
    > +        * with buffer lock before concluding that the VM is corrupt.
    > +        */
    > +       else if (all_visible_according_to_vm && !PageIsAllVisible(heap_page) &&
    > +                        visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
    > +       {
    > +               ereport(WARNING,
    > +                               (errcode(ERRCODE_DATA_CORRUPTED),
    > +                                errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
    > +                                               RelationGetRelationName(relation), heap_blk)));
    > +
    > +               visibilitymap_clear(relation, heap_blk, vmbuffer,
    > +                                                       VISIBILITYMAP_VALID_BITS);
    > +       }
    > ```
    >
    > Here in the comment and error message, I guess “visibility map bit” refers to “all visible bit”, can we be explicit?
    
    This is an existing comment in lazy_scan_prune() that I simply moved.
    It isn't valid for the all-frozen bit to be set unless the all-visible
    bit is set. I'm not sure whether specifying which bits were set in the
    warning will help users debug the corruption they are seeing. But I
    think it is a reasonable suggestion to make. Perhaps it is worth
    suggesting this (adding the specific vmbits to the warning message) in
    a separate thread since it is an independent improvement on master?
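
    For reference, the two invariants discussed here can be written down as
    a small standalone check. This is only a sketch: the flag values are
    reproduced here so the snippet compiles on its own, and
    vm_state_is_consistent() is a hypothetical helper, not a function in
    the tree.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Reproduced for a standalone build; see visibilitymapdefs.h. */
    #define VISIBILITYMAP_ALL_VISIBLE 0x01
    #define VISIBILITYMAP_ALL_FROZEN  0x02
    #define VISIBILITYMAP_VALID_BITS  0x03

    /* Hypothetical helper: is this VM/page-hint combination valid? */
    static bool
    vm_state_is_consistent(uint8_t vmbits, bool pd_all_visible)
    {
        /* The all-frozen bit is only valid together with all-visible. */
        if ((vmbits & VISIBILITYMAP_ALL_FROZEN) != 0 &&
            (vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
            return false;

        /* No VM bit may be set while the page-level hint is clear. */
        if ((vmbits & VISIBILITYMAP_VALID_BITS) != 0 && !pd_all_visible)
            return false;

        return true;
    }
    ```

    Note that PD_ALL_VISIBLE set with both VM bits clear is still treated
    as consistent, since the VM is allowed to lag behind the page hint.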
    
    - Melanie
    
  65. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-12-10T23:35:47Z

    On Tue, Dec 9, 2025 at 12:48 PM Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    >
    > In this set 0001 and 0002 are independent. 0003-0007 are all small
    > steps toward the single change in 0007 which combines the VM updates
    > into the same WAL record as pruning and freezing. 0008 and 0009 are
    > removing the rest of XLOG_HEAP2_VISIBLE. 0010 - 0012 are refactoring
    > needed to set the VM during on-access pruning. 0013 - 0015 are small
    > steps toward setting the VM on-access. And 0016 sets the prune xid on
    > insert so we may set the VM on-access for pages that have only new
    > data.
    
    I committed 0001 and 0002. Attached v25 reflects that.
    0001-0004 refactoring steps for eliminating the visible record from phase I
    (probably not independent commits in the end)
    0005 eliminate XLOG_HEAP2_VISIBLE from phase I vac
    0006-0007 removing the rest of XLOG_HEAP2_VISIBLE
    0008-0010 refactoring for setting VM on-access
    0011-0013 setting the VM on-access
    0014 - setting pd_prune_xid on insert
    
    - Melanie
    
  66. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Chao Li <li.evan.chao@gmail.com> — 2025-12-11T04:06:57Z

    
    > On Dec 11, 2025, at 07:35, Melanie Plageman <melanieplageman@gmail.com> wrote:
    > 
    > On Tue, Dec 9, 2025 at 12:48 PM Melanie Plageman
    > <melanieplageman@gmail.com> wrote:
    >> 
    >> In this set 0001 and 0002 are independent. 0003-0007 are all small
    >> steps toward the single change in 0007 which combines the VM updates
    >> into the same WAL record as pruning and freezing. 0008 and 0009 are
    >> removing the rest of XLOG_HEAP2_VISIBLE. 0010 - 0012 are refactoring
    >> needed to set the VM during on-access pruning. 0013 - 0015 are small
    >> steps toward setting the VM on-access. And 0016 sets the prune xid on
    >> insert so we may set the VM on-access for pages that have only new
    >> data.
    > 
    > I committed 0001 and 0002. attached v25 reflects that.
    > 0001-0004 refactoring steps for eliminate visible record from phase I
    > (not probably independent commits in the end)
    > 0005 eliminate XLOG_HEAP2_VISIBLE from phase I vac
    > 0006-0007 removing the rest of XLOG_HEAP2_VISIBLE
    > 0008-0010 refactoring for setting VM on-access
    > 0011-0013 setting the VM on-access
    > 0014 - setting pd_prune_xid on insert
    > 
    > - Melanie
    > <v25-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch><v25-0002-Refactor-lazy_scan_prune-VM-set-logic-into-helpe.patch><v25-0003-Set-the-VM-in-heap_page_prune_and_freeze.patch><v25-0004-Move-VM-assert-into-prune-freeze-code.patch><v25-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch><v25-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch><v25-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch><v25-0008-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch><v25-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch><v25-0011-Track-which-relations-are-modified-by-a-query.patch><v25-0012-Pass-down-information-on-table-modification-to-s.patch><v25-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch><v25-0014-Set-pd_prune_xid-on-insert.patch>
    
    A few more small comments. Sorry for continuing to come up with new comments. Actually, I learned a lot about vacuum from reviewing this patch.
    
    1 - 0001
    ```
    +-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
    +checkpoint;
    +-- truncating the VM ensures that the next vacuum will need to set it
    +select pg_truncate_visibility_map('test_vac_unmodified_heap');
    +-- vacuum sets the VM but does not need to set PD_ALL_VISIBLE so no heap page
    +-- modification
    +vacuum test_vac_unmodified_heap;
    ```
    
    The last vacuum is expected to set vm bits, but the test doesn’t verify that. Should we verify that like:
    ```
    evantest=# SELECT blkno, all_visible, all_frozen FROM pg_visibility_map('test_vac_unmodified_heap');
     blkno | all_visible | all_frozen
    -------+-------------+------------
         0 | t           | t
    (1 row)
    ```
    
    As you have been using the extension pg_visibility, adding the verification with pg_visibility_map() should not be a burden.
    
    2 - 0001
    ```
     		if (presult.all_frozen)
     		{
    +			/*
    +			 * We can pass InvalidTransactionId as our cutoff_xid, since a
    +			 * snapshotConflictHorizon sufficient to make everything safe for
    +			 * REDO was logged when the page's tuples were frozen.
    +			 */
     			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
    -			flags |= VISIBILITYMAP_ALL_FROZEN;
    +			new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
     		}
    ```
    
    The comment here is a little confusing. In the old code, the Assert() was immediately above the call to visibilitymap_set(), and cutoff_xid is a parameter to that call. But the new code moves the Assert() as well as the comment far away from the call to visibilitymap_set(), so I think the comment should stay together with the call to visibilitymap_set().
    
    3 - 0002
    ```
     * If it finds that the page-level visibility hint or VM is corrupted, it will
     * fix them by clearing the VM bits and visibility page hint. This does not
    ```
    
    In the second line, “visibility page hint” is understandable but reads awkwardly. I know it actually means the “page-level visibility hint”, so how about just “visibility hint”?
    
    4 - 0002
    ```
     	/*
    -	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
    -	 * page-level bit is clear.  However, it's possible that the bit got
    -	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
    -	 * with buffer lock before concluding that the VM is corrupt.
    +	 * For the purposes of logging, count whether or not the page was newly
    +	 * set all-visible and, potentially, all-frozen.
     	 */
    -	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
    -			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
    +	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
    +		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
     	{
    ```
    
    Unless do_set_vm is true, old_vmbits can only be 0, so this “if-elseif” that uses old_vmbits should be moved into “if (do_set_vm)”. From this perspective, if do_set_vm is false, this function can return early, like:
    
    ```
    do_set_vm = heap_page_will_set_vm(&new_vmbits);
    if (!do_set_vm)
        return presult.ndeleted;

    PageSetAllVisible(page);
    MarkBufferDirty(buf);
    old_vmbits = visibilitymap_set(new_vmbits);
    if (old_vmbits..)
    {
        ..
    }
    else if (old_vmbits…)
    {
        …
    }

    return presult.ndeleted;
    ```
    
    5 - 0003
    ```
     /*
      *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
      *
    @@ -2076,15 +1979,14 @@ lazy_scan_prune(LVRelState *vacrel,
     				bool *vm_page_frozen)
     {
     	Relation	rel = vacrel->rel;
    -	bool		do_set_vm = false;
    -	uint8		new_vmbits = 0;
    -	uint8		old_vmbits = 0;
     	PruneFreezeResult presult;
     	PruneFreezeParams params = {
     		.relation = rel,
     		.buffer = buf,
    +		.vmbuffer = vmbuffer,
    +		.blk_known_av = all_visible_according_to_vm,
     		.reason = PRUNE_VACUUM_SCAN,
    -		.options = HEAP_PAGE_PRUNE_FREEZE,
    +		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
     		.vistest = vacrel->vistest,
     		.cutoffs = &vacrel->cutoffs,
     	};
    ```
    
    This may be a legacy bug. Here presult is not initialized, and it is immediately passed to heap_page_prune_and_freeze():
    
    ```
    	heap_page_prune_and_freeze(&params,
    							   &presult, <=== here
    							   &vacrel->offnum,
    							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
    ```
    
    Then heap_page_prune_and_freeze() immediately calls prune_freeze_setup():
    ```
    	/* Initialize prstate */
    	prune_freeze_setup(params,
    					   new_relfrozen_xid, new_relmin_mxid,
    					   presult, &prstate);
    ```
    
    And prune_freeze_setup() takes presult as a const pointer:
    ```
    static void
    prune_freeze_setup(PruneFreezeParams *params,
    				   TransactionId *new_relfrozen_xid,
    				   MultiXactId *new_relmin_mxid,
    				   const PruneFreezeResult *presult, <=== here
    				   PruneState *prstate)
    {
        prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets; <== here, presult->deadoffsets could be a random value
    }
    ```
    
    As this issue is separate from the current patch, I just filed a new patch to fix it. Please take a look at:
    https://www.postgresql.org/message-id/CAEoWx2%3DjiD1nqch4JQN%2BodAxZSD7mRvdoHUGJYN2r6tQG_66yQ%40mail.gmail.com
    
    6 - 0003
    ```
    + * Returns true if one or both VM bits should be set, along with returning the
    + * desired what bits should be set in the VM in *new_vmbits.
    ```
    
    Looks like a typo: “returning the desired what bits should be set”; maybe change it to “returning the desired bits to be set”.
    
    Best regards,
    --
    Chao Li (Evan)
    HighGo Software Co., Ltd.
    https://www.highgo.com/
    
  67. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Peter Eisentraut <peter@eisentraut.org> — 2025-12-13T13:59:29Z

    On 20.11.25 18:19, Melanie Plageman wrote:
    > +	prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets;
    
    In your patch 
    v22-0001-Split-heap_page_prune_and_freeze-into-helpers.patch, the 
    assignment above casts away the const qualification of the function 
    argument presult:
    
    +static void
    +prune_freeze_setup(PruneFreezeParams *params,
    +				   TransactionId new_relfrozen_xid,
    +				   MultiXactId new_relmin_mxid,
    +				   const PruneFreezeResult *presult,
    +				   PruneState *prstate)
    
    (The cast is otherwise unnecessary, since the underlying type is the 
    same on both sides.)
    
    Since prstate->deadoffsets is in fact later modified, this makes the 
    original const qualification invalid.
    
    I suggest the attached patch to remove the faulty const qualification 
    and the then-unnecessary cast.
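
    A reduced standalone sketch of the pattern, for illustration only.
    DemoResult, DemoState, and both setup functions are hypothetical
    simplifications of PruneFreezeResult, PruneState, and
    prune_freeze_setup(); the OffsetNumber typedef is repeated here so the
    snippet compiles on its own.

    ```c
    #include <assert.h>

    #define DEMO_MAX_ITEMS 4

    typedef unsigned short OffsetNumber;

    /* Hypothetical reduction of PruneFreezeResult. */
    typedef struct DemoResult
    {
        int          ndead;
        OffsetNumber deadoffsets[DEMO_MAX_ITEMS];
    } DemoResult;

    /* Hypothetical reduction of PruneState. */
    typedef struct DemoState
    {
        OffsetNumber *deadoffsets;
    } DemoState;

    /*
     * Problematic form: through a const DemoResult *, the array member
     * decays to const OffsetNumber *, so the cast is needed only to
     * discard the qualifier.
     */
    static void
    demo_setup_with_cast(const DemoResult *presult, DemoState *state)
    {
        state->deadoffsets = (OffsetNumber *) presult->deadoffsets;
    }

    /* Suggested form: no const on the parameter, so no cast is needed. */
    static void
    demo_setup_without_const(DemoResult *presult, DemoState *state)
    {
        state->deadoffsets = presult->deadoffsets;
    }
    ```

    In both forms a later write through state->deadoffsets changes the
    caller's result struct, which is why the const qualifier on the
    parameter is misleading.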
    
  68. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-12-15T21:05:19Z

    On Sat, Dec 13, 2025 at 8:59 AM Peter Eisentraut <peter@eisentraut.org> wrote:
    >
    > On 20.11.25 18:19, Melanie Plageman wrote:
    > > +     prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets;
    >
    > In your patch
    > v22-0001-Split-heap_page_prune_and_freeze-into-helpers.patch, the
    > assignment above casts away the const qualification of the function
    > argument presult:
    
    Yea, this code (prune_freeze_setup() with a const-qualified
    PruneFreezeResult parameter) is actually already in master -- not just
    in this patchset.
    
    > +static void
    > +prune_freeze_setup(PruneFreezeParams *params,
    > +                                  TransactionId new_relfrozen_xid,
    > +                                  MultiXactId new_relmin_mxid,
    > +                                  const PruneFreezeResult *presult,
    > +                                  PruneState *prstate)
    >
    > (The cast is otherwise unnecessary, since the underlying type is the
    > same on both sides.)
    >
    > Since prstate->deadoffsets is in fact later modified, this makes the
    > original const qualification invalid.
    
    I didn't realize I was misusing const here. What I meant to indicate
    by defining the prune_freeze_setup() parameter as const is that the
    PruneFreezeResult wouldn't be modified by prune_freeze_setup(). I did
    not mean to indicate that no members of PruneFreezeResult would ever
    be modified. deadoffsets is not modified in prune_freeze_setup(). So,
    are you saying that I can't define a parameter as const if even the
    caller modifies it?
    
    I'm fine with committing a change, I just want to understand.
    
    - Melanie
    
    
    
    
  69. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-12-15T21:29:03Z

    Hi,
    
    Attached v26 includes a new patch, 0002, which gets rid of
    all_visible_according_to_vm in lazy_scan_prune(). We've kept this
    cached copy of the all-visible bit since the VM was added way back in
    608195a3a365. Back then, the VM wasn't pinned unless
    all_visible_according_to_vm was false. Now that we unconditionally
    have the VM page pinned, there isn't much performance benefit to using
    that cached value. I did some testing of the worst possible case and
    saw no difference in timing. By removing that, we simplify heap vacuum
    code now.  And we improve clarity once the VM update is combined into
    the prune/freeze WAL record and when the VM is set on-access.
    
    I think 0001 and 0002 (and maybe 0003) are worthwhile clarity
    improvements on their own.
    
    On Wed, Dec 10, 2025 at 11:07 PM Chao Li <li.evan.chao@gmail.com> wrote:
    >
    > A few more small comments. Sorry for continuing to come up with new comments. Actually, I learned a lot about vacuum from reviewing this patch.
    
    Thanks for the continued review. Your feedback is improving the patchset.
    
    > The last vacuum is expected to set vm bits, but the test doesn’t verify that. Should we verify that like:
    > ```
    > evantest=# SELECT blkno, all_visible, all_frozen FROM pg_visibility_map('test_vac_unmodified_heap');
    >  blkno | all_visible | all_frozen
    > -------+-------------+------------
    >      0 | t           | t
    > (1 row)
    
    I've done this. I've actually added three such verifications -- one
    after each step where the VM is expected to change. It shouldn't be
    very expensive, so I think it is okay. The way the test would fail if
    the buffer wasn't correctly dirtied is that it would assert out -- so
    the visibility map test wouldn't even have a chance to fail. But, I
    think it is also okay to confirm that the expected things are
    happening with the VM -- it just gives us extra coverage.
    
    >                 if (presult.all_frozen)
    >                 {
    > +                       /*
    > +                        * We can pass InvalidTransactionId as our cutoff_xid, since a
    > +                        * snapshotConflictHorizon sufficient to make everything safe for
    > +                        * REDO was logged when the page's tuples were frozen.
    > +                        */
    >                         Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
    > -                       flags |= VISIBILITYMAP_ALL_FROZEN;
    > +                       new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
    >                 }
    >
    > The comment here is a little confusing. In the old code, the Assert() was immediately above the call to visibilitymap_set(), and cutoff_xid is a parameter to that call. But the new code moves the Assert() as well as the comment far away from the call to visibilitymap_set(), so I think the comment should stay together with the call to visibilitymap_set().
    
    Good point. I've moved it closer to visibilitymap_set() and modified
    and moved the assert so that it is together with the comment. I think
    the comment makes little sense without the assertion.
    
    >  * If it finds that the page-level visibility hint or VM is corrupted, it will
    > * fix them by clearing the VM bits and visibility page hint. This does not
    >
    > In the second line, “visibility page hint” is understandable but feels not quite good. I know it’s actually “page-level visibility hint”, so how about just “visibility hint”.
    
    I've changed this.
    
    >         /*
    > -        * As of PostgreSQL 9.2, the visibility map bit should never be set if the
    > -        * page-level bit is clear.  However, it's possible that the bit got
    > -        * cleared after heap_vac_scan_next_block() was called, so we must recheck
    > -        * with buffer lock before concluding that the VM is corrupt.
    > +        * For the purposes of logging, count whether or not the page was newly
    > +        * set all-visible and, potentially, all-frozen.
    >          */
    > -       else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
    > -                        visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
    > +       if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
    > +               (new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
    >         {
    > ```
    >
    > Without do_set_vm==true, old_vmbits will only be 0, thus this “if-elseif” that uses old_vmbits should be moved into “if (do_set_vm)”. From this perspective, if not do_set_vm, this function can return early, like:
    
    Good point. I've actually gone ahead in 0002 and refactored this whole
    section a bit (I got rid of all_visible_according_to_vm). 0002 is a
    new patch in this attached v26, and it needs review. I think this
    refactoring makes the code quite a bit clearer -- especially once we
    start setting the VM on-access. It does, amongst other things, return
    early if all_visible is false, like you suggested.
    
    > + * Returns true if one or both VM bits should be set, along with returning the
    > + * desired what bits should be set in the VM in *new_vmbits.
    > ```
    >
    > Looks like a typo: “returning the desired what bits should be set”, maybe change to “returning the desired bits to be set”.
    
    Fixed.
    
    - Melanie
    
  70. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Peter Eisentraut <peter@eisentraut.org> — 2025-12-16T12:18:25Z

    On 15.12.25 22:05, Melanie Plageman wrote:
    > On Sat, Dec 13, 2025 at 8:59 AM Peter Eisentraut <peter@eisentraut.org> wrote:
    >>
    >> On 20.11.25 18:19, Melanie Plageman wrote:
    >>> +     prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets;
    >>
    >> In your patch
    >> v22-0001-Split-heap_page_prune_and_freeze-into-helpers.patch, the
    >> assignment above casts away the const qualification of the function
    >> argument presult:
    > 
    > Yea, this code (prune_freeze_setup() with a const-qualified
    > PruneFreezeResult parameter) is actually already in master -- not just
    > in this patchset.
    > 
    >> +static void
    >> +prune_freeze_setup(PruneFreezeParams *params,
    >> +                                  TransactionId new_relfrozen_xid,
    >> +                                  MultiXactId new_relmin_mxid,
    >> +                                  const PruneFreezeResult *presult,
    >> +                                  PruneState *prstate)
    >>
    >> (The cast is otherwise unnecessary, since the underlying type is the
    >> same on both sides.)
    >>
    >> Since prstate->deadoffsets is in fact later modified, this makes the
    >> original const qualification invalid.
    > 
    > I didn't realize I was misusing const here. What I meant to indicate
    > by defining the prune_freeze_setup() parameter as const is that the
    > PruneFreezeResult wouldn't be modified by prune_freeze_setup(). I did
    > not mean to indicate that no members of PruneFreezeResult would ever
    > be modified.
    
    I'm not sure there is a difference between these two statements.  "The 
    struct won't be modified" is the same as "none of its fields will be 
    modified".
    
    > deadoffsets is not modified in prune_freeze_setup(). So,
    > are you saying that I can't define a parameter as const if even the
    > caller modifies it?
    
    You are not modifying deadoffsets in prune_freeze_setup(), but you are 
    assigning its address to a pointer variable that is not const-qualified, 
    and so it could be used to modify it later on.
    
    A caller to prune_freeze_setup() that sees the signature const 
    PruneFreezeResult *presult could pass a pointer to a PruneFreezeResult 
    object that is notionally in read-only memory.  But through the 
    non-const-qualified pointer you could later modify the pointed-to 
    memory, which would be invalid.  The point of propagating the qualifiers 
    is to prevent that at compile time.
    
    If what you want is something like, "prune_freeze_setup() does not 
    change any of the fields of what presult points to, but it does record a 
    pointer to one of its fields with the intention of modifying it later 
    after prune_freeze_setup() is finished", then I think C cannot represent 
    that with this API.
    
    Here is a simplified example:
    
    #include <stdlib.h>
    
    // corresponds to PruneFreezeResult
    struct foo
    {
    	int offsets[5];
    };
    
    // corresponds to PruneState
    struct bar
    {
    	int *offsets;
    };
    
    static void setup(const struct foo *f)
    {
    	struct bar *b = malloc(sizeof(struct bar));
    
    	b->offsets = f->offsets;  // warning
    }
    
    This produces a warning:
    
    test.c:20:20: warning: assignment discards 'const' qualifier from 
    pointer target type
    
    The reason is that what "f" points to is const, which means that all its 
    fields are const.  The fix is to remove the const from the function 
    argument declaration.
    
    One of the possible sources of confusion here is that one struct uses an 
    array and the other a pointer, and these sometimes behave similarly and 
    sometimes not.
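
For comparison, here is a sketch of the same simplified example with the const removed, as the fix describes. It reuses the struct names foo and bar from the example above. Passing the bar object in as a parameter (instead of allocating it with malloc) and the peek() helper are assumptions made only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stands in for PruneFreezeResult: owns the array. */
struct foo
{
	int			offsets[5];
};

/* Stands in for PruneState: keeps a writable pointer into foo. */
struct bar
{
	int		   *offsets;
};

/*
 * No const on f: setup() hands out a writable alias to f->offsets, so the
 * signature has to admit that the object may be modified later.
 */
static void
setup(struct foo *f, struct bar *b)
{
	assert(f != NULL && b != NULL);
	b->offsets = f->offsets;	/* int[5] decays to int *, no warning */
}

/*
 * Hypothetical read-only variant: const is fine here because only a
 * pointer-to-const escapes, so nothing can be written through it.
 */
static const int *
peek(const struct foo *f)
{
	return f->offsets;			/* const int[5] decays to const int * */
}
```

The rule of thumb this suggests: a const-qualified parameter is appropriate only if every pointer derived from it that outlives the call is itself pointer-to-const.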
    
    
    
    
    
  71. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-12-16T16:07:17Z

    On Tue, Dec 16, 2025 at 7:18 AM Peter Eisentraut <peter@eisentraut.org> wrote:
    >
    > You are not modifying deadoffsets in prune_freeze_setup(), but you are
    > assigning its address to a pointer variable that is not const-qualified,
    > and so it could be used to modify it later on.
    >
    > A caller to prune_freeze_setup() that sees the signature const
    > PruneFreezeResult *presult could pass a pointer to a PruneFreezeResult
    > object that is notionally in read-only memory.  But through the
    > non-const-qualified pointer you could later modify the pointed-to
    > memory, which would be invalid.  The point of propagating the qualifiers
    > is to prevent that at compile time.
    
    Thanks for the explanation. I've committed your proposed fix.
    
    - Melanie
    
    
    
    
  72. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-12-16T16:58:50Z

    On Wed, Dec 3, 2025 at 6:07 PM Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    >
    > If we're just talking about the renaming, looking at procarray.c, it
    > is full of the word "removable" because its functions were largely
    > used to examine and determine if everyone can see an xmax as committed
    > and thus if that tuple is removable from their perspective. But
    > nothing about the code that I can see means it has to be an xmax. We
    > could just as well use the functions to determine if everyone can see
    > an xmin as committed.
    
    In the attached v27, I've removed the commit that renamed functions in
    procarray.c. I've added a single wrapper GlobalVisTestXidNotRunning()
    that is used in my code where I am testing live tuples. I think you'll
    find that I've addressed all of your review comments now -- as I've
    also gotten rid of the confusing blk_known_av logic through a series
    of refactors.
    
    The one outstanding point is which commits should bump
    XLOG_PAGE_MAGIC. (also review of the reworked patches).
    
    - Melanie
    
  73. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Kirill Reshke <reshkekirill@gmail.com> — 2025-12-17T18:27:05Z

    Hi!
    
    in v27-0001:
    > Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote:
    > > The last vacuum is expected to set vm bits, but the test doesn’t verify that. Should we verify that like:
    > > ```
    > > evantest=# SELECT blkno, all_visible, all_frozen FROM pg_visibility_map('test_vac_unmodified_heap');
    > > blkno | all_visible | all_frozen
    > > -------+-------------+------------
    > > 0 | t | t
    > > (1 row)
    
    > I've done this. I've actually added three such verifications -- one
    > after each step where the VM is expected to change. It shouldn't be
    > very expensive, so I think it is okay. The way the test would fail if
    > the buffer wasn't correctly dirtied is that it would assert out -- so
    > the visibility map test wouldn't even have a chance to fail. But, I
    > think it is also okay to confirm that the expected things are
    > happening with the VM -- it just gives us extra coverage.
    
    +1 on extra coverage. Should we also do a SQL-level check that the VM
    indeed does not need to set PD_ALL_VISIBLE (check header bytes using
    pageinspect?).
    
    
    v27-0003 & v27-0004: I did not get the exact reason we introduced
    `identify_and_fix_vm_corruption` in 0003 and moved code in 0004 to
    another place. I can see we have this starting v25 of patch set. Well,
    maybe this is not an issue at all...
    
    
    in v27-0005. This patch changes code which is not exercised in
    tests[0]. I spent some time understanding the conditions when we
    entered this. There is a comment about non-finished relation
    extension, but I got no success trying to reproduce this. I ended up
    modifying code to lose PageSetAllVisible in proper places and running
    vacuum. Looks like everything works as expected. I will spend some
    more time on this, maybe I will be successful in writing an
    injection-point-based TAP test which hits this...
    
    
    
    [0] https://coverage.postgresql.org/src/backend/access/heap/vacuumlazy.c.gcov.html#1902
    -- 
    Best regards,
    Kirill Reshke
    
    
    
    
  74. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-12-18T00:30:01Z

    Thanks for the review!
    
    In addition to addressing your feedback, attached v28 includes a
    number of small fixes to comments, commit messages, and other things.
    Notably, I've added one new refactoring patch 0009, which reduces the
    diff of 0010 -- using the GlobalVisState instead of OldestXmin for
    page visibility -- even further.
    
    On Wed, Dec 17, 2025 at 1:27 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
    >
    > > I've done this. I've actually added three such verifications -- one
    > > after each step where the VM is expected to change. It shouldn't be
    > > very expensive, so I think it is okay. The way the test would fail if
    > > the buffer wasn't correctly dirtied is that it would assert out -- so
    > > the visibility map test wouldn't even have a chance to fail. But, I
    > > think it is also okay to confirm that the expected things are
    > > happening with the VM -- it just gives us extra coverage.
    >
    > +1 on extra coverage. Should we also do sql-level check that the VM
    > indeed does not need to set PD_ALL_VISIBLE (check header bytes using
    > pageinspect?).
    
    That's an interesting idea. I checked and, AFAICT, there are no tests
    currently directly comparing the flags column returned by the
    pageinspect page_header() function to one of the flag values. I've
    added the following to attached v28.
    
    SELECT (flags & x'0004'::int) <> 0
            FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
    
    But I'm not sure if it is weird/confusing to be comparing the flag
    directly to the number 4 like this. I don't really want to bother with
    adding another function to pageinspect returning the status of
    PD_ALL_VISIBLE (like page_visible() or something).
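
For readers wondering where the 4 comes from, it is the PD_ALL_VISIBLE bit of the page header's pd_flags. A small C sketch of the same test follows. The three flag values mirror bufpage.h, while flags_all_visible() and demo() are hypothetical helpers written for this illustration and are not part of pageinspect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* pd_flags bits, mirroring src/include/storage/bufpage.h. */
#define PD_HAS_FREE_LINES	0x0001
#define PD_PAGE_FULL		0x0002
#define PD_ALL_VISIBLE		0x0004

/* Hypothetical helper: true if the all-visible hint is set in flags. */
static bool
flags_all_visible(uint16_t flags)
{
	return (flags & PD_ALL_VISIBLE) != 0;
}

/* Same check as the SQL expression (flags & x'0004'::int) <> 0. */
static void
demo(void)
{
	assert(flags_all_visible(PD_ALL_VISIBLE | PD_HAS_FREE_LINES));
	assert(!flags_all_visible(PD_PAGE_FULL | PD_HAS_FREE_LINES));
}
```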
    
    > v27-0003 & v27-0004: I did not get the exact reason we introduced
    > `identify_and_fix_vm_corruption` in 0003 and moved code in 0004 to
    > another place. I can see we have this starting v25 of patch set. Well,
    > maybe this is not an issue at all...
    
    It's mostly for ease of review. This is a pretty sensitive area of
    code, so I thought it would be easier for the reviewer to confirm
    correctness if I split it up. Andres had mentioned that the commit was
    hard to review because so many different things were happening.
    
    In v27, 0003 moves the VM clear code into a helper. 0004 and 0005
    move all the VM setting/clearing code to
    heap_page_prune_and_freeze(). And 0006 actually sets the VM in the
    same critical section as pruning/freezing and emits a single WAL
    record.
    
    I'm not really sure which commits should stay independent in the final
    version I push to master.
    
    > in v27-0005. This patch changes code which is not exercised in
    > tests[0]. I spent some time understanding the conditions when we
    > entered this. There is a comment about non-finished relation
    > extension, but I got no success trying to reproduce this. I ended up
    > modifying code to lose PageSetAllVisible in proper places and running
    > vacuum. Looks like everything works as expected. I will spend some
    > more time on this, maybe I will be successful in writing an
    > injection-point-based TAP test which hits this...
    
    Based on the coverage report link you provided, that code is changed
    by v27 0007, not 0005. 0005 is about moving an assertion out of
    lazy_scan_prune(). 0007 changes lazy_scan_new_or_empty() (the code in
    question).
    
    Regarding 0007, it looks like what is uncovered (the orange bits in
    the coverage report are uncovered, I assume) is empty pages _without_
    PD_ALL_VISIBLE set. I don't see anywhere where PageSetAllVisible() is
    called except vacuum and COPY FREEZE.
    
    If I was trying to guess how empty pages with PD_ALL_VISIBLE set are
    getting vacuumed, I would think it is due to SKIP_PAGES_THRESHOLD
    causing us to vacuum an all-frozen empty page.
    
    Then the question is, why wouldn't we have coverage of the empty page
    first being set all-visible/all-frozen? It can't be COPY FREEZE
    because the page is empty. And it can't be vacuum, because then we
    would have coverage. It's very mysterious.
    
    It would be good to have coverage for this case. I don't think you'll
    need an injection point for the main case of "empty page not yet set
    all-visible is vacuumed for the first time" (unless I'm
    misunderstanding something).
    
    I'm not sure how you'll test the "vacuuming an empty, previously
    uninitialized page" case described in this comment, though.
    
                 * It's possible that another backend has extended the heap,
                 * initialized the page, and then failed to WAL-log the page due
                 * to an ERROR.  Since heap extension is not WAL-logged, recovery
                 * might try to replay our record setting the page all-visible and
                 * find that the page isn't initialized, which will cause a PANIC.
                 * To prevent that, check whether the page has been previously
                 * WAL-logged, and if not, do that now.
    
    You'd want to force an error during relation extension and then vacuum
    the page. I don't know if you need an injection point to force the
    error -- depends on what kind of error, I think.
    
    So that I know for attribution, did you review 0003-0005?
    
    - Melanie
    
  75. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Kirill Reshke <reshkekirill@gmail.com> — 2025-12-18T08:55:46Z

    On Thu, 18 Dec 2025 at 05:30, Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    >
    > > in v27-0005. This patch changes code which is not exercised in
    > > tests[0]. I spent some time understanding the conditions when we
    > > entered this. There is a comment about non-finished relation
    > > extension, but I got no success trying to reproduce this. I ended up
    > > modifying code to lose PageSetAllVisible in proper places and running
    > > vacuum. Looks like everything works as expected. I will spend some
    > > more time on this, maybe I will be successful in writing an
    > > injection-point-based TAP test which hits this...
    >
    > Based on the coverage report link you provided, that code is changed
    > by v27 0007, not 0005. 0005 is about moving an assertion out of
    > lazy_scan_prune(). 0007 changes lazy_scan_new_or_empty() (the code in
    > question).
    >
    > Regarding 0007, it looks like what is uncovered (the orange bits in
    > the coverage report are uncovered, I assume) is empty pages _without_
    > PD_ALL_VISIBLE set. I don't see anywhere where PageSetAllVisible() is
    > called except vacuum and COPY FREEZE.
    
    Sure, I meant 0007.
    
    > If I was trying to guess how empty pages with PD_ALL_VISIBLE set are
    > getting vacuumed, I would think it is due to SKIP_PAGES_THRESHOLD
    > causing us to vacuum an all-frozen empty page.
    
    Yes, vacuum (disable_page_skipping);
    
    > Then the question is, why wouldn't we have coverage of the empty page
    > first being set all-visible/all-frozen? It can't be COPY FREEZE
    > because the page is empty. And it can't be vacuum, because then we
    > would have coverage. It's very mysterious.
    >
    > It would be good to have coverage for this case. I don't think you'll
    > need an injection point for the main case of "empty page not yet set
    > all-visible is vacuumed for the first time" (unless I'm
    > misunderstanding something).
    >
    > I'm not sure how you'll test the "vacuuming an empty, previously
    > uninitialized page" case described in this comment, though.
    >
    >              * It's possible that another backend has extended the heap,
    >              * initialized the page, and then failed to WAL-log the page due
    >              * to an ERROR.  Since heap extension is not WAL-logged, recovery
    >              * might try to replay our record setting the page all-visible and
    >              * find that the page isn't initialized, which will cause a PANIC.
    >              * To prevent that, check whether the page has been previously
    >              * WAL-logged, and if not, do that now.
    >
    > You'd want to force an error during relation extension and then vacuum
    > the page. I don't know if you need an injection point to force the
    > error -- depends on what kind of error, I think.
    
    I did a little archaeology, and this "if (PageIsEmpty(page)) {   if
    (!PageIsAllVisible(page)) { .... }}" code dates back to
    608195a3a365. The comment about relation extension not being
    WAL-logged is from a6370fd9ed3d, and I don't think we need to think
    about this case.
    
    I am currently inclined to think that we cannot see an empty page that
    has PD_ALL_VISIBLE not-set. This is because when we make a page empty,
    we are in a critical section, and we WAL-log everything we do, so our
    changes should not be half-made. Maybe as of 608195a3a365, there was a
    case with empty-page-without-PD_ALL_VISIBLE, but I don't think this
    happens on HEAD.
    
    > So that I know for attribution, did you review 0003-0005?
    
    yes, but I did not have any valuable review points for them.
    
    
    Also, after the whole set is committed, we should never see a
    discrepancy between PD_ALL_VISIBLE and the VM bits, right? They will
    be set in a single WAL record. The only cases where the heap and the
    VM disagree on all-visibility are then corruption,
    pg_visibilitymap_truncate, and old data (data from before a v19+
    upgrade?). If my understanding is correct, should we document this?
    
    -- 
    Best regards,
    Kirill Reshke
    
    
    
    
  76. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-12-18T15:18:09Z

    On Thu, Dec 18, 2025 at 3:55 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
    >
    > On Thu, 18 Dec 2025 at 05:30, Melanie Plageman
    > <melanieplageman@gmail.com> wrote:
    >
    > > If I was trying to guess how empty pages with PD_ALL_VISIBLE set are
    > > getting vacuumed, I would think it is due to SKIP_PAGES_THRESHOLD
    > > causing us to vacuum an all-frozen empty page.
    >
    > Yes, vacuum (disable_page_skipping);
    
    Ah, right, that would be a reliable way for it to happen.
    
    > > Then the question is, why wouldn't we have coverage of the empty page
    > > first being set all-visible/all-frozen? It can't be COPY FREEZE
    > > because the page is empty. And it can't be vacuum, because then we
    > > would have coverage. It's very mysterious.
    <--snip-->
    > I am currently inclined to think that we cannot see an empty page that
    > has PD_ALL_VISIBLE not-set. This is because when we make a page empty,
    > we are in a critical section, and we WAL-log everything we do, so our
    > changes should not be half-made. Maybe as of 608195a3a365, there was a
    > case with empty-page-without-PD_ALL_VISIBLE, but I don't think this
    > happens on HEAD.
    
    Right, so the way that empty pages get set PD_ALL_VISIBLE is when a
    page has all its tuples deleted, the next time it is vacuumed it will
    be set all-visible and all-frozen and have PD_ALL_VISIBLE set. (if
    it's a trailing page it will be truncated, but any non-trailing page
    will be like this).
    
    But you are right, I don't see any non-error code path where a heap
    page would become empty (all line pointers set unused) and then not be
    set all-visible. Only vacuum sets line pointers unused and if all the
    line pointers are unused it will always set the page all-visible.
    
    I think, though, that if we error out in lazy_scan_prune() after
    returning from heap_page_prune_and_freeze() such that we don't set the
    empty page all-visible, we can end up with an empty page without
    PD_ALL_VISIBLE set. You can see how this might work by patching the VM
    set code in lazy_scan_prune() to skip empty pages.
    
    > I did small archeology and this "if (PageIsEmpty(page)) {   if
    > (!PageIsAllVisible(page)) { .... }}" code  originates back to
    > 608195a3a365. Comment about not WAL-logged relation extension is from
    > a6370fd9ed3d, and I don't think we need to think about this case.
    
    Thanks for looking into this. Even if this code was added to handle
    the error codepath I mentioned above, it seems like it would have been
    good enough to just let lazy_scan_prune() handle setting the empty
    page all-visible the next time the page was vacuumed. Since there is
    no non-error code path where this can happen, it doesn't seem like it
    would merit its own special case.
    
    It is possible it was more common as of 608195a3a365, as you say.
    
    I don't understand how the bug fixed by a6370fd9ed3d can happen. When
    a new page is initialized, flags are set to 0, so regardless of WAL
    logging of the extension not happening, how would the new page have
    been set PD_ALL_VISIBLE?  We'll have to ask Andres or Robert about how
    this was hit.
    
    > Also, after the whole set is committed, we should then never
    > experience discrepancy between  PD_ALL_VISIBLE and VM bits? Because
    > they will be set in a single WAL record. The only cases when heap and
    > VM disagrees on all-visibility then are corruption,
    > pg_visibilitymap_truncate and old data (data before v19+ upgrade?)
    > If my understanding is correct, should we add document this?
    
    Even on current master, I don't see a scenario other than VM
    corruption or truncation where PD_ALL_VISIBLE can be set but not the
    VM (or vice versa). The only way would be if you error out after
    setting PD_ALL_VISIBLE before setting the VM. Setting PD_ALL_VISIBLE
    is not in a critical section in lazy_scan_prune(), so it won't panic
    and dump shared memory, so the buffer with PD_ALL_VISIBLE set may
    later get written out. But the only obvious way I see to error out of
    MarkBufferDirty() is if the buffer is not valid -- which would have
    kept us from doing previous operations on the buffer, I would think.
    
    It's true this will no longer happen after my patches, as
    PageSetAllVisible() will happen in a critical section. We could add a
    comment about this particular scenario in the code somewhere. But I
    don't think we should document it in any user-facing documentation
    since you could still truncate the VM and have the two out of sync.
    
    - Melanie
    
    
    
    
  77. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Kirill Reshke <reshkekirill@gmail.com> — 2025-12-18T15:45:58Z

    On Thu, 18 Dec 2025 at 20:18, Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    > > Also, after the whole set is committed, we should then never
    > > experience discrepancy between  PD_ALL_VISIBLE and VM bits? Because
    > > they will be set in a single WAL record. The only cases when heap and
    > > VM disagrees on all-visibility then are corruption,
    > > pg_visibilitymap_truncate and old data (data before v19+ upgrade?)
    > > If my understanding is correct, should we add document this?
    >
    > Even on current master, I don't see a scenario other than VM
    > corruption or truncation where PD_ALL_VISIBLE can be set but not the
    > VM (or vice versa). The only way would be if you error out after
    > setting PD_ALL_VISIBLE before setting the VM. Setting PD_ALL_VISIBLE
    > is not in a critical section in lazy_scan_prune(), so it won't panic
    > and dump shared memory, so the buffer with PD_ALL_VISIBLE set may
    > later get written out. But the only obvious way I see to error out of
    > MarkBufferDirty() is if the buffer is not valid -- which would have
    > kept us from doing previous operations on the buffer, I would think.
    >
    
    Well... I may be missing something, but on current HEAD,
    XLOG_HEAP2_PRUNE_VACUUM_SCAN and XLOG_HEAP2_VISIBLE are two different
    records, with XLOG_HEAP2_PRUNE_VACUUM_SCAN always emitted first. So the
    WAL writer may get kill -9'ed just after XLOG_HEAP2_PRUNE_VACUUM_SCAN
    makes it to disk, with XLOG_HEAP2_VISIBLE never making it. Crash
    recovery then leaves us with a discrepancy. This does not happen with
    a single WAL record.
    Another simple reproducer: a streaming standby receives
    XLOG_HEAP2_PRUNE_VACUUM_SCAN from the primary, then the network goes
    bad and it never gets XLOG_HEAP2_VISIBLE. Then the standby is promoted
    by the admin. Again, a VM bit vs PD_ALL_VISIBLE discrepancy. Am I
    missing something?
    
    
    -- 
    Best regards,
    Kirill Reshke
    
    
    
    
  78. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Kirill Reshke <reshkekirill@gmail.com> — 2025-12-18T18:07:20Z

    On Thu, 18 Dec 2025 at 20:18, Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    
    > But you are right, I don't see any non-error code path where a heap
    > page would become empty (all line pointers set unused) and then not be
    > set all-visible. Only vacuum sets line pointers unused and if all the
    > line pointers are unused it will always set the page all-visible.
    >
    > I think, though, that if we error out in lazy_scan_prune() after
    > returning from heap_page_prune_and_freeze() such that we don't set the
    > empty page all-visible, we can end up with an empty page without
    > PD_ALL_VISIBLE set. You can see how this might work by patching the VM
    > set code in lazy_scan_prune() to skip empty pages.
    >
    
    Thank you for your explanation! I completely forgot that PD_ALL_VISIBLE
    is a non-persistent change (a hint bit), so its update can be trivially
    lost.
    The simplest real-life example is being killed just after returning
    from heap_page_prune_and_freeze(), yes.
    PFA a TAP test covering the lazy_scan_new_or_empty() code path for an
    empty-but-not-all-visible page.
    
    -- 
    Best regards,
    Kirill Reshke
    
  79. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-12-18T19:57:57Z

    On Thu, Dec 18, 2025 at 1:07 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
    >
    > On Thu, 18 Dec 2025 at 20:18, Melanie Plageman
    > <melanieplageman@gmail.com> wrote:
    >
    > > But you are right, I don't see any non-error code path where a heap
    > > page would become empty (all line pointers set unused) and then not be
    > > set all-visible. Only vacuum sets line pointers unused and if all the
    > > line pointers are unused it will always set the page all-visible.
    > >
    > > I think, though, that if we error out in lazy_scan_prune() after
    > > returning from heap_page_prune_and_freeze() such that we don't set the
    > > empty page all-visible, we can end up with an empty page without
    > > PD_ALL_VISIBLE set. You can see how this might work by patching the VM
    > > set code in lazy_scan_prune() to skip empty pages.
    >
    > Thank you for your explanation!  I completely forgot that PD_ALL_VIS
    > is a non-persistent change (hint bit). so its update can be trivially
    > lost.
    > The simplest real-life example is being killed just after returning
    > from heap_page_prune_and_freeze, yes.
    > PFA tap test covering lazy_scan_new_or_empty code path for
    > empty-but-not-all-visible page
    
    Cool test! I'm going to have to think more about whether or not it is
    worth adding a whole new TAP test for this codepath. Is there an
    existing TAP test we could add it to so we don't need to make a new
    cluster, etc? How long does the test take to run? Obviously it will be
    quite short, but every bit we add to the test suite counts. I don't
    actually know how much overhead there is with injection points.
    
    I was chatting with Andres and he mentioned there is one other case
    where you can end up in this code path (empty page without
    PD_ALL_VISIBLE set) and this case does actually trigger this code:
    
                if (RelationNeedsWAL(vacrel->rel) &&
                    !XLogRecPtrIsValid(PageGetLSN(page)))
                    log_newpage_buffer(buf, true);
    
    If you are inserting to a new page and you successfully call
    PageInit() (making the page no longer considered new by PageIsNew()
    because pd_upper will be set) but you error out before actually
    inserting the tuple, then you will have an empty page without
    PD_ALL_VISIBLE set. And assuming you error out before emitting WAL,
    the page will not have a valid LSN set. So you will hit that code
    which calls log_newpage_buffer().
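
    To make the case above concrete, here is a minimal, self-contained
    sketch of how an initialized-but-empty page ends up in the
    log_newpage_buffer() branch. It is a toy model, not PostgreSQL code:
    ToyPage, ToyPageKind, toy_page_init() and toy_classify_page() are
    hypothetical names, and using 0 for "invalid LSN" and for an unset
    pd_upper is an assumption made only for this illustration.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Toy stand-in for a heap page header; NOT the real PageHeaderData. */
    typedef struct ToyPage
    {
        uint64_t lsn;         /* 0 models an invalid LSN (never WAL-logged) */
        uint16_t pd_upper;    /* 0 models a page that was never initialized */
        int      nitems;      /* number of line pointers on the page */
        bool     all_visible; /* models PD_ALL_VISIBLE */
    } ToyPage;

    typedef enum ToyPageKind
    {
        TOY_PAGE_NEW,                     /* would take the PageIsNew() branch */
        TOY_PAGE_EMPTY_NEEDS_NEWPAGE_WAL, /* empty, not all-visible, no LSN */
        TOY_PAGE_EMPTY_NOT_ALL_VISIBLE,   /* empty, not all-visible, has LSN */
        TOY_PAGE_EMPTY_ALL_VISIBLE,       /* empty and already all-visible */
        TOY_PAGE_HAS_ITEMS                /* not handled by this code path */
    } ToyPageKind;

    /* Models PageInit(): afterwards the page no longer looks new. */
    void
    toy_page_init(ToyPage *page)
    {
        assert(page != NULL);
        page->lsn = 0;
        page->pd_upper = 8192;
        page->nitems = 0;
        page->all_visible = false;
    }

    /* Classify a page roughly the way the empty-page handling is described. */
    ToyPageKind
    toy_classify_page(const ToyPage *page, bool rel_needs_wal)
    {
        assert(page != NULL);
        if (page->pd_upper == 0)
            return TOY_PAGE_NEW;
        if (page->nitems > 0)
            return TOY_PAGE_HAS_ITEMS;
        if (page->all_visible)
            return TOY_PAGE_EMPTY_ALL_VISIBLE;
        if (rel_needs_wal && page->lsn == 0)
            return TOY_PAGE_EMPTY_NEEDS_NEWPAGE_WAL;
        return TOY_PAGE_EMPTY_NOT_ALL_VISIBLE;
    }
    ```

    In this model, a zeroed ToyPage classifies as TOY_PAGE_NEW, while the
    same page after toy_page_init() (with the insert erroring out before
    any tuple or WAL record is written) classifies as
    TOY_PAGE_EMPTY_NEEDS_NEWPAGE_WAL for a WAL-logged relation.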
    
    I would say this case is so narrow (the log_newpage_buffer() codepath
    in lazy_scan_new_or_empty()), it's not worth the added test overhead,
    but I just wanted to share what I learned about when this code could
    be hit.
    
    Previously it was more common in the bulk extension case to have empty
    pages not set PD_ALL_VISIBLE because bulk extension would call
    PageInit() on all of the pages it extended so all the pages except the
    target page were empty (today they are not initialized so they go into
    the PageIsNew() branch).
    
    So, in both cases, it seems like the empty page without PD_ALL_VISIBLE
    set is mostly only hit if we previously errored out.
    
    - Melanie
    
    
    
    
  80. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-12-18T20:04:34Z

    On Thu, Dec 18, 2025 at 10:46 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
    >
    > On Thu, 18 Dec 2025 at 20:18, Melanie Plageman
    > <melanieplageman@gmail.com> wrote:
    > > > Also, after the whole set is committed, we should then never
    > > > experience discrepancy between  PD_ALL_VISIBLE and VM bits? Because
    > > > they will be set in a single WAL record. The only cases when heap and
    > > > VM disagrees on all-visibility then are corruption,
    > > > pg_visibilitymap_truncate and old data (data before v19+ upgrade?)
    > > > If my understanding is correct, should we add document this?
    > >
    > > Even on current master, I don't see a scenario other than VM
    > > corruption or truncation where PD_ALL_VISIBLE can be set but not the
    > > VM (or vice versa). The only way would be if you error out after
    > > setting PD_ALL_VISIBLE before setting the VM. Setting PD_ALL_VISIBLE
    > > is not in a critical section in lazy_scan_prune(), so it won't panic
    > > and dump shared memory, so the buffer with PD_ALL_VISIBLE set may
    > > later get written out. But the only obvious way I see to error out of
    > > MarkBufferDirty() is if the buffer is not valid -- which would have
    > > kept us from doing previous operations on the buffer, I would think.
    >
    > Well... I may be missing something, but on current HEAD,
    > XLOG_HEAP2_PRUNE_VACUUM_SCAN and XLOG_HEAP2_VISIBLE are two different
    > record, XLOG_HEAP2_PRUNE_VACUUM_SCAN being always emitted first. So,
    > WAL writer may end up kill-9-ed just after
    > XLOG_HEAP2_PRUNE_VACUUM_SCAN makes it to the disk, and
    > XLOG_HEAP2_VISIBLE never. Crash recovery then, and we have
    > discrepancy. This does not happen with a single WAL record.
    > Another simple reproducer here: standby streaming, receiving
    > XLOG_HEAP2_PRUNE_VACUUM_SCAN from primary, Then network becomes bad,
    > and we never get XLOG_HEAP2_VISIBLE from primary. Then we promoted by
    > the admin. And again, VM bit vs PD_ALL_VISIBLE discrepancy. Am I
    > missing something?
    
    Well, currently XLOG_HEAP2_PRUNE_VACUUM_SCAN doesn't set
    PD_ALL_VISIBLE. PD_ALL_VISIBLE is WAL-logged in the XLOG_HEAP2_VISIBLE
    record because in lazy_scan_prune() we call PageSetAllVisible() and
    then visibilitymap_set() -> log_heap_visible() adds the heap buffer to
    the WAL chain (with XLogRegisterBuffer()).
    
    And if you notice when XLOG_HEAP2_VISIBLE is replayed in
    heap_xlog_visible(), that is where we do PageSetAllVisible() on the
    heap page.
    
    So I think you can end up with PD_ALL_VISIBLE set if you error out
    precisely between setting it and WAL logging it because we don't set
    it in a critical section. But you can't end up with a WAL record that
    sets PD_ALL_VISIBLE and another one that sets the VM.
    
    Once we have my code changes, you can never end up with PD_ALL_VISIBLE
    set and the VM not set because they are in the same critical section
    and if we error out, it will cause a panic which will purge shared
    memory.
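
    The replay argument above can be sketched with a toy model. This is
    only an illustration under stated assumptions: ToyState, ToyRecType,
    toy_redo() and toy_prefix_is_consistent() are hypothetical names, the
    prune record is modeled as touching neither flag, and the
    "prune with VM" record stands in for the combined record proposed in
    this thread.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* The two pieces of state that must agree for one heap block. */
    typedef struct ToyState
    {
        bool pd_all_visible; /* models PD_ALL_VISIBLE on the heap page */
        bool vm_all_visible; /* models the all-visible bit in the VM */
    } ToyState;

    typedef enum ToyRecType
    {
        TOY_REC_PRUNE,        /* prune only: touches neither flag */
        TOY_REC_VISIBLE,      /* separate record: sets both flags on replay */
        TOY_REC_PRUNE_WITH_VM /* assumed combined record: sets both flags */
    } ToyRecType;

    /* Replay a single record against the toy state. */
    void
    toy_redo(ToyState *state, ToyRecType rec)
    {
        assert(state != NULL);
        switch (rec)
        {
            case TOY_REC_PRUNE:
                break;
            case TOY_REC_VISIBLE:
            case TOY_REC_PRUNE_WITH_VM:
                state->pd_all_visible = true;
                state->vm_all_visible = true;
                break;
        }
    }

    /*
     * Replay only the first nreplayed records (modeling a crash or a
     * promotion partway through the stream) and report whether the page
     * flag and the VM bit still agree.
     */
    bool
    toy_prefix_is_consistent(const ToyRecType *recs, int nrecs, int nreplayed)
    {
        ToyState state = {false, false};

        assert(recs != NULL && nreplayed >= 0 && nreplayed <= nrecs);
        for (int i = 0; i < nreplayed; i++)
            toy_redo(&state, recs[i]);
        return state.pd_all_visible == state.vm_all_visible;
    }
    ```

    Because no record in this model sets one flag without the other,
    every prefix of the stream replays to a consistent state, both with
    the two-record sequence and with the single combined record.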
    
    - Melanie
    
    
    
    
  81. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Kirill Reshke <reshkekirill@gmail.com> — 2025-12-18T20:31:27Z

    On Fri, 19 Dec 2025 at 00:58, Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    >
    > On Thu, Dec 18, 2025 at 1:07 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
    > >
    > > On Thu, 18 Dec 2025 at 20:18, Melanie Plageman
    > > <melanieplageman@gmail.com> wrote:
    > >
    > > > But you are right, I don't see any non-error code path where a heap
    > > > page would become empty (all line pointers set unused) and then not be
    > > > set all-visible. Only vacuum sets line pointers unused and if all the
    > > > line pointers are unused it will always set the page all-visible.
    > > >
    > > > I think, though, that if we error out in lazy_scan_prune() after
    > > > returning from heap_page_prune_and_freeze() such that we don't set the
    > > > empty page all-visible, we can end up with an empty page without
    > > > PD_ALL_VISIBLE set. You can see how this might work by patching the VM
    > > > set code in lazy_scan_prune() to skip empty pages.
    > >
    > > Thank you for your explanation!  I completely forgot that PD_ALL_VIS
    > > is a non-persistent change (hint bit). so its update can be trivially
    > > lost.
    > > The simplest real-life example is being killed just after returning
    > > from heap_page_prune_and_freeze, yes.
    > > PFA tap test covering lazy_scan_new_or_empty code path for
    > > empty-but-not-all-visible page
    >
    > Cool test! I'm going to have to think more about whether or not it is
    > worth adding a whole new TAP test for this codepath. Is there an
    > existing TAP test we could add it to so we don't need to make a new
    > cluster, etc? How long does the test take to run? Obviously it will be
    > quite short, but every bit we add to the test suite counts. I don't
    > actually know how much overhead there is with injection points.
    >
    
    Well, on my PC this test runs in ~1.5 sec. I did not find any other
    TAP test to place this in, so I created a new one.
    Actually, in this test I only check for specific patterns in the
    cluster's log file, so it could instead be a regression test.
    
    ```
    reshke=# VACUUM (DISABLE_PAGE_SKIPPING) vac_empty_test;
    NOTICE:  notice triggered for injection point vacuum-empty-page-non-all-vis
    VACUUM
    reshke=#
    ```
    We would just check in the .out file that the code hits
    'vacuum-empty-page-non-all-vis' after an error.
    In my experience, injection point overhead should not be that bad.
    Maybe buildfarm members can say something here, I dunno.
    
    Also, we already have a bunch of regression + injection point tests for
    some rare cases, e.g.
    src/test/modules/nbtree/sql/nbtree_half_dead_pages.sql.
    
    -- 
    Best regards,
    Kirill Reshke
    
    
    
    
  82. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Xuneng Zhou <xunengzhou@gmail.com> — 2025-12-19T03:38:24Z

    Hi Melanie,
    
    Thanks for working on this.
    
    On Wed, Dec 17, 2025 at 12:59 AM Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    >
    > On Wed, Dec 3, 2025 at 6:07 PM Melanie Plageman
    > <melanieplageman@gmail.com> wrote:
    > >
    > > If we're just talking about the renaming, looking at procarray.c, it
    > > is full of the word "removable" because its functions were largely
    > > used to examine and determine if everyone can see an xmax as committed
    > > and thus if that tuple is removable from their perspective. But
    > > nothing about the code that I can see means it has to be an xmax. We
    > > could just as well use the functions to determine if everyone can see
    > > an xmin as committed.
    >
    > In the attached v27, I've removed the commit that renamed functions in
    > procarray.c. I've added a single wrapper GlobalVisTestXidNotRunning()
    > that is used in my code where I am testing live tuples. I think you'll
    > find that I've addressed all of your review comments now -- as I've
    > also gotten rid of the confusing blk_known_av logic through a series
    > of refactors.
    >
    > The one outstanding point is which commits should bump
    > XLOG_PAGE_MAGIC. (also review of the reworked patches).
    >
    > - Melanie
    
    I’ve done a basic review of patches 1 and 2. Here are some comments
    which may be somewhat immature, as this is a fairly large change set
    and I’m new to some parts of the code.
    
    1) Potential stale old_vmbits after VM repair in v2
    
    // Corruption check 1
    if (!PageIsAllVisible(page) &&
        (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
    {
        visibilitymap_clear(...); // VM now cleared to 0
        // but old_vmbits still holds ALL_VISIBLE
    }
    
    // ... later ...
    
    if (!presult.all_visible)
        return presult.ndeleted; // Not taken if presult.all_visible=true
    
    new_vmbits = VISIBILITYMAP_ALL_VISIBLE; // Want to set this
    
    // Stale old_vmbits=ALL_VISIBLE, new_vmbits=ALL_VISIBLE
    if (old_vmbits == new_vmbits)
        return presult.ndeleted; // issue: early return
    
    After corruption repair clears the VM, old_vmbits is stale. The early
    return can fire unexpectedly, leaving the VM cleared when it should be
    re-set. Should we reset old_vmbits = 0 after the visibilitymap_clear?
    
    2) Add Assert(BufferIsDirty(buf))
    
    Since the patch's core claim is "buffer must be dirty before WAL
    registration", an assertion encodes this invariant. Should we add:
    
    Assert(BufferIsValid(buf));
    Assert(BufferIsDirty(buf));
    
    right before the visibilitymap_set() call?
    
    3) Comment about "only scenario"
    
    The comment at lines:
    > "The only scenario where it is not already dirty is if the VM was removed…"
    
    This phrasing could become misleading after future refactors. Can we
    make it more direct like:
    
    > "We must mark the heap buffer dirty before calling visibilitymap_set(), because it may WAL-log the buffer and XLogRegisterBuffer() requires it."
    
    4) Comment clarity
    
    Current comment:
    
    > "Even if PD_ALL_VISIBLE is already set, we don't need to worry about unnecessarily dirtying the heap buffer, as it must be marked dirty before adding it to the WAL chain. The only scenario where it is not already dirty is if the VM was removed..."
    
    In this test we now call MarkBufferDirty() on the heap page even when
    only setting the VM, so the comments claiming “does not need to modify
    the heap buffer”/“no heap page modification” might be misleading. It
    might be better to say the test doesn’t need to modify heap
    tuples/page contents or doesn’t need to prune/freeze.
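
    As a sketch of point 1) above, the stale-cache problem can be
    reproduced in a few lines. This is a toy model under stated
    assumptions: toy_vm_after_prune() and TOY_VM_ALL_VISIBLE are
    hypothetical names, the VM is a single byte, and the
    refresh_after_clear flag is added purely to compare the behavior with
    and without resetting old_vmbits after the clear.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define TOY_VM_ALL_VISIBLE 0x01

    /*
     * Returns the VM bits for the block after the toy "prune" step.
     *
     * pd_all_visible      models PageIsAllVisible(page)
     * vm_bits             models the VM bits currently on disk
     * prune_all_visible   models presult.all_visible
     * refresh_after_clear if true, reset the cached copy after clearing
     */
    uint8_t
    toy_vm_after_prune(bool pd_all_visible, uint8_t vm_bits,
                       bool prune_all_visible, bool refresh_after_clear)
    {
        uint8_t old_vmbits = vm_bits; /* cached copy of the VM bits */
        uint8_t new_vmbits;

        assert((vm_bits & ~TOY_VM_ALL_VISIBLE) == 0);

        /* Corruption check: VM says all-visible but the page flag does not. */
        if (!pd_all_visible && (old_vmbits & TOY_VM_ALL_VISIBLE) != 0)
        {
            vm_bits = 0; /* models visibilitymap_clear() */
            if (refresh_after_clear)
                old_vmbits = 0; /* keep the cached copy in sync */
        }

        if (!prune_all_visible)
            return vm_bits;

        new_vmbits = TOY_VM_ALL_VISIBLE;

        /* With a stale cache this early return skips re-setting the VM. */
        if (old_vmbits == new_vmbits)
            return vm_bits;

        vm_bits = new_vmbits; /* models visibilitymap_set() */
        return vm_bits;
    }
    ```

    With a corrupt VM bit and a page that pruning finds all-visible, the
    model leaves the VM cleared when the cached copy is not refreshed and
    re-sets it when the cached copy is reset to 0.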
    
    --
    Best,
    Xuneng
    
    
    
    
  83. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-12-19T21:09:47Z

    Attached v29 addresses some feedback and also corrects a small error
    with the assertion I had added in the previous version's 0009.
    
    On Thu, Dec 18, 2025 at 10:38 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
    >
    > I’ve done a basic review of patches 1 and 2. Here are some comments
    > which may be somewhat immature, as this is a fairly large change set
    > and I’m new to some parts of the code.
    >
    > 1) Potential stale old_vmbits after VM repair n v2
    
    Good catch! I've fixed this in attached v29.
    
    > 2) Add Assert(BufferIsDirty(buf))
    >
    > Since the patch's core claim is "buffer must be dirty before WAL
    > registration", an assertion encodes this invariant. Should we add:
    >
    > Assert(BufferIsValid(buf));
    > Assert(BufferIsDirty(buf));
    >
    > right before the visibilitymap_set() call?
    
    There are already assertions that will trip in various places -- most
    importantly in XLogRegisterBuffer(), which is the one that inspired
    this refactor.
    
    > The comment at lines:
    > > "The only scenario where it is not already dirty is if the VM was removed…"
    >
    > This phrasing could become misleading after future refactors. Can we
    > make it more direct like:
    >
    > > "We must mark the heap buffer dirty before calling visibilitymap_set(), because it may WAL-log the buffer and XLogRegisterBuffer() requires it."
    
    I see your point about future refactors missing updating comments like
    this. But, I don't think we are going to refactor the code such that
    we can have PD_ALL_VISIBLE set without the VM bits set more often.
    Also, it is common practice in Postgres to describe very specific edge
    cases or odd scenarios in order to explain code that may seem
    confusing without the comment. It does risk that comment later
    becoming stale, but it is better that future developers understand why
    the code is there.
    
    That being said, I take your point that the comment is confusing. I
    have updated it in a different way.
    
    > > "Even if PD_ALL_VISIBLE is already set, we don't need to worry about unnecessarily dirtying the heap buffer, as it must be marked dirty before adding it to the WAL chain. The only scenario where it is not already dirty is if the VM was removed..."
    >
    > In this test we now call MarkBufferDirty() on the heap page even when
    > only setting the VM, so the comments claiming “does not need to modify
    > the heap buffer”/“no heap page modification” might be misleading. It
    > might be better to say the test doesn’t need to modify heap
    > tuples/page contents or doesn’t need to prune/freeze.
    
    The point I'm trying to make is that we have to dirty the buffer even
    if we don't modify the page because of the XLOG sub-system
    requirements. And, it may seem like a waste to do that if not
    modifying the page, but the page will rarely be clean anyway. I've
    tried to make this more clear in attached v29.
    
    - Melanie
    
  84. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Kirill Reshke <reshkekirill@gmail.com> — 2025-12-20T12:32:38Z

    On Sat, 20 Dec 2025 at 02:10, Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    >
    > Attached v29 addresses some feedback and also corrects a small error
    > with the assertion I had added in the previous version's 0009.
    >
    > On Thu, Dec 18, 2025 at 10:38 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
    > >
    > > I’ve done a basic review of patches 1 and 2. Here are some comments
    > > which may be somewhat immature, as this is a fairly large change set
    > > and I’m new to some parts of the code.
    > >
    > > 1) Potential stale old_vmbits after VM repair n v2
    >
    > Good catch! I've fixed this in attached v29.
    >
    > > 2) Add Assert(BufferIsDirty(buf))
    > >
    > > Since the patch's core claim is "buffer must be dirty before WAL
    > > registration", an assertion encodes this invariant. Should we add:
    > >
    > > Assert(BufferIsValid(buf));
    > > Assert(BufferIsDirty(buf));
    > >
    > > right before the visibilitymap_set() call?
    >
    > There are already assertions that will trip in various places -- most
    > importantly in XLogRegisterBuffer(), which is the one that inspired
    > this refactor.
    >
    > > The comment at lines:
    > > > "The only scenario where it is not already dirty is if the VM was removed…"
    > >
    > > This phrasing could become misleading after future refactors. Can we
    > > make it more direct like:
    > >
    > > > "We must mark the heap buffer dirty before calling visibilitymap_set(), because it may WAL-log the buffer and XLogRegisterBuffer() requires it."
    >
    > I see your point about future refactors missing updating comments like
    > this. But, I don't think we are going to refactor the code such that
    > we can have PD_ALL_VISIBLE set without the VM bits set more often.
    > Also, it is common practice in Postgres to describe very specific edge
    > cases or odd scenarios in order to explain code that may seem
    > confusing without the comment. It does risk that comment later
    > becoming stale, but it is better that future developers understand why
    > the code is there.
    >
    > That being said, I take your point that the comment is confusing. I
    > have updated it in a different way.
    >
    > > > "Even if PD_ALL_VISIBLE is already set, we don't need to worry about unnecessarily dirtying the heap buffer, as it must be marked dirty before adding it to the WAL chain. The only scenario where it is not already dirty is if the VM was removed..."
    > >
    > > In this test we now call MarkBufferDirty() on the heap page even when
    > > only setting the VM, so the comments claiming “does not need to modify
    > > the heap buffer”/“no heap page modification” might be misleading. It
    > > might be better to say the test doesn’t need to modify heap
    > > tuples/page contents or doesn’t need to prune/freeze.
    >
    > The point I'm trying to make is that we have to dirty the buffer even
    > if we don't modify the page because of the XLOG sub-system
    > requirements. And, it may seem like a waste to do that if not
    > modifying the page, but the page will rarely be clean anyway. I've
    > tried to make this more clear in attached v29.
    >
    > - Melanie
    
    
    Hi! I checked v29-0009, about HeapTupleSatisfiesVacuumHorizon. The
    origins of this code trace back to fdf9e21196a6, which was committed as
    part of [0], at which point there was no
    HeapTupleSatisfiesVacuumHorizon function. I guess this is the reason
    this optimization was not performed earlier.
    
    I also think this patch is correct, because we do similar things for
    HEAPTUPLE_DEAD & HEAPTUPLE_RECENTLY_DEAD, and
    HeapTupleSatisfiesVacuum is just a proxy to
    HeapTupleSatisfiesVacuumHorizon, with the only difference being in DEAD
    vs RECENTLY_DEAD handling.
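
    The wrapper relationship between HeapTupleSatisfiesVacuum() and
    HeapTupleSatisfiesVacuumHorizon() can be sketched roughly as follows.
    This is a simplified toy model, not the PostgreSQL source: ToyTuple,
    ToyResult and the toy_* functions are hypothetical names, and the
    horizon function here only distinguishes "live" from "deleted by a
    committed transaction", whereas the real one returns more states.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef uint32_t ToyXid;
    #define TOY_INVALID_XID ((ToyXid) 0)

    typedef enum ToyResult
    {
        TOY_LIVE,
        TOY_RECENTLY_DEAD,
        TOY_DEAD
    } ToyResult;

    /* Toy tuple: only the deleting transaction matters for this sketch. */
    typedef struct ToyTuple
    {
        ToyXid xmax;           /* deleting xid, or TOY_INVALID_XID */
        bool   xmax_committed; /* did the deleter commit? */
    } ToyTuple;

    /* Wraparound-aware "a is older than b" for normal xids. */
    bool
    toy_xid_precedes(ToyXid a, ToyXid b)
    {
        return (int32_t) (a - b) < 0;
    }

    /*
     * Horizon-independent check: never decides DEAD vs RECENTLY_DEAD for a
     * committed delete. It reports RECENTLY_DEAD and hands back the xid
     * after which the tuple is dead, leaving the cutoff to the caller.
     */
    ToyResult
    toy_satisfies_vacuum_horizon(const ToyTuple *tup, ToyXid *dead_after)
    {
        assert(tup != NULL && dead_after != NULL);
        *dead_after = TOY_INVALID_XID;
        if (tup->xmax == TOY_INVALID_XID || !tup->xmax_committed)
            return TOY_LIVE;
        *dead_after = tup->xmax;
        return TOY_RECENTLY_DEAD;
    }

    /* Thin wrapper: applies a fixed OldestXmin cutoff to the result. */
    ToyResult
    toy_satisfies_vacuum(const ToyTuple *tup, ToyXid oldest_xmin)
    {
        ToyXid    dead_after;
        ToyResult res = toy_satisfies_vacuum_horizon(tup, &dead_after);

        if (res == TOY_RECENTLY_DEAD &&
            toy_xid_precedes(dead_after, oldest_xmin))
            res = TOY_DEAD;
        return res;
    }
    ```

    In this model, a caller that treats DEAD and RECENTLY_DEAD the same
    way gains nothing from the wrapper's cutoff comparison, so it can call
    the horizon variant directly.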
    
    
    A similar change could be made in heapam_scan_analyze_next_tuple():
    
    ...
    case HEAPTUPLE_DEAD:
    case HEAPTUPLE_RECENTLY_DEAD:
        /* Count dead and recently-dead rows */
        *deadrows += 1;
        break;
    ...
    
    
    
    [0] https://www.postgresql.org/message-id/CABOikdP0meGuXPPWuYrP%3DvDvoqUdshF2xJAzZHWSKg03Rz_%2B9Q%40mail.gmail.com
    
    
    -- 
    Best regards,
    Kirill Reshke
    
    
    
    
  85. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Chao Li <li.evan.chao@gmail.com> — 2025-12-22T07:19:39Z

    
    > On Dec 20, 2025, at 05:09, Melanie Plageman <melanieplageman@gmail.com> wrote:
    > 
    > Attached v29 addresses some feedback and also corrects a small error
    > with the assertion I had added in the previous version's 0009.
    > 
    > On Thu, Dec 18, 2025 at 10:38 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
    >> 
    >> I’ve done a basic review of patches 1 and 2. Here are some comments
    >> which may be somewhat immature, as this is a fairly large change set
    >> and I’m new to some parts of the code.
    >> 
    >> 1) Potential stale old_vmbits after VM repair n v2
    > 
    > Good catch! I've fixed this in attached v29.
    > 
    >> 2) Add Assert(BufferIsDirty(buf))
    >> 
    >> Since the patch's core claim is "buffer must be dirty before WAL
    >> registration", an assertion encodes this invariant. Should we add:
    >> 
    >> Assert(BufferIsValid(buf));
    >> Assert(BufferIsDirty(buf));
    >> 
    >> right before the visibilitymap_set() call?
    > 
    > There are already assertions that will trip in various places -- most
    > importantly in XLogRegisterBuffer(), which is the one that inspired
    > this refactor.
    > 
    >> The comment at lines:
    >>> "The only scenario where it is not already dirty is if the VM was removed…"
    >> 
    >> This phrasing could become misleading after future refactors. Can we
    >> make it more direct like:
    >> 
    >>> "We must mark the heap buffer dirty before calling visibilitymap_set(), because it may WAL-log the buffer and XLogRegisterBuffer() requires it."
    > 
    > I see your point about future refactors missing updating comments like
    > this. But, I don't think we are going to refactor the code such that
    > we can have PD_ALL_VISIBLE set without the VM bits set more often.
    > Also, it is common practice in Postgres to describe very specific edge
    > cases or odd scenarios in order to explain code that may seem
    > confusing without the comment. It does risk that comment later
    > becoming stale, but it is better that future developers understand why
    > the code is there.
    > 
    > That being said, I take your point that the comment is confusing. I
    > have updated it in a different way.
    > 
    >>> "Even if PD_ALL_VISIBLE is already set, we don't need to worry about unnecessarily dirtying the heap buffer, as it must be marked dirty before adding it to the WAL chain. The only scenario where it is not already dirty is if the VM was removed..."
    >> 
    >> In this test we now call MarkBufferDirty() on the heap page even when
    >> only setting the VM, so the comments claiming “does not need to modify
    >> the heap buffer”/“no heap page modification” might be misleading. It
    >> might be better to say the test doesn’t need to modify heap
    >> tuples/page contents or doesn’t need to prune/freeze.
    > 
    > The point I'm trying to make is that we have to dirty the buffer even
    > if we don't modify the page because of the XLOG sub-system
    > requirements. And, it may seem like a waste to do that if not
    > modifying the page, but the page will rarely be clean anyway. I've
    > tried to make this more clear in attached v29.
    > 
    > - Melanie
    > Attachments:
    >   v29-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch
    >   v29-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patch
    >   v29-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch
    >   v29-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch
    >   v29-0005-Move-VM-assert-into-prune-freeze-code.patch
    >   v29-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch
    >   v29-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch
    >   v29-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch
    >   v29-0009-Simplify-heap_page_would_be_all_visible-visibili.patch
    >   v29-0010-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch
    >   v29-0011-Unset-all_visible-sooner-if-not-freezing.patch
    >   v29-0012-Track-which-relations-are-modified-by-a-query.patch
    >   v29-0013-Pass-down-information-on-table-modification-to-s.patch
    >   v29-0014-Allow-on-access-pruning-to-set-pages-all-visible.patch
    >   v29-0015-Set-pd_prune_xid-on-insert.patch
    
    A few more comments on v29:
    
    1 - 0002 - Looks like since 0002, visibilitymap_set()’s return value is no longer used, so do we need to update the function and change return type to void? I remember in some patches, to address Coverity alerts, people had to do “(void) function_with_a_return_value()”.
    
    2 - 0003
    ```
    + * Helper to correct any corruption detected on an heap page and its
    ```
    
    Nit: “an” -> “a”
    
    3 - 0003
    ```
    +static bool
    +identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
    +							   BlockNumber heap_blk, Page heap_page,
    +							   int nlpdead_items,
    +							   Buffer vmbuffer,
    +							   uint8 vmbits)
    +{
    +	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
    ```
    
    Right before this function is called:
    ```
     	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
    +	if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
    +									   presult.lpdead_items, vmbuffer,
    +									   old_vmbits))
    ```
    
    So the Assert() only checks that old_vmbits matches what visibilitymap_get_status() just returned. In that case, identify_and_fix_vm_corruption() could take vmbits as a pointer, call visibilitymap_get_status() itself, and return vmbits via the pointer, so that we don’t need to call visibilitymap_get_status() twice.
    
    4 - 0004
    ```
    +	 * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set and
    +	 * we have attempted to update the VM.
    +	 */
    +	uint8		new_vmbits;
    +	uint8		old_vmbits;
    ```
    
    The comment feels a little confusing to me. "HEAP_PAGE_PRUNE_UPDATE_VM option is set” is a clear indication, but how do we decide whether "we have attempted to update the VM”? By reading the code:
    ```
    +	prstate->attempt_update_vm =
    +		(params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
    ```
    
    It’s just the result of HEAP_PAGE_PRUNE_UPDATE_VM being set. So, maybe we don’t need the “and” part.
    
    5 - 0004
    ```
    + * Returns true if one or both VM bits should be set, along with returning the
    + * current value of the VM bits in *old_vmbits and the desired new value of
    + * the VM bits in *new_vmbits.
    + */
    +static bool
    +heap_page_will_set_vm(PruneState *prstate,
    +					  Relation relation,
    +					  BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
    +					  Buffer vmbuffer,
    +					  int nlpdead_items,
    +					  uint8 *old_vmbits,
    +					  uint8 *new_vmbits)
    +{
    +	if (!prstate->attempt_update_vm)
    +		return false;
    ```
    
    old_vmbits and new_vmbits are purely output parameters. So, maybe we should set them to 0 inside this function instead of relying on callers to initialize them.
    
    I think this is a similar case where I raised a comment earlier about initializing presult to {0} in the callers, and you only wanted to set presult in heap_page_prune_and_freeze().
    
    6 - 0004
    ```
    @@ -823,13 +975,19 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
     						   MultiXactId *new_relmin_mxid)
     {
     	Buffer		buffer = params->buffer;
    +	Buffer		vmbuffer = params->vmbuffer;
     	Page		page = BufferGetPage(buffer);
    +	BlockNumber blockno = BufferGetBlockNumber(buffer);
     	PruneState	prstate;
     	bool		do_freeze;
     	bool		do_prune;
     	bool		do_hint_prune;
    +	bool		do_set_vm;
     	bool		did_tuple_hint_fpi;
     	int64		fpi_before = pgWalUsage.wal_fpi;
    +	uint8		new_vmbits = 0;
    +	uint8		old_vmbits = 0;
    +
     
     	/* Initialize prstate */
    ```
    
    Nit: an extra empty line is added.
    
    7 - 0005
    ```
    -	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
    -	 * will return 'all_visible', 'all_frozen' flags to the caller.
    +	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
    ```
    
    Nit: a trailing period is needed at the end of the comment line.
    
    8 - 0005
    ```
    @@ -978,6 +1003,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
     	Buffer		vmbuffer = params->vmbuffer;
     	Page		page = BufferGetPage(buffer);
     	BlockNumber blockno = BufferGetBlockNumber(buffer);
    +	TransactionId vm_conflict_horizon = InvalidTransactionId;
    ```
    
    I guess the variable name “vm_conflict_horizon” comes from the old "presult->vm_conflict_horizon”. But in the new logic, this variable is used more generically, for example in Assert(debug_cutoff == vm_conflict_horizon). I see 0006 has renamed it to “conflict_xid”, so it’s up to you whether or not to rename it here. But to make the commit self-contained, I’d suggest renaming it.
    
    9 - 0006
    ```
    @@ -3537,6 +3537,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
     	{
     		ItemId		itemid;
     		HeapTupleData tuple;
    +		TransactionId dead_after = InvalidTransactionId;
    ```
    
    This initialization seems unnecessary, as HeapTupleSatisfiesVacuumHorizon() will always set a value for it.
    
    10 - 0010
    ```
    +				 * there is any snapshot that still consider the newest xid on
    ```
    
    Nit: consider -> considers
    
    11 - 0011
    ```
    +	 * page. If we won't attempt freezing, just unset all-visible now, though.
     	 */
    +	if (!prstate->attempt_freeze)
    +	{
    +		prstate->all_visible = false;
    +		prstate->all_frozen = false;
    +	}
    ```
    
    The comment says “just unset all-visible”, but the code actually also unsets all_frozen.
    
    12 - 0012
    ```
    +	/*
    +	 * RT indexes of relations modified by the query either through
    +	 * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
    +	 */
    +	Bitmapset  *es_modified_relids;
    ```
    
    As we intentionally only want indexes, does it make sense to just name the field es_modified_rtindexes to make it more explicit?
    
    13 - 0012
    ```
    +			/* If it has a rowmark, the relation is modified */
    +			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
    +														rc->rti);
    ```
    
    I think this comment is a little misleading, because SELECT FOR UPDATE/SHARE doesn’t always modify tuples of the relation. A reader who doesn’t associate this code with this patch may think the comment is wrong. So, I think we should make the comment more explicit. Maybe rephrase it like “If it has a rowmark, the relation may modify or lock heap pages”.
    
    14 - 0015 - commit message
    ```
    Setting pd_prune_xid on insert can cause a page to be dirtied and
    written out when it previously would not have been, affetcting the
    ```
    
    Typo: affetcting -> affecting
    
    Best regards,
    --
    Chao Li (Evan)
    HighGo Software Co., Ltd.
    https://www.highgo.com/
    
  86. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-12-22T17:57:16Z

    On Mon, Dec 22, 2025 at 2:20 AM Chao Li <li.evan.chao@gmail.com> wrote:
    >
    > A few more comments on v29:
    
    Thanks for the continued review! I've attached v30.
    
    > 1 - 0002 - Looks like since 0002, visibilitymap_set()’s return value is no longer used, so do we need to update the function and change return type to void? I remember in some patches, to address Coverity alerts, people had to do “(void) function_with_a_return_value()”.
    
    I was torn about whether or not to change the return value. Coverity
    doesn't always warn about unused return values. Usually it warns if it
    perceives the return value as needed for error checking or if it
    thinks not using the return value is incorrect. It may still warn in
    this case, but it's not obvious to me which way it would go.
    
    I have changed the function signature as you suggested in v30.
    
    My hesitation is that visibilitymap_set() is in a header file and
    could be used by extensions/forks, etc. Adding more information by
    changing a return value from void to non-void doesn't have any
    negative effect on those potential callers. But taking away a return
    value is more likely to affect them in a potentially negative way.
    
    However, I'm significantly changing the signature in this release, so
    everybody that used it will have to change their code completely
    anyway. Also, I just added a return value for visibilitymap_set() in
    the previous release (18). Historically, it returned void. So, I've
    gone with your suggestion.
    
    > +static bool
    > +identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
    > +                                                          BlockNumber heap_blk, Page heap_page,
    > +                                                          int nlpdead_items,
    > +                                                          Buffer vmbuffer,
    > +                                                          uint8 vmbits)
    > +{
    > +       Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
    > ```
    >
    > Right before this function is called:
    > ```
    >         old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
    > +       if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
    > +                                                                          presult.lpdead_items, vmbuffer,
    > +                                                                          old_vmbits))
    > ```
    >
    > So, the Assert() is checking that old_vmbits matches what visibilitymap_get_status() just returned. In that case, identify_and_fix_vm_corruption() could take vmbits as a pointer, call visibilitymap_get_status() itself, and return vmbits via the pointer, so that we don’t need to call visibilitymap_get_status() twice.
    
    I see what you are saying, and I did consider this.
    visibilitymap_get_status() is only called the second time in assert
    builds, and it isn't so expensive to do it that it is worth worrying
    about.  I added the assertion to prevent other callers from calling
    identify_and_fix_vm_corruption() with random VM bits unassociated with
    the vmbuffer passed in.
    
    The reason I don't think identify_and_fix_vm_corruption() should be
    the one to call visibilitymap_get_status() and initialize old_vmbits
    is that it shouldn't be a required step for setting the VM.
    identify_and_fix_vm_corruption()'s job is to identify and fix
    corruption -- not get the VM bits for when we set them. In fact, it
    may make sense someday to check that the VM and PD_ALL_VISIBLE are in
    sync before pruning and freezing is even started. (Of course, we can't
    check the number of lpdead items until after).
    
    Regarding having *old_vmbits as a return value: I thought about
    directly returning the result of visibilitymap_clear() from
    identify_and_fix_vm_corruption(). The reason I didn't is that if
    PD_ALL_VISIBLE is set and nlpdead_items > 0 but the VM is clear,
    visibilitymap_clear() will return false -- because it didn't need to
    clear the VM bits. And I think we want
    identify_and_fix_vm_corruption() to return true if it cleared
    corruption at all.
    
    I don't think we should have identify_and_fix_vm_corruption() reset
    old_vmbits to 0 (and pass it by reference), because the caller may
    want to know the value of old_vmbits before we cleared corruption.
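    
    A minimal sketch of that return-value distinction, using hypothetical
    stand-ins (MockPage, mock_vm_clear() and
    mock_identify_and_fix_vm_corruption() are illustrative names, not the
    code in the patch, and the two corruption checks are simplified):
    
    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>
    
    /* Hypothetical stand-in for a heap page plus its VM bits. */
    typedef struct MockPage
    {
        bool    pd_all_visible; /* models the PD_ALL_VISIBLE page flag */
        uint8_t vmbits;         /* models this block's bits in the VM */
    } MockPage;
    
    /* Like visibilitymap_clear(): true only if bits were actually cleared. */
    static bool
    mock_vm_clear(MockPage *page)
    {
        bool cleared = (page->vmbits != 0);
    
        page->vmbits = 0;
        return cleared;
    }
    
    /*
     * Returns true if any corruption was repaired, even when the VM itself
     * was already clear. old_vmbits is passed by value, so the caller still
     * has the pre-repair bits afterwards.
     */
    static bool
    mock_identify_and_fix_vm_corruption(MockPage *page, int nlpdead_items,
                                        uint8_t old_vmbits)
    {
        /* Caller must pass the bits that belong to this page. */
        assert(page->vmbits == old_vmbits);
    
        /* Dead items on a page flagged all-visible: clear flag and VM. */
        if (nlpdead_items > 0 && page->pd_all_visible)
        {
            page->pd_all_visible = false;
            (void) mock_vm_clear(page); /* may be false; still a repair */
            return true;
        }
    
        /* VM bits set while the page flag is clear: clear the VM. */
        if (old_vmbits != 0 && !page->pd_all_visible)
        {
            (void) mock_vm_clear(page);
            return true;
        }
    
        return false;
    }
    ```
    
    With pd_all_visible set, vmbits == 0 and nlpdead_items == 3,
    mock_vm_clear() alone would report false, while the wrapper reports
    true because it cleared the page flag.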
    
    > +        * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set and
    > +        * we have attempted to update the VM.
    > +        */
    > +       uint8           new_vmbits;
    > +       uint8           old_vmbits;
    > ```
    >
    > The comment feels a little confusing to me. "HEAP_PAGE_PRUNE_UPDATE_VM option is set” is a clear indication, but how to decide "we have attempted to update the VM”? By reading the code:
    > ```
    > +       prstate->attempt_update_vm =
    > +               (params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
    >
    > It’s just the result of HEAP_PAGE_PRUNE_UPDATE_VM being set. So, maybe we don’t need the “and” part.
    
    Good point. Fixed.
    
    > +static bool
    > +heap_page_will_set_vm(PruneState *prstate,
    > +                                         Relation relation,
    > +                                         BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
    > +                                         Buffer vmbuffer,
    > +                                         int nlpdead_items,
    > +                                         uint8 *old_vmbits,
    > +                                         uint8 *new_vmbits)
    > +{
    > +       if (!prstate->attempt_update_vm)
    > +               return false;
    > ```
    >
    > old_vmbits and new_vmbits are purely output parameters. So, maybe we should set them to 0 inside this function instead of relying on callers to initialize them.
    >
    > I think this is a similar case where I raised a comment earlier about initializing presult to {0} in the callers, and you only wanted to set presult in heap_page_prune_and_freeze().
    
    I see your point. It does feel a little bit different to me since they
    are local variables and coverity may not actually be able to tell they
    are being unconditionally initialized by heap_page_will_set_vm(). The
    other local variables that are not initialized at the top are all
    unconditionally set by helper return values. But my decision to
    initialize them was more instinct than rationality. I've changed it as
    you suggested.
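    
    Roughly, the pattern is the following. This is a sketch only:
    MockPruneState, MOCK_VM_ALL_VISIBLE, the current_vmbits parameter and
    the return condition are assumptions for illustration, not the
    helper's real signature or logic.
    
    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    
    #define MOCK_VM_ALL_VISIBLE 0x01    /* hypothetical flag value */
    
    /* Hypothetical, reduced stand-in for PruneState. */
    typedef struct MockPruneState
    {
        bool attempt_update_vm;
        bool all_visible;
    } MockPruneState;
    
    static bool
    mock_heap_page_will_set_vm(const MockPruneState *prstate,
                               uint8_t current_vmbits,
                               uint8_t *old_vmbits,
                               uint8_t *new_vmbits)
    {
        assert(old_vmbits != NULL && new_vmbits != NULL);
    
        /* Set the outputs first so every return path leaves them defined. */
        *old_vmbits = 0;
        *new_vmbits = 0;
    
        if (!prstate->attempt_update_vm)
            return false;
    
        *old_vmbits = current_vmbits;
        if (prstate->all_visible)
            *new_vmbits = MOCK_VM_ALL_VISIBLE;
    
        /* Assumed rule: report true only if there are new bits to set. */
        return *new_vmbits != 0 && *new_vmbits != *old_vmbits;
    }
    ```
    
    A caller can then declare "uint8 old_vmbits, new_vmbits;" without
    initializers and still read both after the early return.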
    
    > -        * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
    > -        * will return 'all_visible', 'all_frozen' flags to the caller.
    > +        * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
    >
    > Nit: a trailing period is needed at the end of the comment line.
    
    I've changed it. One interesting thing is that our "policy" for
    periods in comments is that we don't put periods at the end of
    one-line comments and we do put them at the end of multi-line comment
    sentences. This is a one-line comment inside a comment block, so I
    wasn't sure what to do. If you noticed it, and it bothered you, it's
    easy enough to change, though.
    
    > @@ -978,6 +1003,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
    >         Buffer          vmbuffer = params->vmbuffer;
    >         Page            page = BufferGetPage(buffer);
    >         BlockNumber blockno = BufferGetBlockNumber(buffer);
    > +       TransactionId vm_conflict_horizon = InvalidTransactionId;
    > ```
    >
    > I guess the variable name “vm_conflict_horizon” comes from the old "presult->vm_conflict_horizon”. But in the new logic, this variable is used more generically, for example in Assert(debug_cutoff == vm_conflict_horizon). I see 0006 has renamed it to “conflict_xid”, so it’s up to you whether to rename it here. But to make the commit self-contained, I’d suggest renaming it.
    
    As of this patch, it is still being exclusively used as the conflict
    XID for setting the visibility map. And it still is the visibility
    horizon. I rename it to conflict xid once it includes more than just
    the visibility horizon for an all-visible page. In that assertion, it
    is also the visibility horizon for an all-visible page.
    
    > 9 - 0006
    >
    > @@ -3537,6 +3537,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
    >         {
    >                 ItemId          itemid;
    >                 HeapTupleData tuple;
    > +               TransactionId dead_after = InvalidTransactionId;
    > ```
    >
    > This initialization seems unnecessary, as HeapTupleSatisfiesVacuumHorizon() will always set a value for it.
    
    I think this is a comment for a later patch in the set (you originally
    said it was from 0006), but I've changed dead_after to not be
    initialized like this.
    
    > +       /*
    > +        * RT indexes of relations modified by the query either through
    > +        * UPDATE/DELETE/INSERT/MERGE or SELECT FOR UPDATE
    > +        */
    > +       Bitmapset  *es_modified_relids;
    > ```
    >
    > As we intentionally only want indexes, does it make sense to just name the field es_modified_rtindexes to make it more explicit?
    
    I'm torn about this. I named it like this partially because the struct
    member two above it in the estate, es_unpruned_relids, is also a
    bitmapset of range table indexes and yet is called x_relids. Though
    the bitmapset is one of indexes into the range table, they are the
    indexes of relation IDs in that range table. I think this could go
    either way, so I've left it as is for now and will think more about it
    once this patch is closer to being committed.
    
    > +                       /* If it has a rowmark, the relation is modified */
    > +                       estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
    > +                                                                                                               rc->rti);
    > ```
    >
    > I think this comment is a little misleading, because SELECT FOR UPDATE/SHARE doesn’t always modify tuples of the relation. A reader who doesn’t associate this code with this patch may think the comment is wrong. So, I think we should make the comment more explicit. Maybe rephrase it like “If it has a rowmark, the relation may modify or lock heap pages”.
    
    I see what you are saying. It's a good point. However, the reason we
    don't want to set the VM for SELECT FOR UPDATE is not because the
    SELECT FOR UPDATE will lock the relation but because it is usually
    indicating that we intend to modify the relation (when we do the
    update). As such, I've updated the comment to say "If it has a
    rowmark, the relation may be modified" -- which leaves it more open.
    
    - Melanie
    
  87. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-12-22T18:20:59Z

    On Sat, Dec 20, 2025 at 7:32 AM Kirill Reshke <reshkekirill@gmail.com> wrote:
    >
    > Hi! I checked v29-0009, about HeapTupleSatisfiesVacuumHorizon. Origins
    > of this code track down to fdf9e21196a6 which was committed as part of
    > [0], at which point
    > there was no HeapTupleSatisfiesVacuumHorizon function. I guess this is
    > the reason this optimization was not performed earlier.
    
    Thanks for taking a look into this!
    
    > I also think this patch is correct, because we do similar things for
    > HEAPTUPLE_DEAD & HEAPTUPLE_RECENTLY_DEAD, and
    > HeapTupleSatisfiesVacuum is just a proxy to
    > HeapTupleSatisfiesVacuumHorizon with only difference in DEAD VS
    > RECENTLY_DEAD handling.
    >
    > Similar change could be done at heapam_scan_analyze_next_tuple
    >
    > ...
    > case HEAPTUPLE_DEAD:
    > case HEAPTUPLE_RECENTLY_DEAD:
    > /* Count dead and recently-dead rows */
    > *deadrows += 1;
    > break;
    
    In v30 sent here [1], I did end up making this change in 0010. I just
    realized that I should have also changed
    table_scan_analyze_next_tuple() and removed the call to
    GetOldestRemovableTransactionId(). I've done that in attached v31.
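    
    To illustrate why the horizon is not needed when DEAD and
    RECENTLY_DEAD are counted the same way, here is a simplified model.
    The Mock* types and functions are hypothetical stand-ins, and XIDs are
    compared with a plain "<" rather than the wraparound-aware
    TransactionIdPrecedes().
    
    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    
    typedef uint32_t MockXid;
    
    typedef enum MockHTSV
    {
        MOCK_LIVE,
        MOCK_RECENTLY_DEAD,
        MOCK_DEAD
    } MockHTSV;
    
    /* Hypothetical, heavily reduced tuple header. */
    typedef struct MockTuple
    {
        bool    insert_aborted; /* inserter aborted: dead for everyone */
        bool    deleted;        /* deleter committed */
        MockXid xmax;           /* deleter's XID, if deleted */
    } MockTuple;
    
    /* Horizon-free check: reports when the tuple died, not who can see it. */
    static MockHTSV
    mock_satisfies_vacuum_horizon(const MockTuple *tup, MockXid *dead_after)
    {
        *dead_after = 0;        /* always set, so callers need no initializer */
    
        if (tup->insert_aborted)
            return MOCK_DEAD;
        if (!tup->deleted)
            return MOCK_LIVE;
    
        *dead_after = tup->xmax;
        return MOCK_RECENTLY_DEAD;
    }
    
    /* Wrapper: only upgrades RECENTLY_DEAD to DEAD using the horizon. */
    static MockHTSV
    mock_satisfies_vacuum(const MockTuple *tup, MockXid oldest_xmin)
    {
        MockXid  dead_after;
        MockHTSV res = mock_satisfies_vacuum_horizon(tup, &dead_after);
    
        if (res == MOCK_RECENTLY_DEAD && dead_after < oldest_xmin)
            res = MOCK_DEAD;
        return res;
    }
    
    /* ANALYZE-style counting: both dead states fall into the same case. */
    static int
    mock_count_dead_rows(const MockTuple *tuples, int ntuples)
    {
        int deadrows = 0;
    
        assert(tuples != NULL || ntuples == 0);
    
        for (int i = 0; i < ntuples; i++)
        {
            MockXid dead_after;
    
            switch (mock_satisfies_vacuum_horizon(&tuples[i], &dead_after))
            {
                case MOCK_DEAD:
                case MOCK_RECENTLY_DEAD:
                    deadrows += 1;
                    break;
                case MOCK_LIVE:
                    break;
            }
        }
        return deadrows;
    }
    ```
    
    Since the wrapper only moves tuples between the two dead states, the
    count from mock_count_dead_rows() is the same for any horizon value.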
    
    I'm not sure we should change the table AM API (by removing
    OldestXmin), though. I looked for table AMs implementing
    scan_analyze_next_tuple() to see if they use OldestXmin. I found two:
    OrioleDB [2] and Citus columnar [3]. Both implement
    scan_analyze_next_tuple() and neither uses OldestXmin. I
    couldn't easily find other table AMs implementing
    scan_analyze_next_tuple(). I don't have a strong sense of whether or
    not I should make this change. Changing it is churn to a public API
    and doesn't specifically enable us to do something.
    
    I could also just leave it unused by heapam's implementation. I
    haven't checked what, if any, other table AM callbacks have
    parameters completely unused by their heap implementation.
    
    So, I'm on the fence about whether or not to make the change at all,
    and, if I do, whether or not to change the table AM callback. That is
    done in v31, though, so we can discuss.
    
    - Melanie
    
    [1] https://www.postgresql.org/message-id/CAAKRu_ZCjHoRPfQ8AbMrFY8TOMCPAvZ0_m9SX7yg0edfTk45-g%40mail.gmail.com
    [2] https://github.com/orioledb/orioledb/blob/acff65984d106dabf708a179e2c6694297e08c02/src/tableam/handler.c#L978C68-L978C78
    [3] https://github.com/citusdata/citus/blob/ee3812d267db3ab007efb6f5f432c82c1f448695/src/backend/columnar/columnar_tableam.c#L1418
    
  88. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Chao Li <li.evan.chao@gmail.com> — 2025-12-23T00:00:57Z

    
    > On Dec 23, 2025, at 01:57, Melanie Plageman <melanieplageman@gmail.com> wrote:
    > 
    > On Mon, Dec 22, 2025 at 2:20 AM Chao Li <li.evan.chao@gmail.com> wrote:
    >> 
    >> A few more comments on v29:
    > 
    > Thanks for the continued review! I've attached v30.
    > 
    >> 1 - 0002 - Looks like since 0002, visibilitymap_set()’s return value is no longer used, so do we need to update the function and change return type to void? I remember in some patches, to address Coverity alerts, people had to do “(void) function_with_a_return_value()”.
    > 
    > I was torn about whether or not to change the return value. Coverity
    > doesn't always warn about unused return values. Usually it warns if it
    > perceives the return value as needed for error checking or if it
    > thinks not using the return value is incorrect. It may still warn in
    > this case, but it's not obvious to me which way it would go.
    > 
    > I have changed the function signature as you suggested in v30.
    > 
    > My hesitation is that visibilitymap_set() is in a header file and
    > could be used by extensions/forks, etc. Adding more information by
    > changing a return value from void to non-void doesn't have any
    > negative effect on those potential callers. But taking away a return
    > value is more likely to affect them in a potentially negative way.
    > 
    > However, I'm significantly changing the signature in this release, so
    > everybody that used it will have to change their code completely
    > anyway. Also, I just added a return value for visibilitymap_set() in
    > the previous release (18). Historically, it returned void. So, I've
    > gone with your suggestion.
    
    From a previous patch, I learned from Peter Eisentraut that “We don't care about ABI changes in major releases.”, see:
    
    https://www.postgresql.org/message-id/70913dbd-dadf-4560-9f81-c0df72bf6578%40eisentraut.org
    
    >> -        * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
    >> -        * will return 'all_visible', 'all_frozen' flags to the caller.
    >> +        * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples
    >> 
    >> Nit: a trailing period is needed at the end of the comment line.
    > 
    > I've changed it. One interesting thing is that our "policy" for
    > periods in comments is that we don't put periods at the end of
    > one-line comments and we do put them at the end of multi-line comment
    > sentences. This is a one-line comment inside a comment block, so I
    > wasn't sure what to do. If you noticed it, and it bothered you, it's
    > easy enough to change, though.
    
    If this were a one-line comment, I would not have cared about the trailing period.
    
    The problem is that this is a paragraph of a block comment, and the paragraphs above and below all have trailing periods. So, for consistency, I raised the comment.
    ```
     	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
     	 * LP_UNUSED during pruning.   <=== Has a trailing period
     	 *
    -	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
    -	 * will return 'all_visible', 'all_frozen' flags to the caller.
    +	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples <=== No trailing period
     	 *
     	 * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
     	 * in the VM.                                 <=== Has a trailing period
    ```
    
    > 
    >> 9 - 0006
    >> 
    >> @@ -3537,6 +3537,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
    >>        {
    >>                ItemId          itemid;
    >>                HeapTupleData tuple;
    >> +               TransactionId dead_after = InvalidTransactionId;
    >> ```
    >> 
    >> This initialization seems unnecessary, as HeapTupleSatisfiesVacuumHorizon() will always set a value for it.
    > 
    > I think this is a comment for a later patch in the set (you originally
    > said it was from 0006), but I've changed dead_after to not be
    > initialized like this.
    
    My bad. This comment was actually for 0009. In v31, I see you have removed the initialization to dead_after.
    
    Best regards,
    --
    Chao Li (Evan)
    HighGo Software Co., Ltd.
    https://www.highgo.com/
    
  89. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Melanie Plageman <melanieplageman@gmail.com> — 2025-12-23T01:18:05Z

    On Mon, Dec 22, 2025 at 7:01 PM Chao Li <li.evan.chao@gmail.com> wrote:
    >
    > > On Dec 23, 2025, at 01:57, Melanie Plageman <melanieplageman@gmail.com> wrote:
    > >
    > > My hesitation is that visibilitymap_set() is in a header file and
    > > could be used by extensions/forks, etc. Adding more information by
    > > changing a return value from void to non-void doesn't have any
    > > negative effect on those potential callers. But taking away a return
    > > value is more likely to affect them in a potentially negative way.
    > >
    > > However, I'm significantly changing the signature in this release, so
    > > everybody that used it will have to change their code completely
    > > anyway. Also, I just added a return value for visibilitymap_set() in
    > > the previous release (18). Historically, it returned void. So, I've
    > > gone with your suggestion.
    >
    > From a previous patch, I learned from Peter Eisentraut that “We don't care about ABI changes in major releases.”, see:
    
    Right, it is totally okay to change function APIs in a major release.
    My point was not that it wasn't allowed but that if people are getting
    useful information returned from that function, or if we think we
    might want that information again in the future, we should think twice
    before changing it. But, in this case, I think we don't need to worry
    about it.
    
    - Melanie