Thread

  1. Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

    Kirill Reshke <reshkekirill@gmail.com> — 2025-12-18T20:31:27Z

    On Fri, 19 Dec 2025 at 00:58, Melanie Plageman
    <melanieplageman@gmail.com> wrote:
    >
    > On Thu, Dec 18, 2025 at 1:07 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
    > >
    > > On Thu, 18 Dec 2025 at 20:18, Melanie Plageman
    > > <melanieplageman@gmail.com> wrote:
    > >
    > > > But you are right, I don't see any non-error code path where a heap
    > > > page would become empty (all line pointers set unused) and then not be
    > > > set all-visible. Only vacuum sets line pointers unused and if all the
    > > > line pointers are unused it will always set the page all-visible.
    > > >
    > > > I think, though, that if we error out in lazy_scan_prune() after
    > > > returning from heap_page_prune_and_freeze() such that we don't set the
    > > > empty page all-visible, we can end up with an empty page without
    > > > PD_ALL_VISIBLE set. You can see how this might work by patching the VM
    > > > set code in lazy_scan_prune() to skip empty pages.
    > >
    > > Thank you for your explanation!  I completely forgot that PD_ALL_VIS
    > > is a non-persistent change (hint bit). so its update can be trivially
    > > lost.
    > > The simplest real-life example is being killed just after returning
    > > from heap_page_prune_and_freeze, yes.
    > > PFA tap test covering lazy_scan_new_or_empty code path for
    > > empty-but-not-all-visible page
    >
    > Cool test! I'm going to have to think more about whether or not it is
    > worth adding a whole new TAP test for this codepath. Is there an
    > existing TAP test we could add it to so we don't need to make a new
    > cluster, etc? How long does the test take to run? Obviously it will be
    > quite short, but every bit we add to the test suite counts. I don't
    > actually know how much overhead there is with injection points.
    >
    
    Well, on my pc this test runs in ~1.5 sec. I did not find any other
    TAP test to place this, so created a new.
    Actually, I only check for specific patterns in the log file of the
    cluster in this test, so this test can instead be a regression test.
    
    ```
    reshke=# VACUUM (DISABLE_PAGE_SKIPPING) vac_empty_test;
    NOTICE:  notice triggered for injection point vacuum-empty-page-non-all-vis
    VACUUM
    reshke=#
    ```
    We will just check in the .out file that the code hits
    'vacuum-empty-page-non-all-vis' after an error.
    injection points overhead should not be that awful, just from my
    experience. Maybe buildfarm members can say something here, I dunno.
    
    Also, we already have a bunch of regression+inj point tests for some
    rare cases, exempli gratia
    src/test/modules/nbtree/sql/nbtree_half_dead_pages.sql.
    
    -- 
    Best regards,
    Kirill Reshke