Thread

  1. Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

    Nitin Jadhav <nitinjadhavpostgres@gmail.com> — 2024-10-07T08:36:54Z

    Hi,
    
    It’s been a long gap in the activity of this thread, and I apologize
    for the delay. However, I have now returned and reviewed the other
    threads [1],[2] that have made changes in this area. I would like to
    share a summary of the discussion that took place among Robert,
    Andres, Bharath, and Tom on this thread, to make it easier to move
    forward.
    
    Robert was dissatisfied with the approach used in the patch to report
    progress for the checkpointer process, as he felt the current
    mechanism is not suitable. He proposed allocating a dedicated chunk of
    shared memory in CheckpointerShmemStruct. Bharath opposed this,
    suggesting instead to use PgStat_CheckpointerStats. Andres somewhat
    supported Robert’s idea but noted that using PgStat_CheckpointerStats
    would allow for more code reuse.
    
    The discussion then shifted towards the statistics handling for the
    checkpointer process. Robert expressed dissatisfaction with the
    current statistics handling mechanism. Andres explained the rationale
    behind the existing setup and the improvements made in pg_stat_io. He
    also mentioned the overhead of the changecount mechanism when updating
    for every buffer write. However, for updates involving a single
    parameter at a time, performance can be improved on platforms that
    support atomic 64-bit writes (indicated by #ifdef
    PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY). He also shared performance
    metrics demonstrating good results with this approach. Tom agreed to
    use this and restrict it to the specific case.
    
    But I am not quite clear on the direction ahead. Let me summarise the
    approaches based on the above discussion.
    
    Approach-1: The approach used in the current patch which uses the
    existing mechanism of progress reporting. The advantage of this
    approach is that the machinery is already in place and ready to use.
    However, it is not suitable for the checkpointer process because only
    the checkpointer process runs the checkpoint, even if the command is
    issued from a different backend. The current mechanism is designed for
    any backend to report progress for any command it is running, and we
    don’t know which command that will be at any given point in time, or
    how many backends will be running any given command simultaneously.
    Hence, this approach does not fit the checkpointer. Additionally,
    there is complexity involved in mapping down to integers and back.
    
    Approach-2: Allocate a dedicated chunk of shared memory in
    CheckpointerShmemStruct with an appropriate name and size. This
    approach eliminates the complexity involved in Approach-1 related to
    mapping down to integers and back. However, it requires building the
    necessary machinery to suit checkpointer progress reporting which
    might be similar to the current progress reporting mechanism.
    
    Approach-3: Using PgStat_CheckpointerStats to store the progress
    information. Have we completely ruled out this approach?
    
    Additionally all three approaches require improvements in the
    changecount mechanism on platforms that support atomic 64-bit writes.
    
    I’m inclined to favor Approach-2 because it provides a clearer method
    for reporting progress for the checkpointer process, with the
    additional work required to implement the necessary machinery.
    However, I’m still uncertain about the best path forward. Please share
    your thoughts.
    
    [1]: https://www.postgresql.org/message-id/flat/CAOtHd0ApHna7_p6mvHoO%2BgLZdxjaQPRemg3_o0a4ytCPijLytQ%40mail.gmail.com#74ae447064932198495aa6d722fdc092
    [2]: https://www.postgresql.org/message-id/CALj2ACVxX2ii=66RypXRweZe2EsBRiPMj0aHfRfHUeXJcC7kHg@mail.gmail.com
    
    Best Regards,
    Nitin Jadhav
    Azure Database for PostgreSQL
    Microsoft
    
    On Tue, Jan 31, 2023 at 11:16 PM vignesh C <vignesh21@gmail.com> wrote:
    >
    > On Thu, 8 Dec 2022 at 00:33, Andres Freund <andres@anarazel.de> wrote:
    > >
    > > Hi,
    > >
    > > On 2022-11-04 09:25:52 +0100, Drouvot, Bertrand wrote:
    > > > Please find attached a rebase in v7.
    > >
    > > cfbot complains that the docs don't build:
    > > https://cirrus-ci.com/task/6694349031866368?logs=docs_build#L296
    > >
    > > [03:24:27.317] ref/checkpoint.sgml:66: element para: validity error : Element para is not declared in para list of possible children
    > >
    > > I've marked the patch as waitin-on-author for now.
    > >
    > >
    > > There's been a bunch of architectural feedback too, but tbh, I don't know if
    > > we came to any conclusion on that front...
    >
    > There has been no updates on this thread for some time, so this has
    > been switched as Returned with Feedback. Feel free to open it in the
    > next commitfest if you plan to continue on this.
    >
    > Regards,
    > Vignesh