Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. injection_points: Add proper locking when reporting fixed-variable stats

  1. Fix locking issue with fixed-size stats template in injection_points

    Michael Paquier <michael@paquier.xyz> — 2025-09-29T00:48:44Z

    Hi all,
    
    While doing some concurrency benchmarking with injection_points stats
    enabled in a server, I have been able to trigger an assertion failure
    in pgstat_begin_changecount_write():
    #4  0x0000564917fdc816 in ExceptionalCondition
    (conditionName=0x7f60af0dc5c8 "(*cc & 1) == 0",
    fileName=0x7f60af0dc598
    "../../../../src/include/utils/pgstat_internal.h",
    lineNumber=831) at assert.c:65
    #5  0x00007f60af0da3ef in pgstat_begin_changecount_write
    (cc=0x7f60acbb8e10) at
    ../../../../src/include/utils/pgstat_internal.h:831
    #6  0x00007f60af0db1d9 in pgstat_report_inj_fixed (numattach=0,
    numdetach=0, numrun=1, numcached=0, numloaded=0) at
    injection_stats_fixed.c:155
    #7  0x00007f60af0d8b5c in injection_points_run (fcinfo=0x564931b66588)
    at injection_points.c:429
    
    This can be reproduced as follows.  First, postgresql.conf:
    shared_preload_libraries = 'injection_points'
    injection_points.stats = on
    
    Then something like the following command:
    $ cat create_inj.sql
    \set id random(1,100000)
    select injection_points_attach('popo:id', 'notice');
    select injection_points_run('popo:id');
    select injection_points_detach('popo:id');
    $ pgbench -n -T 300 -f create_inj.sql -c 10
    
    The failure is not surprising, because the stats reports can happen in
    a concurrent fashion when a point is run for example, contrary to
    other fixed-sized stats kind where the reports are only done by a
    single process (archiver, bgwriter, checkpointer).  So this is just a
    matter of acquiring a lock that was forgotten, to make sure that the
    changes are consistent.  Far from critical as this is template code,
    still embarrassing.
    
    Thoughts or comments?
    --
    Michael
    
  2. Re: Fix locking issue with fixed-size stats template in injection_points

    Chao Li <li.evan.chao@gmail.com> — 2025-09-29T01:46:05Z

    
    > On Sep 29, 2025, at 08:48, Michael Paquier <michael@paquier.xyz> wrote:
    > 
    > 
    > The failure is not surprising, because the stats reports can happen in
    > a concurrent fashion when a point is run for example, contrary to
    > other fixed-sized stats kind where the reports are only done by a
    > single process (archiver, bgwriter, checkpointer).  So this is just a
    > matter of acquiring a lock that was forgotten, to make sure that the
    > changes are consistent.  Far from critical as this is template code,
    > still embarrassing.
    > 
    > Thoughts or comments?
    
    
    I saw pg_state_begin_changecount_write() is called multiple places, as you mention, for example bgwriter. But there are not the same lock acquired in other places, for example, in bgwriter:
    
    void
    pgstat_report_bgwriter(void)
    {
        PgStatShared_BgWriter *stats_shmem = &pgStatLocal.shmem->bgwriter;
    
        Assert(!pgStatLocal.shmem->is_shutdown);
        pgstat_assert_is_up();
    
        /*
         * This function can be called even if nothing at all has happened. In
         * this case, avoid unnecessarily modifying the stats entry.
         */
        if (pg_memory_is_all_zeros(&PendingBgWriterStats,
                                   sizeof(struct PgStat_BgWriterStats)))
            return;
    
        pgstat_begin_changecount_write(&stats_shmem->changecount);
    
    #define BGWRITER_ACC(fld) stats_shmem->stats.fld += PendingBgWriterStats.fld
        BGWRITER_ACC(buf_written_clean);
        BGWRITER_ACC(maxwritten_clean);
        BGWRITER_ACC(buf_alloc);
    #undef BGWRITER_ACC
    
        pgstat_end_changecount_write(&stats_shmem->changecount);
    
    Only adding the lock in pg_report_inj_fixed() won’t prevent the race conditions from bgwriter. So I wonder, do we need to add the same lock in the other places?
    
    Best regards,
    --
    Chao Li (Evan)
    HighGo Software Co., Ltd.
    https://www.highgo.com/
    
    
    
    
    
  3. Re: Fix locking issue with fixed-size stats template in injection_points

    Michael Paquier <michael@paquier.xyz> — 2025-09-29T06:27:56Z

    On Mon, Sep 29, 2025 at 09:46:05AM +0800, Chao Li wrote:
    > I saw pg_state_begin_changecount_write() is called multiple places,
    > as you mention, for example bgwriter.
    
    I've mentioned that in my first email, and put in details:
    - pgstat_report_bgwriter() is called once, by the bgwriter.
    - pgstat_report_checkpointer() is called three time, all by the
    checkpointer.
    - pgstat_report_archiver() is called twice, all by pgarch.c.
    
    So all of them don't have a problem, two calls cannot happen
    concurrently.
    --
    Michael
    
  4. Re: Fix locking issue with fixed-size stats template in injection_points

    Bertrand Drouvot <bertranddrouvot.pg@gmail.com> — 2025-09-29T06:52:31Z

    Hi,
    
    On Mon, Sep 29, 2025 at 09:48:44AM +0900, Michael Paquier wrote:
    > Then something like the following command:
    > $ cat create_inj.sql
    > \set id random(1,100000)
    > select injection_points_attach('popo:id', 'notice');
    > select injection_points_run('popo:id');
    > select injection_points_detach('popo:id');
    > $ pgbench -n -T 300 -f create_inj.sql -c 10
    > 
    > The failure is not surprising, because the stats reports can happen in
    > a concurrent fashion when a point is run for example, contrary to
    > other fixed-sized stats kind where the reports are only done by a
    > single process (archiver, bgwriter, checkpointer).  So this is just a
    > matter of acquiring a lock that was forgotten, to make sure that the
    > changes are consistent.  Far from critical as this is template code,
    > still embarrassing.
    > 
    > Thoughts or comments?
    
    Patch LGTM.
    
    Remark: I like the "popo" prefix in your test ;-)
    
    Regards,
    
    -- 
    Bertrand Drouvot
    PostgreSQL Contributors Team
    RDS Open Source Databases
    Amazon Web Services: https://aws.amazon.com
    
    
    
    
  5. Re: Fix locking issue with fixed-size stats template in injection_points

    Chao Li <li.evan.chao@gmail.com> — 2025-09-29T07:29:25Z

    
    > On Sep 29, 2025, at 14:27, Michael Paquier <michael@paquier.xyz> wrote:
    > 
    > On Mon, Sep 29, 2025 at 09:46:05AM +0800, Chao Li wrote:
    >> I saw pg_state_begin_changecount_write() is called multiple places,
    >> as you mention, for example bgwriter.
    > 
    > I've mentioned that in my first email, and put in details:
    > - pgstat_report_bgwriter() is called once, by the bgwriter.
    > - pgstat_report_checkpointer() is called three time, all by the
    > checkpointer.
    > - pgstat_report_archiver() is called twice, all by pgarch.c.
    > 
    > So all of them don't have a problem, two calls cannot happen
    > concurrently.
    > --
    > Michael
    
    
    Thanks for the clarification. Then the patch looks good to me.
    
    Best regards,
    --
    Chao Li (Evan)
    HighGo Software Co., Ltd.
    https://www.highgo.com/
    
    
    
    
    
  6. Re: Fix locking issue with fixed-size stats template in injection_points

    wenhui qiu <qiuwenhuifx@gmail.com> — 2025-09-29T07:54:39Z

    HI
    This patch looks good to me.
    
    The issue is clear: unlike other fixed-size stats kinds (archiver,
    bgwriter, checkpointer), the injection_points stats can be updated
    concurrently by multiple backends. Without synchronization, this can lead
    to inconsistent changecount state and assertion failures in
    pgstat_begin_changecount_write(), as shown in your reproduction.
    
    
    
    Thanks
    
    On Mon, Sep 29, 2025 at 3:29 PM Chao Li <li.evan.chao@gmail.com> wrote:
    
    >
    >
    > On Sep 29, 2025, at 14:27, Michael Paquier <michael@paquier.xyz> wrote:
    >
    > On Mon, Sep 29, 2025 at 09:46:05AM +0800, Chao Li wrote:
    >
    > I saw pg_state_begin_changecount_write() is called multiple places,
    > as you mention, for example bgwriter.
    >
    >
    > I've mentioned that in my first email, and put in details:
    > - pgstat_report_bgwriter() is called once, by the bgwriter.
    > - pgstat_report_checkpointer() is called three time, all by the
    > checkpointer.
    > - pgstat_report_archiver() is called twice, all by pgarch.c.
    >
    > So all of them don't have a problem, two calls cannot happen
    > concurrently.
    > --
    > Michael
    >
    >
    > Thanks for the clarification. Then the patch looks good to me.
    >
    > Best regards,
    > --
    > Chao Li (Evan)
    > HighGo Software Co., Ltd.
    > https://www.highgo.com/
    >
    >
    >
    >
    >
    
  7. Re: Fix locking issue with fixed-size stats template in injection_points

    Michael Paquier <michael@paquier.xyz> — 2025-09-29T08:01:37Z

    On Mon, Sep 29, 2025 at 06:52:31AM +0000, Bertrand Drouvot wrote:
    > Remark: I like the "popo" prefix in your test ;-)
    
    It's that or popopop.  "glop" and "pas-glop" were my other candidates,
    but I doubt we can use them freely, and you may be the only one around
    here to know what this refers to.  :D
    --
    Michael
    
  8. Re: Fix locking issue with fixed-size stats template in injection_points

    Michael Paquier <michael@paquier.xyz> — 2025-09-30T00:08:32Z

    On Mon, Sep 29, 2025 at 05:01:37PM +0900, Michael Paquier wrote:
    > It's that or popopop.  "glop" and "pas-glop" were my other candidates,
    > but I doubt we can use them freely, and you may be the only one around
    > here to know what this refers to.  :D
    
    Putting that aside, fixed down to v18.
    --
    Michael