Thread

  1. Re: Suggestion to add --continue-client-on-abort option to pgbench

    Yugo Nagata <nagata@sraoss.co.jp> — 2025-11-13T05:50:33Z

    On Thu, 13 Nov 2025 13:14:37 +0800
    Chao Li <li.evan.chao@gmail.com> wrote:
    
    > 
    > 
    > > On Nov 13, 2025, at 12:02, Chao Li <li.evan.chao@gmail.com> wrote:
    > > 
    > > 
    > > 
    > >> On Nov 13, 2025, at 11:47, Fujii Masao <masao.fujii@gmail.com> wrote:
    > >> 
    > >> On Thu, Nov 13, 2025 at 11:21 AM Chao Li <li.evan.chao@gmail.com> wrote:
    > >>> I debugged further this morning, and I think I have found the root cause. Ultimately, the problem is not with discardUntilSync(); instead, discardAvailableResults() mistakenly eats PGRES_PIPELINE_SYNC.
    > >> 
    > >> Thanks for debugging!
    > >> 
    > >> Yes, discardAvailableResults() can discard PGRES_PIPELINE_SYNC,
    > >> but do you mean that's the root cause of the assertion failure
    > >> Nagata-san reported?
    > >> Since that failure can occur even in older branches, I was thinking
    > >> that newer code like discardAvailableResults() in master isn't the
    > >> root cause...
    > >> 
    > > 
    > > I haven’t debugged with the old code, but the old code also discards non-NULL results:
    > > 
    > > ```
    > > -    do
    > > -    {
    > > -        res = PQgetResult(st->con);
    > > -        PQclear(res);
    > > -    } while (res);
    > > +    discardAvailableResults(st);
    > > ```
    > > 
    > > Which may also discard the sync message. That’s my guess. I can also debug the old code this afternoon.
    > > 
    > 
    > I just tried the old code but it didn’t trigger the assert with Yugo’s deadlock scripts.
    
    To trigger a deadlock error, the tables need to have enough rows so that the scan takes some
    time. In my environment, about 1,000 rows were enough to cause a deadlock.
    
    Regards,
    Yugo Nagata
    
    > 
    > I did "git reset --hard a3ea5330fcf47390c8ab420bbf433a97a54505d6", which is the commit just before "--continue-on-error". Then I ran Yugo’s deadlock scripts, but I didn’t get the assert:
    > 
    > ```
    > % pgbench -n  --failures-detailed  -M extended -j 2 -c 2  -f deadlock.sql -f deadlock2.sql evantest
    > pgbench (19devel)
    > transaction type: multiple scripts
    > scaling factor: 1
    > query mode: extended
    > number of clients: 2
    > number of threads: 2
    > maximum number of tries: 1
    > number of transactions per client: 10
    > number of transactions actually processed: 20/20
    > number of failed transactions: 0 (0.000%)
    > number of serialization failures: 0 (0.000%)
    > number of deadlock failures: 0 (0.000%)
    > latency average = 0.341 ms
    > initial connection time = 2.637 ms
    > tps = 5865.102639 (without initial connection time)
    > SQL script 1: deadlock.sql
    >  - weight: 1 (targets 50.0% of total)
    >  - 12 transactions (60.0% of total)
    >  - number of transactions actually processed: 12 (tps = 3519.061584)
    >  - number of failed transactions: 0 (0.000%)
    >  - number of serialization failures: 0 (0.000%)
    >  - number of deadlock failures: 0 (0.000%)
    >  - latency average = 0.311 ms
    >  - latency stddev = 0.304 ms
    > SQL script 2: deadlock2.sql
    >  - weight: 1 (targets 50.0% of total)
    >  - 8 transactions (40.0% of total)
    >  - number of transactions actually processed: 8 (tps = 2346.041056)
    >  - number of failed transactions: 0 (0.000%)
    >  - number of serialization failures: 0 (0.000%)
    >  - number of deadlock failures: 0 (0.000%)
    >  - latency average = 0.366 ms
    >  - latency stddev = 0.364 ms
    > ```
    > 
    > Best regards,
    > --
    > Chao Li (Evan)
    > HighGo Software Co., Ltd.
    > https://www.highgo.com/
    > 
    
    
    -- 
    Yugo Nagata <nagata@sraoss.co.jp>
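
The hypothesis quoted above (a loop that reads results until none are left may also swallow the pipeline sync marker) can be illustrated with a toy model. This is only a sketch under stated assumptions: `MockConn`, `MockStatus`, `mock_get_result`, `drain_all` and `drain_before_sync` are hypothetical names invented for illustration, and the queue does not reproduce how libpq actually delivers results in pipeline mode. It does not settle whether this is the cause of the reported assertion failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for result kinds; these are NOT libpq's ExecStatusType. */
typedef enum { RES_NONE, RES_ERROR, RES_PIPELINE_SYNC } MockStatus;

/* A toy connection: a fixed queue of pending results plus a read cursor. */
typedef struct {
    const MockStatus *items;
    size_t len;
    size_t pos;
} MockConn;

/* Pop the next result; RES_NONE plays the role of a NULL PGresult. */
static MockStatus mock_get_result(MockConn *c)
{
    assert(c != NULL);
    return (c->pos < c->len) ? c->items[c->pos++] : RES_NONE;
}

/* Shape of the quoted loop: read until nothing is left.
 * Returns 1 if a sync marker was consumed along the way. */
static int drain_all(MockConn *c)
{
    int ate_sync = 0;
    MockStatus s;

    do {
        s = mock_get_result(c);
        if (s == RES_PIPELINE_SYNC)
            ate_sync = 1;
    } while (s != RES_NONE);
    return ate_sync;
}

/* Alternative shape: stop in front of the sync marker so that a later
 * "wait for sync" step can still observe it. */
static void drain_before_sync(MockConn *c)
{
    assert(c != NULL);
    while (c->pos < c->len && c->items[c->pos] != RES_PIPELINE_SYNC)
        c->pos++;
}
```

In this model, with a queue holding an error result followed by a sync marker, `drain_all` consumes both, so nothing remains for a later step that waits for the sync. `drain_before_sync` leaves the marker as the next result to be read.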