Thread

  1. Re: greenfly lwlock corruption in REL_14_STABLE and REL_15_STABLE

    Greg Burd <greg@burd.me> — 2025-12-11T17:27:37Z

    On Wed, Dec 10, 2025, at 12:10 AM, Thomas Munro wrote:
    > Beginning a week ago, greenfly (RISC-V, Clang 20.1) has failed like
    > this in 5 of 8 runs of the pgbench tests on the two oldest branches:
    
    Hey Thomas, thanks for raising this.  I should monitor my farm animals more closely.  As greenfly is one of them and my login name (gburd) is littered throughout the logs, I suppose I should dive into this.
    
    > TRAP: FailedAssertion("!(oldstate & LW_VAL_EXCLUSIVE)", File:
    > "lwlock.c", Line: 1850, PID: 1536294)
    > postgres: main: gburd postgres [local] CREATE
    > TYPE(ExceptionalCondition+0x72)[0x2ad1326922]
    > postgres: main: gburd postgres [local] CREATE
    > TYPE(LWLockRelease+0x51e)[0x2ad1634e60]
    > postgres: main: gburd postgres [local] CREATE
    > TYPE(_bt_first+0x7f8)[0x2ad139c314]
    > postgres: main: gburd postgres [local] CREATE
    > TYPE(btgettuple+0xca)[0x2ad13996f8]
    > postgres: main: gburd postgres [local] CREATE
    > TYPE(index_getnext_tid+0x2a)[0x2ad138bd66]
    > postgres: main: gburd postgres [local] CREATE
    > TYPE(index_getnext_slot+0x24)[0x2ad138bf56]
    > postgres: main: gburd postgres [local] CREATE
    > TYPE(systable_getnext+0x18)[0x2ad138a97c]
    > postgres: main: gburd postgres [local] CREATE
    > TYPE(GetNewOidWithIndex+0xfc)[0x2ad13ed284]
    > postgres: main: gburd postgres [local] CREATE
    > TYPE(EnumValuesCreate+0x58)[0x2ad14090ec]
    > postgres: main: gburd postgres [local] CREATE
    > TYPE(DefineEnum+0x10a)[0x2ad14bb948]
    > postgres: main: gburd postgres [local] CREATE TYPE(+0x3f0336)[0x2ad164a336]
    > postgres: main: gburd postgres [local] CREATE
    > TYPE(standard_ProcessUtility+0x468)[0x2ad1649560]
    > postgres: main: gburd postgres [local] CREATE TYPE(+0x3eec0e)[0x2ad1648c0e]
    > postgres: main: gburd postgres [local] CREATE TYPE(+0x3ee418)[0x2ad1648418]
    > postgres: main: gburd postgres [local] CREATE
    > TYPE(PortalRun+0x160)[0x2ad1647ec8]
    > postgres: main: gburd postgres [local] CREATE
    > TYPE(PostgresMain+0x1b34)[0x2ad1646000]
    > postgres: main: gburd postgres [local] CREATE TYPE(+0x36205a)[0x2ad15bc05a]
    > postgres: main: gburd postgres [local] CREATE
    > TYPE(ClosePostmasterPorts+0x0)[0x2ad15bb8e0]
    > postgres: main: gburd postgres [local] CREATE
    > TYPE(PostmasterMain+0x100a)[0x2ad15b92ac]
    > postgres: main: gburd postgres [local] CREATE TYPE(+0x2cac90)[0x2ad1524c90]
    > /lib/riscv64-linux-gnu/libc.so.6(+0x277cc)[0x3f9caa77cc]
    > /lib/riscv64-linux-gnu/libc.so.6(__libc_start_main+0x78)[0x3f9caa7878]
    > postgres: main: gburd postgres [local] CREATE TYPE(_start+0x20)[0x2ad1326ac0]
    >
    > That's:
    >
    >     if (mode == LW_EXCLUSIVE)
    >         oldstate = pg_atomic_sub_fetch_u32(&lock->state, LW_VAL_EXCLUSIVE);
    >     else
    >         oldstate = pg_atomic_sub_fetch_u32(&lock->state, LW_VAL_SHARED);
    >
    >     /* nobody else can have that kind of lock */
    >     Assert(!(oldstate & LW_VAL_EXCLUSIVE));
    >
    > I will see if I can reproduce it or see something wrong under qemu,
    > but that'll take some time to set up...
    
    It'll take me far less time to reproduce than you. :)
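
    For anyone following along, here is a minimal standalone sketch of what that assertion checks.  It is not the lwlock.c code: it uses C11 <stdatomic.h> in place of the pg_atomic_* wrappers, every sketch_* name is hypothetical, and the bit values (exclusive holder at bit 24, one unit per shared holder) are my assumption about the layout of the state word in these branches.  The point is only that the check is applied to the value *after* the subtraction, so it fires if the exclusive bit is still set once our own hold has been removed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout of the lock state word (hypothetical names). */
#define SKETCH_VAL_EXCLUSIVE ((uint32_t) 1 << 24)
#define SKETCH_VAL_SHARED    ((uint32_t) 1)

/*
 * Stand-in for pg_atomic_sub_fetch_u32(): atomically subtract and return
 * the NEW value.  C11 atomic_fetch_sub() returns the old value, so the
 * subtraction is applied again to the result.
 */
static uint32_t
sketch_sub_fetch_u32(_Atomic uint32_t *ptr, uint32_t sub)
{
    return atomic_fetch_sub(ptr, sub) - sub;
}

/*
 * Drop one hold on the lock and report whether the check from the quoted
 * snippet would fail: returns 1 if the exclusive bit is still set in the
 * post-release state, 0 otherwise.
 */
static int
sketch_release_trips_assert(_Atomic uint32_t *state, int exclusive)
{
    uint32_t newstate;

    if (exclusive)
        newstate = sketch_sub_fetch_u32(state, SKETCH_VAL_EXCLUSIVE);
    else
        newstate = sketch_sub_fetch_u32(state, SKETCH_VAL_SHARED);

    /* "nobody else can have that kind of lock" */
    return (newstate & SKETCH_VAL_EXCLUSIVE) != 0;
}
```

    In this sketch a lone exclusive holder or one of two shared holders releases cleanly, while a shared release from a state that also carries the exclusive bit trips the check.  That only illustrates the invariant; it says nothing yet about how greenfly gets into such a state.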
    
    > Since the RISC-V GCC animals aren't showing any problem, I wondered if
    > this could be related to commits d8ba910b, 1c7cba4, but that was ~30
    > days ago, applied to all branches and prevented reordering of
    > non-atomic loads, while here I assume we have __sync_fetch_and_sub()
    > without a connection to other memory as far as I can see immediately.
    > Commits 332693e7, da39714 touched lwlock.c ~15 days ago, but not in a
    > way that immediately seems relevant; if there were a relevant flag
    > protocol difference in these branches, then why only this system?  It
    > also passed half a dozen times before the cluster of failures.  That
    > seems to point back towards codegen problems, but perhaps of a
    > different kind.  Unless something else is going really wrong, but it's
    > hard to imagine that we forgot which lock type we held...
    >
    >     date    |    branch     |             commit              | assert_failed
    > ------------+---------------+---------------------------------+---------------
    >  2025-12-09 | REL_15_STABLE | f188bc5 doc: Fix statement a... |
    >  2025-12-09 | REL_14_STABLE | 4c4fa53 doc: Fix statement a... | t
    >  2025-12-09 | REL_15_STABLE | 52a9588 Doc: fix typo in has... | t
    >  2025-12-05 | REL_15_STABLE | b9a02b9 Fix setting next mul... |
    >  2025-12-05 | REL_14_STABLE | 4896955 Fix setting next mul... |
    >  2025-12-05 | REL_15_STABLE | 7e54eac Show version of node... | t
    >  2025-12-03 | REL_15_STABLE | 8cfb174 Set next multixid's ... | t
    >  2025-12-03 | REL_14_STABLE | 81416e1 Set next multixid's ... | t
    >  2025-12-02 | REL_15_STABLE | 7792bdc Fix amcheck's handli... |
    >  2025-12-02 | REL_14_STABLE | fbb4b60 Fix amcheck's handli... |
    >  2025-11-29 | REL_15_STABLE | 134a8ee Avoid rewriting data... |
    >  2025-11-29 | REL_14_STABLE | 2d5b97b Avoid rewriting data... |
    >  2025-11-27 | REL_15_STABLE | f19502f Allow indexscans on ... |
    >  2025-11-27 | REL_14_STABLE | 9e77323 Allow indexscans on ... |
    >  2025-11-27 | REL_15_STABLE | f9f9283 doc: Fix misleading ... |
    >  2025-11-26 | REL_15_STABLE | eb7743e doc: Clarify passphr... |
    >  2025-11-26 | REL_14_STABLE | 9a26ff8 doc: Clarify passphr... |
    >  2025-11-25 | REL_15_STABLE | da39714 lwlock: Fix, current... |
    >  2025-11-25 | REL_14_STABLE | 332693e lwlock: Fix, current... |
    >  2025-11-24 | REL_15_STABLE | ea757e8 Fix incorrect IndexO... |
    >  2025-11-24 | REL_14_STABLE | ea36c2f Fix incorrect IndexO... |
    >  2025-11-22 | REL_15_STABLE | 5516485 jit: Adjust AArch64-... |
    >  2025-11-22 | REL_14_STABLE | 035a1f5 jit: Adjust AArch64-... |
    >  2025-11-19 | REL_15_STABLE | 7c49407 Print new OldestXID ... |
    >  2025-11-19 | REL_14_STABLE | 11cc0f4 Print new OldestXID ... |
    >  2025-11-18 | REL_15_STABLE | 9f5a58a Don't allow CTEs to ... |
    >  2025-11-18 | REL_14_STABLE | b853974 Don't allow CTEs to ... |
    >  2025-11-18 | REL_15_STABLE | 3995e4a Define PS_USE_CLOBBE... |
    >  2025-11-18 | REL_14_STABLE | 29a3e22 Define PS_USE_CLOBBE... |
    >  2025-11-17 | REL_15_STABLE | ad5cc3a Update .abi-complian... |
    >  2025-11-16 | REL_15_STABLE | 5d5b05c Doc: include MERGE i... |
    >  2025-11-14 | REL_15_STABLE | d61af52 Add note about Creat... |
    >  2025-11-14 | REL_14_STABLE | 4c179cc Add note about Creat... |
    >  2025-11-13 | REL_15_STABLE | c663152 doc: Improve descrip... |
    >  2025-11-13 | REL_14_STABLE | 7aa83ea doc: Improve descrip... |
    >  2025-11-12 | REL_15_STABLE | 21a9014 Clear 'xid' in dummy... |
    >  2025-11-12 | REL_14_STABLE | 84f1bf4 Clear 'xid' in dummy... |
    >  2025-11-12 | REL_14_STABLE | 4ef048f doc: Document effect... |
    >  2025-11-12 | REL_15_STABLE | 608566b doc: Document effect... |
    >  2025-11-12 | REL_14_STABLE | f8a0ea8 Fix range for commit... |
    >  2025-11-12 | REL_15_STABLE | 97cd4b6 Fix pg_upgrade aroun... |
    >  2025-11-12 | REL_15_STABLE | 74b26c8 doc: Fix incorrect s... |
    >  2025-11-11 | REL_15_STABLE | 32f3881 Stamp 15.15....         |
    >  2025-11-11 | REL_14_STABLE | 9ad034b Stamp 14.20....         |
    >  2025-11-10 | REL_15_STABLE | 70d03b5 Last-minute updates ... |
    >  2025-11-10 | REL_14_STABLE | ee953cd Last-minute updates ... |
    >  2025-11-10 | REL_15_STABLE | 9142156 libpq: Prevent some ... |
    >  2025-11-10 | REL_14_STABLE | e792be6 Translation updates...  |
    >  2025-11-09 | REL_15_STABLE | e334e80 Release notes for 18... |
    >  2025-11-09 | REL_14_STABLE | 06827c5 Release notes for 18... |
    >  2025-11-08 | REL_15_STABLE | 1c7cba4 Fix generic read and... |
    >  2025-11-08 | REL_14_STABLE | d8ba910 Fix generic read and... |
    
    I'll see what I can do to find the offending commit(s).
    
    best.
    
    -greg