Re: greenfly lwlock corruption in REL_14_STABLE and REL_15_STABLE
Greg Burd <greg@burd.me> — 2025-12-11T17:27:37Z
On Wed, Dec 10, 2025, at 12:10 AM, Thomas Munro wrote:
> Beginning a week ago, greenfly (RISC-V, Clang 20.1) has failed like
> this in 5 of 8 runs of the pgbench tests on the two oldest branches:

Hey Thomas, thanks for raising this. I should monitor my farm animals more closely. As greenfly is one of them and my login name (gburd) is littered throughout the logs, I suppose I should dive into this.

> TRAP: FailedAssertion("!(oldstate & LW_VAL_EXCLUSIVE)", File: "lwlock.c", Line: 1850, PID: 1536294)
> postgres: main: gburd postgres [local] CREATE TYPE(ExceptionalCondition+0x72)[0x2ad1326922]
> postgres: main: gburd postgres [local] CREATE TYPE(LWLockRelease+0x51e)[0x2ad1634e60]
> postgres: main: gburd postgres [local] CREATE TYPE(_bt_first+0x7f8)[0x2ad139c314]
> postgres: main: gburd postgres [local] CREATE TYPE(btgettuple+0xca)[0x2ad13996f8]
> postgres: main: gburd postgres [local] CREATE TYPE(index_getnext_tid+0x2a)[0x2ad138bd66]
> postgres: main: gburd postgres [local] CREATE TYPE(index_getnext_slot+0x24)[0x2ad138bf56]
> postgres: main: gburd postgres [local] CREATE TYPE(systable_getnext+0x18)[0x2ad138a97c]
> postgres: main: gburd postgres [local] CREATE TYPE(GetNewOidWithIndex+0xfc)[0x2ad13ed284]
> postgres: main: gburd postgres [local] CREATE TYPE(EnumValuesCreate+0x58)[0x2ad14090ec]
> postgres: main: gburd postgres [local] CREATE TYPE(DefineEnum+0x10a)[0x2ad14bb948]
> postgres: main: gburd postgres [local] CREATE TYPE(+0x3f0336)[0x2ad164a336]
> postgres: main: gburd postgres [local] CREATE TYPE(standard_ProcessUtility+0x468)[0x2ad1649560]
> postgres: main: gburd postgres [local] CREATE TYPE(+0x3eec0e)[0x2ad1648c0e]
> postgres: main: gburd postgres [local] CREATE TYPE(+0x3ee418)[0x2ad1648418]
> postgres: main: gburd postgres [local] CREATE TYPE(PortalRun+0x160)[0x2ad1647ec8]
> postgres: main: gburd postgres [local] CREATE TYPE(PostgresMain+0x1b34)[0x2ad1646000]
> postgres: main: gburd postgres [local] CREATE TYPE(+0x36205a)[0x2ad15bc05a]
> postgres: main: gburd postgres [local] CREATE TYPE(ClosePostmasterPorts+0x0)[0x2ad15bb8e0]
> postgres: main: gburd postgres [local] CREATE TYPE(PostmasterMain+0x100a)[0x2ad15b92ac]
> postgres: main: gburd postgres [local] CREATE TYPE(+0x2cac90)[0x2ad1524c90]
> /lib/riscv64-linux-gnu/libc.so.6(+0x277cc)[0x3f9caa77cc]
> /lib/riscv64-linux-gnu/libc.so.6(__libc_start_main+0x78)[0x3f9caa7878]
> postgres: main: gburd postgres [local] CREATE TYPE(_start+0x20)[0x2ad1326ac0]
>
> That's:
>
>     if (mode == LW_EXCLUSIVE)
>         oldstate = pg_atomic_sub_fetch_u32(&lock->state, LW_VAL_EXCLUSIVE);
>     else
>         oldstate = pg_atomic_sub_fetch_u32(&lock->state, LW_VAL_SHARED);
>
>     /* nobody else can have that kind of lock */
>     Assert(!(oldstate & LW_VAL_EXCLUSIVE));
>
> I will see if I can reproduce it or see something wrong under qemu,
> but that'll take some time to set up...

It'll take me far less time to reproduce than you. :)

> Since the RISC-V GCC animals aren't showing any problem, I wondered if
> this could be related to commits d8ba910b, 1c7cba4, but that was ~30
> days ago, applied to all branches and prevented reordering of
> non-atomic loads, while here I assume we have __sync_fetch_and_sub()
> without a connection to other memory as far as I can see immediately.
> Commits 332693e7, da39714 touched lwlock.c ~15 days ago, but not in a
> way that immediately seems relevant; if there were a relevant flag
> protocol difference in these branches, then why only this system? It
> also passed half a dozen times before the cluster of failures. That
> seems to point back towards codegen problems, but perhaps of a
> different kind. Unless something else is going really wrong, but it's
> hard to imagine that we forgot which lock type we held...
>
>     date    |    branch     |             commit              | assert_failed
> ------------+---------------+---------------------------------+---------------
>  2025-12-09 | REL_15_STABLE | f188bc5 doc: Fix statement a... |
>  2025-12-09 | REL_14_STABLE | 4c4fa53 doc: Fix statement a... | t
>  2025-12-09 | REL_15_STABLE | 52a9588 Doc: fix typo in has... | t
>  2025-12-05 | REL_15_STABLE | b9a02b9 Fix setting next mul... |
>  2025-12-05 | REL_14_STABLE | 4896955 Fix setting next mul... |
>  2025-12-05 | REL_15_STABLE | 7e54eac Show version of node... | t
>  2025-12-03 | REL_15_STABLE | 8cfb174 Set next multixid's ... | t
>  2025-12-03 | REL_14_STABLE | 81416e1 Set next multixid's ... | t
>  2025-12-02 | REL_15_STABLE | 7792bdc Fix amcheck's handli... |
>  2025-12-02 | REL_14_STABLE | fbb4b60 Fix amcheck's handli... |
>  2025-11-29 | REL_15_STABLE | 134a8ee Avoid rewriting data... |
>  2025-11-29 | REL_14_STABLE | 2d5b97b Avoid rewriting data... |
>  2025-11-27 | REL_15_STABLE | f19502f Allow indexscans on ... |
>  2025-11-27 | REL_14_STABLE | 9e77323 Allow indexscans on ... |
>  2025-11-27 | REL_15_STABLE | f9f9283 doc: Fix misleading ... |
>  2025-11-26 | REL_15_STABLE | eb7743e doc: Clarify passphr... |
>  2025-11-26 | REL_14_STABLE | 9a26ff8 doc: Clarify passphr... |
>  2025-11-25 | REL_15_STABLE | da39714 lwlock: Fix, current... |
>  2025-11-25 | REL_14_STABLE | 332693e lwlock: Fix, current... |
>  2025-11-24 | REL_15_STABLE | ea757e8 Fix incorrect IndexO... |
>  2025-11-24 | REL_14_STABLE | ea36c2f Fix incorrect IndexO... |
>  2025-11-22 | REL_15_STABLE | 5516485 jit: Adjust AArch64-... |
>  2025-11-22 | REL_14_STABLE | 035a1f5 jit: Adjust AArch64-... |
>  2025-11-19 | REL_15_STABLE | 7c49407 Print new OldestXID ... |
>  2025-11-19 | REL_14_STABLE | 11cc0f4 Print new OldestXID ... |
>  2025-11-18 | REL_15_STABLE | 9f5a58a Don't allow CTEs to ... |
>  2025-11-18 | REL_14_STABLE | b853974 Don't allow CTEs to ... |
>  2025-11-18 | REL_15_STABLE | 3995e4a Define PS_USE_CLOBBE... |
>  2025-11-18 | REL_14_STABLE | 29a3e22 Define PS_USE_CLOBBE... |
>  2025-11-17 | REL_15_STABLE | ad5cc3a Update .abi-complian... |
>  2025-11-16 | REL_15_STABLE | 5d5b05c Doc: include MERGE i... |
>  2025-11-14 | REL_15_STABLE | d61af52 Add note about Creat... |
>  2025-11-14 | REL_14_STABLE | 4c179cc Add note about Creat... |
>  2025-11-13 | REL_15_STABLE | c663152 doc: Improve descrip... |
>  2025-11-13 | REL_14_STABLE | 7aa83ea doc: Improve descrip... |
>  2025-11-12 | REL_15_STABLE | 21a9014 Clear 'xid' in dummy... |
>  2025-11-12 | REL_14_STABLE | 84f1bf4 Clear 'xid' in dummy... |
>  2025-11-12 | REL_14_STABLE | 4ef048f doc: Document effect... |
>  2025-11-12 | REL_15_STABLE | 608566b doc: Document effect... |
>  2025-11-12 | REL_14_STABLE | f8a0ea8 Fix range for commit... |
>  2025-11-12 | REL_15_STABLE | 97cd4b6 Fix pg_upgrade aroun... |
>  2025-11-12 | REL_15_STABLE | 74b26c8 doc: Fix incorrect s... |
>  2025-11-11 | REL_15_STABLE | 32f3881 Stamp 15.15....         |
>  2025-11-11 | REL_14_STABLE | 9ad034b Stamp 14.20....         |
>  2025-11-10 | REL_15_STABLE | 70d03b5 Last-minute updates ... |
>  2025-11-10 | REL_14_STABLE | ee953cd Last-minute updates ... |
>  2025-11-10 | REL_15_STABLE | 9142156 libpq: Prevent some ... |
>  2025-11-10 | REL_14_STABLE | e792be6 Translation updates...  |
>  2025-11-09 | REL_15_STABLE | e334e80 Release notes for 18... |
>  2025-11-09 | REL_14_STABLE | 06827c5 Release notes for 18... |
>  2025-11-08 | REL_15_STABLE | 1c7cba4 Fix generic read and... |
>  2025-11-08 | REL_14_STABLE | d8ba910 Fix generic read and... |

I'll see what I can do to find the offending commit(s).

best.

-greg
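
For readers following along, the check in the quoted LWLockRelease() snippet can be illustrated with a small standalone C11 sketch. This is not the PostgreSQL code: `SketchLWLock`, `sketch_acquire`, `sketch_release`, and `release_check_passes` are hypothetical names, the values of `LW_VAL_EXCLUSIVE` and `LW_VAL_SHARED` are assumptions modeled on lwlock.c, and flag bits, wait queues, and contention handling are left out entirely. It only shows how the post-subtraction state is tested.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout, simplified from lwlock.c; flag bits are omitted. */
#define LW_VAL_EXCLUSIVE ((uint32_t) 1 << 24)
#define LW_VAL_SHARED    ((uint32_t) 1)

typedef enum { LW_EXCLUSIVE, LW_SHARED } LWLockMode;

/* Hypothetical stand-in for the real LWLock: only the state word. */
typedef struct { _Atomic uint32_t state; } SketchLWLock;

/* Amount a holder in the given mode contributes to the state word. */
static inline uint32_t mode_value(LWLockMode mode)
{
    return mode == LW_EXCLUSIVE ? LW_VAL_EXCLUSIVE : LW_VAL_SHARED;
}

/* Record a holder with no conflict checking at all (illustration only). */
static inline void sketch_acquire(SketchLWLock *lock, LWLockMode mode)
{
    atomic_fetch_add(&lock->state, mode_value(mode));
}

/*
 * Subtract the holder's contribution and return the NEW state, which is
 * what a sub-then-fetch primitive yields. atomic_fetch_sub returns the
 * previous value, so the delta is subtracted once more locally.
 */
static inline uint32_t sketch_release(SketchLWLock *lock, LWLockMode mode)
{
    uint32_t delta = mode_value(mode);
    return atomic_fetch_sub(&lock->state, delta) - delta;
}

/* The condition the quoted Assert() tests on the post-release state. */
static inline bool release_check_passes(uint32_t newstate)
{
    return !(newstate & LW_VAL_EXCLUSIVE);
}
```

Starting from a zeroed lock, a shared acquire followed by a shared release leaves the state at 0 and the check passes. If an exclusive holder's value is present in the state word at the same time as a shared holder's (something the real lock is meant to make impossible), the shared release leaves 0x01000000 and the check fails. That is one way the assertion can trip; whether greenfly gets there through a codegen problem or something else is the open question in this thread.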