Thread
-
Failure in test_slru for host gokiburi (REL_16_STABLE only)
Michael Paquier <michael@paquier.xyz> — 2026-05-18T11:41:45Z
Hi all,

gokiburi has been failing on only REL_16_STABLE for the last few days, for the tests of module test_slru.

First failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gokiburi&dt=2026-05-13%2012%3A20%3A45

Set of changes associated with the first failure, which seem completely innocent to me:
5f12d86dd76 Wed May 13 05:43:49 2026 UTC Add more tests for corrupted data with pglz_decompress()
d140237dab8 Wed May 13 02:46:17 2026 UTC Fix stale COPY progress during logical replication table sync

While the buildfarm runs don't show much, I have been able to reproduce the failure on the buildfarm host, after using -DEXEC_BACKEND. Here is a backtrace, pointing out that something is broken with LWLock initialization:

2026-05-18 05:20:50.186 UTC client backend[870830] pg_regress/test_slru STATEMENT: SELECT test_slru_page_readonly(12377);
TRAP: failed Assert("LWLockHeldByMe(TestSLRULock)"), File: "test_slru.c", Line: 124, PID: 870830
postgres: popo contrib_regression [local] SELECT(ExceptionalCondition+0x16c) [0xaaaaabcf4d88]
/home/popo/lib/test_slru.so(test_slru_page_readonly+0xe4) [0xffffedf83060]
postgres: popo contrib_regression [local] SELECT(+0x885c40) [0xaaaaab325c40]
postgres: popo contrib_regression [local] SELECT(ExecInterpExprStillValid+0x84) [0xaaaaab329a4c]
postgres: popo contrib_regression [local] SELECT(+0x9405fc) [0xaaaaab3e05fc]
postgres: popo contrib_regression [local] SELECT(+0x9406d4) [0xaaaaab3e06d4]
postgres: popo contrib_regression [local] SELECT(+0x940b34) [0xaaaaab3e0b34]
postgres: popo contrib_regression [local] SELECT(+0x8b7ac0) [0xaaaaab357ac0]
postgres: popo contrib_regression [local] SELECT(+0x89de14) [0xaaaaab33de14]
postgres: popo contrib_regression [local] SELECT(+0x8a46c0) [0xaaaaab3446c0]
postgres: popo contrib_regression [local] SELECT(standard_ExecutorRun+0x2d0) [0xaaaaab33ec68]
postgres: popo contrib_regression [local] SELECT(ExecutorRun+0xb8) [0xaaaaab33e970]
postgres: popo contrib_regression [local] SELECT(+0xe550dc) [0xaaaaab8f50dc]
postgres: popo contrib_regression [local] SELECT(PortalRun+0x460) [0xaaaaab8f4958]
postgres: popo contrib_regression [local] SELECT(+0xe43150) [0xaaaaab8e3150]
postgres: popo contrib_regression [local] SELECT(PostgresMain+0x15e8) [0xaaaaab8f0560]
postgres: popo contrib_regression [local] SELECT(postmaster_forkexec+0x0) [0xaaaaab70f644]
postgres: popo contrib_regression [local] SELECT(SubPostmasterMain+0x6fc) [0xaaaaab7106d8]
postgres: popo contrib_regression [local] SELECT(main+0x6d0) [0xaaaaab463f6c]
/lib/aarch64-linux-gnu/libc.so.6(+0x2225c) [0xfffff725225c]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x9c) [0xfffff725233c]
postgres: popo contrib_regression [local] SELECT(_start+0x30) [0xaaaaaad3d4b0]

The server logs include the following, pointing to a broken state (these two should not fail):

2026-05-18 05:20:50.184 UTC client backend[870830] pg_regress/test_slru ERROR: lock <unassigned:0> is not held
2026-05-18 05:20:50.184 UTC client backend[870830] pg_regress/test_slru STATEMENT: SELECT test_slru_page_write(12345, 'Test SLRU');

Note that the tests pass without -DEXEC_BACKEND.

While reading through the module, I think that the LWLock initialization logic is borked, where we decide to do a LWLockInitialize() more times than necessary, confusing the internal states.

Honestly, I have no clue why the test has suddenly been failing, and why other buildfarm members don't complain. The host has been upgraded a couple of days ago to the latest Debian, but I also had a few clean runs in the buildfarm before this began showing up. What I do know is that the patch attached is able to make the tests of the module pass for v16 on the problematic host with -DEXEC_BACKEND.

Comments or opinions?
--
Michael
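
As a rough illustration of the suspicion above (initializing a lock more times than necessary can confuse its state), here is a toy model in plain C. It is not PostgreSQL's LWLock code and not the attached patch: ToyLock, toy_lock_init() and the other toy_* names are hypothetical, made up for this sketch, and nothing here claims to be the actual cause of the gokiburi failure. It only shows the general hazard: once a lock has been acquired, a second initialization resets the lock's own state while the holder's bookkeeping still lists it, so the two views disagree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_HELD 8

/* Toy stand-in for a lightweight lock (hypothetical, not PostgreSQL's LWLock). */
typedef struct ToyLock
{
    int  tranche;   /* 0 plays the role of "unassigned" in this toy model */
    bool locked;    /* the lock's own view of whether it is taken */
} ToyLock;

/* Per-process bookkeeping of the locks this process believes it holds. */
static ToyLock *toy_held[TOY_MAX_HELD];
static int      toy_nheld = 0;

/* Initialize a lock; this unconditionally resets its state. */
static void
toy_lock_init(ToyLock *lock, int tranche)
{
    lock->tranche = tranche;
    lock->locked = false;
}

/* Acquire the lock and remember it in the per-process list. */
static bool
toy_lock_acquire(ToyLock *lock)
{
    if (lock->locked || toy_nheld >= TOY_MAX_HELD)
        return false;
    lock->locked = true;
    toy_held[toy_nheld++] = lock;
    return true;
}

/* Does the per-process bookkeeping list this lock as held? */
static bool
toy_lock_held_by_me(const ToyLock *lock)
{
    for (int i = 0; i < toy_nheld; i++)
    {
        if (toy_held[i] == lock)
            return true;
    }
    return false;
}

/* The lock's own state and the holder's bookkeeping should always agree. */
static bool
toy_state_consistent(const ToyLock *lock)
{
    return lock->locked == toy_lock_held_by_me(lock);
}

/* Returns true if a redundant second init leaves the state inconsistent. */
static bool
toy_demo_reinit_breaks_state(void)
{
    ToyLock lock;
    bool    broken;

    toy_nheld = 0;
    toy_lock_init(&lock, 1);
    (void) toy_lock_acquire(&lock);
    toy_lock_init(&lock, 1);        /* second, unnecessary initialization */
    broken = !toy_state_consistent(&lock);
    toy_nheld = 0;                  /* drop the pointer to the local lock */
    return broken;
}
```

Calling toy_demo_reinit_breaks_state() from a small main() reports the inconsistency; dropping the second toy_lock_init() call keeps both views in agreement.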