Thread
-
Re: Add RISC-V Zbb popcount optimization
Greg Burd <greg@burd.me> — 2026-05-29T15:44:08Z
Hi, Thanks for taking a look at this. On 2026-05-28 14:17:16 +0000, Andres Freund wrote: > How confident are we that this bug just affects DES? It doesn't. I scanned master for the scatter-write idiom that des_init() uses (`dst[idx[i]] = expr`) and then compiled each candidate on greenfly with both gcc-13 and clang-20 at -O2 with -march=rv64gcv, diffing the disassembly for indexed-store instructions. The static scan turned up ~30 sites; clang-20 actually emits RVV scatter/strided stores in three of them, where gcc-13 emits scalar code: contrib/pgcrypto/crypt-des.c des_init 3x vsoxei8 (the known case) src/timezone/zic.c ~L2330 omittype[] 2x vsoxei8 (byte dest, same shape) contrib/pg_trgm/trgm_op.c compactTrgm 3x vsse8 (strided, not indexed) The zic.c hit is the one that bothers me. It's the same shape as des_init() -- byte-sized destination, scatter store via an indexing array -- and it's not exercised by `make check`, since zic runs at `make install` to compile the IANA tz data. If clang-20 miscompiles zic the same way it miscompiles des_init(), the buildfarm wouldn't notice and the resulting tzdata would silently be wrong. I haven't yet built and run that path on greenfly with clang-20 to confirm the miscompile actually triggers there; I'll do that next. The trgm_op.c hit is structurally different (strided store, not indexed scatter), so it isn't the #176001 pattern. I'm flagging it because it's in the same family of clang-20 RVV transformations on byte destinations, not because I have a known miscompile for it. Several other sites that match the source pattern (ginlogic.c MAYBE- twiddle, spi.c attribute scatter, spgtextproc.c, ...) survive clang-20 codegen unvectorized -- the autovec heuristics don't fire on the variable-length, control-flow-dependent loops where this idiom tends to live in our tree. So the *visible* exposure today appears bounded to crypt-des.c and zic.c, but I would not bet money that the next clang release won't extend the autovec to those. One more data point: the scatter stores only appear at -O2 and -O3, not -O1. Our default is -O2. > Have you confirmed that, by using a newer clang, the merging of the > fixes actually fixes the problem? Not yet. apt.llvm.org doesn't ship riscv64 packages, so I'm building clang from llvm-project main on greenfly now (currently mid-build, ~24h on this CPU under nice). Once it's installed I'll rerun the asm audit and the test suite under it; if HEAD generates correct code for crypt-des.c and zic.c that's a real data point for "reject affected clang" and gives us a version range for the cutoff. > ISTM a perfectly viable patch would be to just reject building with a > non-very-recent clang on riscv. It's the cleanest answer if HEAD is in fact fixed, and I'd lean that way. The cost is excluding the clang that ships in current Debian/ Ubuntu stable on riscv64 (clang 20 in noble / trixie), which is what most riscv64 users actually have today. I'd rather have the clang-HEAD bisect data before picking the cutoff -- "reject < clang-N" is much more defensible with N pinned to a specific fix. > That seems like a pretty odd fix for the problem. If the problem is > auto-vectorization, we should stop auto-vectorization, not sprinkle > memory barriers around. Agreed. pg_memory_barrier() was the first thing that made the test pass and I shipped it without going back to the right primitive. The audit makes per-loop pragmas look worse, not better: the affected sites aren't all in one file, and one of them (zic.c) is third-party- ish code we mostly don't touch. In rough order of locality: 1. #pragma clang loop vectorize(disable) on the four affected loops in des_init() (and the equivalent in zic.c, if confirmed). Doesn't scale if the next clang release vectorizes a fifth site. 2. A pg_attribute_no_vectorize on des_init() / the affected zic function. Coarser, defensive against new scatter writes inside the same function. 3. Configure-time -fno-tree-vectorize on crypt-des.c (and zic.c) for clang on riscv. File scope. Simple and static. 4. Configure-time -fno-tree-vectorize globally for clang+riscv64 until clang-N, where N is the bisected fix. Or refuse < N. 5. Hard error in configure for clang versions in the affected range. After the audit my preference shifted from (1) to (3) or (4): the bug isn't DES-specific, the patch shouldn't be either, and decorating loops one at a time as we trip over them is exactly the "sprinkle barriers around" complaint applied a level up. (4) would also fix the secondary intermittent issue we've seen on greenfly that I haven't been able to attribute to any specific loop. I'll send v5 once I have: - clang-HEAD on greenfly and the rerun audit/test results - confirmation (or refutation) that the zic.c miscompile actually triggers, not just that the scatter store is emitted If both confirm what the asm audit suggests, v5 will drop the pg_memory_barrier() approach in 0001 and replace it with either (3) or (4)+(5), depending on whether clang-HEAD passes. The other two patches do improve CRC32 and popcount on RISC-V, they are still worth considering. best. -greg