Thread
Commits
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Disable parallel plans for RIGHT_SEMI joins
- ef6168bafe9b 18.1 landed
- 257ee78341f2 19 (unreleased) landed
-
BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
PG Bug reporting form <noreply@postgresql.org> — 2025-10-24T14:23:34Z
The following bug has been logged on the website: Bug reference: 19094 Logged by: Lori Corbani Email address: lori.corbani@jax.org PostgreSQL version: 18.0 Operating system: Rocky Linux 9.6 (Blue Onyx) Description: postgres 17: (A) SELECT 115457, (B) SELECT 115457 : same counts postgres 18: (A) SELECT 115444, (B) SELECT 115436 : different counts I am running this from /usr/pgsql-18/bin/psql step 1: psql -h ${PG_DBSERVER} -U ${PG_DBUSER} -d ${PG_DBNAME} -e step 2: run (A) version / without the "distinct" clause step 3: select results step 4: exit psql step 5. repeat step 1-4 each time I run "pgsql" and the same SQL, I get a different count. sometimes the count goes up and sometimes the count goes down. running the SQL with the "distinct" fixes the problem, but we are not having this problem on postgres 17. Postgres 17 consistently returns the same count. When we migrated from Postgres 17 to Postgres 18, I used the same "postgressql.conf" and "pg_hpa.config". Also, we migrated from Postgres 15 to Postgres 17 earlier this year with no issues; using the same method. If I remove the "exists" statement, then the counts are fine. So, it seems that it is the "exists" statement that is causing the issue. "select s._Strain_key" VS "select distinct s._Strain_key" from prb_strain s where s.private = 0 and s.strain not ilike '%involves%' and s.strain not ilike '%either%' and s.strain not ilike '% and %' and s.strain not ilike '% or %' and exists (select 1 from voc_annot va, voc_term t where va._AnnotType_key = 1009 and va._Term_key = t._Term_key and t.term != 'Not Applicable' and t.term != 'Not Specified' and va._Object_key = s._Strain_key) -
Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Tom Lane <tgl@sss.pgh.pa.us> — 2025-10-25T14:31:05Z
PG Bug reporting form <noreply@postgresql.org> writes: > If I remove the "exists" statement, then the counts are fine. > So, it seems that it is the "exists" statement that is causing the issue. > "select s._Strain_key" VS "select distinct s._Strain_key" > from prb_strain s > where s.private = 0 > and s.strain not ilike '%involves%' > and s.strain not ilike '%either%' > and s.strain not ilike '% and %' > and s.strain not ilike '% or %' > and exists (select 1 from voc_annot va, voc_term t > where va._AnnotType_key = 1009 > and va._Term_key = t._Term_key > and t.term != 'Not Applicable' > and t.term != 'Not Specified' > and va._Object_key = s._Strain_key) This report is inadequate to help us identify the issue. You've not provided the relevant table declarations, nor any sample data that would reproduce the problem. Given the squishiness of the described behavior, I realize that building a self-contained reproducer might be hard. In the meantime, could you at least provide EXPLAIN ANALYZE results from correct and incorrect executions? regards, tom lane
-
RE: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Lori Corbani <lori.corbani@jax.org> — 2025-10-27T10:21:17Z
Tom, Sorry, I haven't had to enter a PG ticket before. I realize that there are many things I didn't provide. My apologies. I was looking for a link back to my bug report so I could attach more information but didn't see one in the email that came back. I am working on getting the file together with the appropriate tables & data this morning. In the meaintime, I did run the EXPLAIN ANALYZE after I sent out the initial email. See below. Postgres 17 EXPLAIN ANALYZE Merge Semi Join (cost=1.46..65223.16 rows=89672 width=4) (actual time=1.315..367.474 rows=115457 loops=1) Merge Cond: (s._strain_key = va._object_key) -> Index Scan using prb_strain_pkey on prb_strain s (cost=0.42..6486.88 rows=115741 width=4) (actual time=0.013..99.843 rows=117065 loops=1) Filter: ((strain !* '%involves%'::text) AND (strain !* '%either%'::text) AND (strain !* '% and %'::text) AND (strain !* '% or %'::text) AND (private = 0)) Rows Removed by Filter: 16340 -> Nested Loop (cost=0.86..118573.26 rows=322352 width=4) (actual time=1.284..232.161 rows=321574 loops=1) -> Index Only Scan using voc_annot_idx_clustered on voc_annot va (cost=0.43..95392.15 rows=322480 width=8) (actual time=0.054..139.127 rows=322488 loops=1) Index Cond: (_annottype_key = 1009) Heap Fetches: 1668 -> Memoize (cost=0.43..0.51 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=322488) Cache Key: va._term_key Cache Mode: logical Hits: 322441 Misses: 47 Evictions: 0 Overflows: 0 Memory Usage: 5kB -> Index Scan using voc_term_pkey on voc_term t (cost=0.42..0.50 rows=1 width=4) (actual time=0.010..0.010 rows=1 loops=47) Index Cond: (_term_key = va._term_key) Filter: ((term <> 'Not Applicable'::text) AND (term <> 'Not Specified'::text)) Rows Removed by Filter: 0 Planning Time: 4.688 ms Execution Time: 371.618 ms (19 rows) Postgres 18 EXPLAIN ANALYZE Gather (cost=17394.54..60888.14 rows=88631 width=4) (actual time=112.329..193.108 rows=115447.0 0 loops=1) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=13901 -> Parallel Hash Right Semi Join (cost=16394.54..51025.04 rows=36930 width=4) (actual time=1 06.616..171.437 rows=38482.33 loops=3) Hash Cond: (va._object_key = s._strain_key) Buffers: shared hit=13901 -> Parallel Hash Join (cost=12089.56..45217.05 rows=134260 width=4) (actual time=67.78 1..106.574 rows=107171.33 loops=3) Hash Cond: (va._term_key = t._term_key) Buffers: shared hit=12109 -> Parallel Bitmap Heap Scan on voc_annot va (cost=3046.66..35821.57 rows=134313 width=8) (actual time=12.580..32.848 rows=107476.00 loops=3) Recheck Cond: (_annottype_key = 1009) Heap Blocks: exact=1802 Buffers: shared hit=6277 Worker 0: Heap Blocks: exact=2068 Worker 1: Heap Blocks: exact=2132 -> Bitmap Index Scan on voc_annot_idx_annottype_key (cost=0.00..2966.07 ro ws=322352 width=0) (actual time=11.476..11.477 rows=322428.00 loops=1) Index Cond: (_annottype_key = 1009) Index Searches: 1 Buffers: shared hit=275 -> Parallel Hash (cost=7583.72..7583.72 rows=116735 width=4) (actual time=53.855 ..53.856 rows=93389.33 loops=3) Buckets: 524288 Batches: 1 Memory Usage: 15072kB Buffers: shared hit=5832 -> Parallel Seq Scan on voc_term t (cost=0.00..7583.72 rows=116735 width=4 ) (actual time=0.022..24.986 rows=93389.33 loops=3) Filter: ((term <> 'Not Applicable'::text) AND (term <> 'Not Specified' ::text)) Rows Removed by Filter: 36 Buffers: shared hit=5832 -> Parallel Hash (cost=3458.96..3458.96 rows=67681 width=4) (actual time=38.276..38.27 7 rows=39019.33 loops=3) Buckets: 131072 Batches: 1 Memory Usage: 5632kB Buffers: shared hit=1792 -> Parallel Seq Scan on prb_strain s (cost=0.00..3458.96 rows=67681 width=4) (actual time=0.033..27.676 rows=39019.33 loops=3) Filter: ((strain !* '%involves%'::text) AND (strain !* '%either%'::text) AND (strain !* '% and %'::text) AND (strain !* '% or %'::text) AND (private = 0)) Rows Removed by Filter: 2963 Buffers: shared hit=1792 Planning: Buffers: shared hit=724 Planning Time: 5.633 ms Execution Time: 198.235 ms (38 rows) -----Original Message----- From: Tom Lane <tgl@sss.pgh.pa.us> Sent: Saturday, October 25, 2025 10:31 AM To: Lori Corbani <Lori.Corbani@jax.org> Cc: pgsql-bugs@lists.postgresql.org Subject: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results PG Bug reporting form <noreply@postgresql.org> writes: > If I remove the "exists" statement, then the counts are fine. > So, it seems that it is the "exists" statement that is causing the issue. > "select s._Strain_key" VS "select distinct s._Strain_key" > from prb_strain s > where s.private = 0 > and s.strain not ilike '%involves%' > and s.strain not ilike '%either%' > and s.strain not ilike '% and %' > and s.strain not ilike '% or %' > and exists (select 1 from voc_annot va, voc_term t > where va._AnnotType_key = 1009 > and va._Term_key = t._Term_key > and t.term != 'Not Applicable' > and t.term != 'Not Specified' > and va._Object_key = s._Strain_key) This report is inadequate to help us identify the issue. You've not provided the relevant table declarations, nor any sample data that would reproduce the problem. Given the squishiness of the described behavior, I realize that building a self-contained reproducer might be hard. In the meantime, could you at least provide EXPLAIN ANALYZE results from correct and incorrect executions? regards, tom lane --- The information in this email, including attachments, may be confidential and is intended solely for the addressee(s). If you believe you received this email by mistake, please notify the sender by return email as soon as possible. -
RE: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Lori Corbani <lori.corbani@jax.org> — 2025-10-27T11:34:30Z
Tom, Attached is a file with this info. Please let me know if this is what you need. PRB_Strain.bcp.gz PRB_Strain_create.object : schema VOC_Annot.bcp.gz VOC_Annot_create.object : schema VOC_Term.bcp.gz VOC_Term_create.object : schema Thanks. Lori -----Original Message----- From: Tom Lane <tgl@sss.pgh.pa.us> Sent: Saturday, October 25, 2025 10:31 AM To: Lori Corbani <Lori.Corbani@jax.org> Cc: pgsql-bugs@lists.postgresql.org Subject: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results PG Bug reporting form <noreply@postgresql.org> writes: > If I remove the "exists" statement, then the counts are fine. > So, it seems that it is the "exists" statement that is causing the issue. > "select s._Strain_key" VS "select distinct s._Strain_key" > from prb_strain s > where s.private = 0 > and s.strain not ilike '%involves%' > and s.strain not ilike '%either%' > and s.strain not ilike '% and %' > and s.strain not ilike '% or %' > and exists (select 1 from voc_annot va, voc_term t > where va._AnnotType_key = 1009 > and va._Term_key = t._Term_key > and t.term != 'Not Applicable' > and t.term != 'Not Specified' > and va._Object_key = s._Strain_key) This report is inadequate to help us identify the issue. You've not provided the relevant table declarations, nor any sample data that would reproduce the problem. Given the squishiness of the described behavior, I realize that building a self-contained reproducer might be hard. In the meantime, could you at least provide EXPLAIN ANALYZE results from correct and incorrect executions? regards, tom lane --- The information in this email, including attachments, may be confidential and is intended solely for the addressee(s). If you believe you received this email by mistake, please notify the sender by return email as soon as possible.
-
RE: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Lori Corbani <lori.corbani@jax.org> — 2025-10-27T16:32:16Z
Tom, 2 things I did this morning: 1. I added "order by" clause ; no change 2. I added "select distinct"; which fixed this, as I expected. However, when I googled the Postgres "exists" best practices, it seems to suggest that the "distinct" is unnecessary. From Google Search: Therefore, using DISTINCT within a subquery for an EXISTS clause is generally redundant and unnecessary. The EXISTS operator only cares if any row satisfies the subquery's condition, not how many or if they are unique. Adding DISTINCT would typically add overhead by requiring the database to sort and filter for uniqueness, which is not required for the EXISTS check itself. What is your suggestion for best practice when using "exists" clause? Many thanks. Lori -----Original Message----- From: Lori Corbani Sent: Monday, October 27, 2025 7:34 AM To: 'Tom Lane' <tgl@sss.pgh.pa.us> Cc: 'pgsql-bugs@lists.postgresql.org' <pgsql-bugs@lists.postgresql.org> Subject: RE: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results Tom, Attached is a file with this info. Please let me know if this is what you need. PRB_Strain.bcp.gz PRB_Strain_create.object : schema VOC_Annot.bcp.gz VOC_Annot_create.object : schema VOC_Term.bcp.gz VOC_Term_create.object : schema Thanks. Lori -----Original Message----- From: Tom Lane <tgl@sss.pgh.pa.us> Sent: Saturday, October 25, 2025 10:31 AM To: Lori Corbani <Lori.Corbani@jax.org> Cc: pgsql-bugs@lists.postgresql.org Subject: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results PG Bug reporting form <noreply@postgresql.org> writes: > If I remove the "exists" statement, then the counts are fine. > So, it seems that it is the "exists" statement that is causing the issue. > "select s._Strain_key" VS "select distinct s._Strain_key" > from prb_strain s > where s.private = 0 > and s.strain not ilike '%involves%' > and s.strain not ilike '%either%' > and s.strain not ilike '% and %' > and s.strain not ilike '% or %' > and exists (select 1 from voc_annot va, voc_term t > where va._AnnotType_key = 1009 > and va._Term_key = t._Term_key > and t.term != 'Not Applicable' > and t.term != 'Not Specified' > and va._Object_key = s._Strain_key) This report is inadequate to help us identify the issue. You've not provided the relevant table declarations, nor any sample data that would reproduce the problem. Given the squishiness of the described behavior, I realize that building a self-contained reproducer might be hard. In the meantime, could you at least provide EXPLAIN ANALYZE results from correct and incorrect executions? regards, tom lane --- The information in this email, including attachments, may be confidential and is intended solely for the addressee(s). If you believe you received this email by mistake, please notify the sender by return email as soon as possible.
-
Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Tom Lane <tgl@sss.pgh.pa.us> — 2025-10-27T19:25:03Z
Lori Corbani <Lori.Corbani@jax.org> writes: > Attached is a file with this info. Please let me know if this is what you need. Thank you for the test case. With your smaller data set (for some reason not the larger one) I can duplicate the misbehavior: v17 consistently returns 115427 rows, v18 often one or two more than that. I just loaded up the data, vacuum analyzed, and ran select count(*) from ( select s._Strain_key from mgd.prb_strain s where s.private = 0 and s.strain not ilike '%involves%' and s.strain not ilike '%either%' and s.strain not ilike '% and %' and s.strain not ilike '% or %' and exists (select 1 from mgd.voc_annot va, mgd.voc_term t where va._AnnotType_key = 1009 and va._Term_key = t._Term_key and t.term != 'Not Applicable' and t.term != 'Not Specified' and va._Object_key = s._Strain_key) ) ss; repeatedly in an out-of-the-box configuration. (BTW, without the outer "count(*)" subquery, it tends not to pick a PHJ plan, so the misbehavior isn't there. I didn't look into why not.) After a good deal of time bisecting, I find the first commit that shows the problem is Author: Richard Guo <rguo@postgresql.org> Branch: master Release: REL_18_BR [aa86129e1] 2024-07-05 09:26:48 +0900 Support "Right Semi Join" plan shapes which is perhaps unsurprising, because the planner picks a "Parallel Hash Right Semi Join" plan after that and a non-Right plan before it. So it looks like there is some race condition in that. The failure rate at aa86129e1 and following commits is only perhaps two or three tries out of 100, and I never saw more than 115428 rows produced. However, after Author: Thomas Munro <tmunro@postgresql.org> Branch: master Release: REL_18_BR [4effd0844] 2024-08-31 17:28:02 +1200 Branch: REL_17_STABLE Release: REL_17_0 [3ed368361] 2024-08-31 17:29:30 +1200 Fix unfairness in all-cached parallel seq scan. the failure rate gets enormously worse, and there are many runs producing more than one extra row, for example these stats from 200 test runs at 5668a857d: $ sort -n ~/corbanicounts | uniq -c 50 115427 78 115428 38 115429 19 115430 11 115431 4 115432 I don't know what to make of that, exactly. 4effd0844 is nowhere near the PHJ code, and it hasn't been causing any problems in the v17 branch as far as I've heard. My guess is that it changes the order of delivery of rows to PHJ in a way that tends to tickle the hypothesized race condition. BTW, 5668a857d did not move the needle one way or the other, so it's not that. None of this is code that I've been heavily into, so I hope somebody else will pick it up. But if we can't solve it in the next 10 days or so, I think we need to disable parallel right semi joins for 18.1. regards, tom lane -
Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Tom Lane <tgl@sss.pgh.pa.us> — 2025-10-27T20:12:56Z
I wrote: > After a good deal of time bisecting, I find the first commit that > shows the problem is > Author: Richard Guo <rguo@postgresql.org> > Branch: master Release: REL_18_BR [aa86129e1] 2024-07-05 09:26:48 +0900 > Support "Right Semi Join" plan shapes After looking a little closer, we've got: if (node->js.jointype == JOIN_RIGHT_SEMI && HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple))) continue; ... if (joinqual == NULL || ExecQual(joinqual, econtext)) { if (!HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple))) HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)); As far as I can see, in a parallel hash join node->hj_CurTuple will be pointing into a shared hash table, so that this code is fooling with a shared HasMatch flag bit with no sort of lock whatsoever. This query doesn't have any joinqual at the PHJ level if I'm reading EXPLAIN correctly, so the window to get it wrong isn't very wide. Nonetheless, if two parallel workers inspect the same inner tuple at about the same time, this code would allow both of them to return it. I'm unsure if we've got any infrastructure that'd allow setting the tuple's match bit in a more atomic fashion. We might have to revert aa86129e1. regards, tom lane -
Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Richard Guo <guofenglinux@gmail.com> — 2025-10-28T00:58:50Z
On Tue, Oct 28, 2025 at 5:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > As far as I can see, in a parallel hash join node->hj_CurTuple will > be pointing into a shared hash table, so that this code is fooling > with a shared HasMatch flag bit with no sort of lock whatsoever. > This query doesn't have any joinqual at the PHJ level if I'm reading > EXPLAIN correctly, so the window to get it wrong isn't very wide. > Nonetheless, if two parallel workers inspect the same inner tuple > at about the same time, this code would allow both of them to > return it. > > I'm unsure if we've got any infrastructure that'd allow setting the > tuple's match bit in a more atomic fashion. We might have to revert > aa86129e1. Thanks Lori for the report and test case. Thanks Tom for the analysis. I think you're right. In a parallel hash join, the inner relation is stored in a shared global hash table for probing, allowing multiple workers to access the same tuples concurrently. If two workers probe the same inner tuple at the same time, it can lead to duplicate emissions of that tuple, violating the semantics of a right semi join. AFAICT, there are 3 possible options for a fix. 1) Revert aa86129e1. 2) Modify the code to perform atomic operations on the matched flag using a CAS (or a similar) mechanism when running in parallel execution. 3) Disable parallel right semi joins in the planner. For option #2, it seems to me that it would require non-trivial changes, which might not be suitable for back-patching. We would also need to evaluate the performance impact of introducing atomic operations. I'm somewhat inclined toward option #3 instead. We can implement the disablement in hash_inner_and_outer(), which would only require a small modification (perhaps two lines, plus some comment updates). If option #2 doesn't work out, I'm open to option #1. (I'm still trying to understand why concurrent access to the matched flag in cases other than right semi joins (such as right or full joins) doesn't lead to concurrency issues.) - Richard
-
Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Richard Guo <guofenglinux@gmail.com> — 2025-10-28T01:04:28Z
On Tue, Oct 28, 2025 at 9:58 AM Richard Guo <guofenglinux@gmail.com> wrote: > If option #2 doesn't work out, I'm open to option #1. Sorry, I meant if option #3 doesn't work out … - Richard
-
Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Tom Lane <tgl@sss.pgh.pa.us> — 2025-10-28T02:04:15Z
Richard Guo <guofenglinux@gmail.com> writes: > AFAICT, there are 3 possible options for a fix. > 1) Revert aa86129e1. > 2) Modify the code to perform atomic operations on the matched flag > using a CAS (or a similar) mechanism when running in parallel > execution. > 3) Disable parallel right semi joins in the planner. Right. I agree that #3 is the most attractive stopgap answer. We can look into #2 later, but it doesn't sound like something to back-patch. (The main problem according to my brief look is that t_infomask2 is uint16, but we haven't built out any 16-bit atomic primitives; perhaps they do not exist everywhere.) > (I'm still trying to understand why concurrent access to the matched > flag in cases other than right semi joins (such as right or full > joins) doesn't lead to concurrency issues.) I believe PRSJ is the only case where we need to set and concurrently inspect the HEAP_TUPLE_HAS_MATCH flag in a shared hashtable. I have a nasty feeling that this was well understood back when we first did parallel hash join, which is why it wasn't done already. Apparently the problem didn't get documented though, or at least not in any place you chanced to look. Looking at the code now, ExecParallelScanHashTableForUnmatched also has unprotected tests (not sets) of the flag, but I think that may be okay because we shouldn't still be mutating the flags while that runs. regards, tom lane
-
Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Richard Guo <guofenglinux@gmail.com> — 2025-10-28T08:46:52Z
On Tue, Oct 28, 2025 at 11:04 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Richard Guo <guofenglinux@gmail.com> writes: > > AFAICT, there are 3 possible options for a fix. > > > 1) Revert aa86129e1. > > > 2) Modify the code to perform atomic operations on the matched flag > > using a CAS (or a similar) mechanism when running in parallel > > execution. > > > 3) Disable parallel right semi joins in the planner. > Right. I agree that #3 is the most attractive stopgap answer. > We can look into #2 later, but it doesn't sound like something > to back-patch. (The main problem according to my brief look > is that t_infomask2 is uint16, but we haven't built out any > 16-bit atomic primitives; perhaps they do not exist everywhere.) Agreed. Here's a patch that follows along the lines of option #3. > > (I'm still trying to understand why concurrent access to the matched > > flag in cases other than right semi joins (such as right or full > > joins) doesn't lead to concurrency issues.) > > I believe PRSJ is the only case where we need to set and concurrently > inspect the HEAP_TUPLE_HAS_MATCH flag in a shared hashtable. Right. In the case of RIGHT or FULL joins, the match flags are used to emit the unmatched inner tuples, and the scan for unmatched inner tuples occurs after we have finished a batch. Therefore, concurrent inspection and setting of the match flags does not break correctness. - Richard
-
Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Tom Lane <tgl@sss.pgh.pa.us> — 2025-10-28T23:07:34Z
Richard Guo <guofenglinux@gmail.com> writes: > On Tue, Oct 28, 2025 at 11:04 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Right. I agree that #3 is the most attractive stopgap answer. >> We can look into #2 later, but it doesn't sound like something >> to back-patch. > Agreed. Here's a patch that follows along the lines of option #3. Code changes look good, and I confirm that I can't reproduce the failure anymore with this patch. I'm not convinced that the new regression test case is worth the cycles, at least not in this form. The main thing that's annoying me about it is creating/populating/analyzing its own large one-use table; that approach soon leads to regression suites that take forever. You could answer that objection by making use of some existing regression table, for instance this seems to work as well: explain select * from tenk1 t1 where exists(select 1 from tenk1 t2 where tenthous = t1.tenthous); However, I feel like it may still not be a great test, because it only shows that the planner *didn't* pick PRSJ, not that it *couldn't*. The cost differential between PRSJ with these settings and the Hash Semi Join plan that you get after applying the patch is not very great; so it's easy to imagine future changes that'd mean we'd not prefer PRSJ here anyway. But I'm not sure what we could do about that, so operationally this may be as good a test as we can get anyway. Another thought is that rather than having to remember to reset all those planner options, you could make the test a bit shorter and more maintainable by writing something like begin; set local parallel_setup_cost=0; set local ... explain ... rollback; regards, tom lane
-
Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Richard Guo <guofenglinux@gmail.com> — 2025-10-29T01:46:35Z
On Wed, Oct 29, 2025 at 8:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Code changes look good, and I confirm that I can't reproduce the > failure anymore with this patch. Thanks for the review and confirmation. > I'm not convinced that the new regression test case is worth the > cycles, at least not in this form. The main thing that's annoying me > about it is creating/populating/analyzing its own large one-use table; > that approach soon leads to regression suites that take forever. Fair point. > You could answer that objection by making use of some existing > regression table, for instance this seems to work as well: > > explain select * from tenk1 t1 > where exists(select 1 from tenk1 t2 where tenthous = t1.tenthous); > > However, I feel like it may still not be a great test, because it only > shows that the planner *didn't* pick PRSJ, not that it *couldn't*. > The cost differential between PRSJ with these settings and the Hash > Semi Join plan that you get after applying the patch is not very > great; so it's easy to imagine future changes that'd mean we'd not > prefer PRSJ here anyway. But I'm not sure what we could do about > that, so operationally this may be as good a test as we can get > anyway. To make the right-semi join look more appealing, I wonder if we could apply a filter to t1 to make its output smaller than t2, so that the planner is more likely to choose t1 as the inner side for building the hash table. explain select * from tenk1 t1 where exists(select 1 from tenk1 t2 where fivethous = t1.fivethous) and t1.fivethous < 5; (I'm using fivethous instead of tenthous to avoid interference from index scan.) However, this doesn't seem to move the needle any further. The costs of PRSJ (unpatched) and PSJ (patched) are 755.67 and 777.54. The cost difference is still not very great. > Another thought is that rather than having to remember to reset all > those planner options, you could make the test a bit shorter and > more maintainable by writing something like > > begin; > set local parallel_setup_cost=0; > set local ... > > explain ... > > rollback; Brilliant! Thanks for the suggestion. - Richard
-
Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Richard Guo <guofenglinux@gmail.com> — 2025-10-29T06:45:09Z
On Wed, Oct 29, 2025 at 10:46 AM Richard Guo <guofenglinux@gmail.com> wrote: > To make the right-semi join look more appealing, I wonder if we could > apply a filter to t1 to make its output smaller than t2, so that the > planner is more likely to choose t1 as the inner side for building the > hash table. > > explain select * from tenk1 t1 > where exists(select 1 from tenk1 t2 where fivethous = t1.fivethous) > and t1.fivethous < 5; > > (I'm using fivethous instead of tenthous to avoid interference from > index scan.) > > However, this doesn't seem to move the needle any further. The costs > of PRSJ (unpatched) and PSJ (patched) are 755.67 and 777.54. The cost > difference is still not very great. After spending a bit more time experimenting, I couldn't find a better query that shows a large difference between the costs of the PRSJ plan and the patched version plan. So I've updated the test case to use the aforementioned one. I also removed the reset statements and switched to the begin/rollback pattern for the test case. - Richard
-
Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Tom Lane <tgl@sss.pgh.pa.us> — 2025-10-29T16:00:13Z
Richard Guo <guofenglinux@gmail.com> writes: > After spending a bit more time experimenting, I couldn't find a better > query that shows a large difference between the costs of the PRSJ plan > and the patched version plan. So I've updated the test case to use > the aforementioned one. I also removed the reset statements and > switched to the begin/rollback pattern for the test case. v2 patch LGTM. regards, tom lane
-
Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Richard Guo <guofenglinux@gmail.com> — 2025-10-30T03:24:49Z
On Thu, Oct 30, 2025 at 1:00 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Richard Guo <guofenglinux@gmail.com> writes: > > After spending a bit more time experimenting, I couldn't find a better > > query that shows a large difference between the costs of the PRSJ plan > > and the patched version plan. So I've updated the test case to use > > the aforementioned one. I also removed the reset statements and > > switched to the begin/rollback pattern for the test case. > v2 patch LGTM. Thanks for the review. I've pushed v2 and backpatched it to v18. Thanks to Lori for the report. - Richard
-
Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Thomas Munro <thomas.munro@gmail.com> — 2025-10-30T03:43:44Z
On Tue, Oct 28, 2025 at 9:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > After looking a little closer, we've got: > > if (node->js.jointype == JOIN_RIGHT_SEMI && > HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple))) > continue; > ... > if (joinqual == NULL || ExecQual(joinqual, econtext)) > { > if (!HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple))) > HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)); > > As far as I can see, in a parallel hash join node->hj_CurTuple will > be pointing into a shared hash table, so that this code is fooling > with a shared HasMatch flag bit with no sort of lock whatsoever. > This query doesn't have any joinqual at the PHJ level if I'm reading > EXPLAIN correctly, so the window to get it wrong isn't very wide. > Nonetheless, if two parallel workers inspect the same inner tuple > at about the same time, this code would allow both of them to > return it. Yeah, that way of writing to it was good enough for the original purpose: no backend would read it again until after synchronising on a barrier, it doesn't matter if two backends set it, and though the relaxed check if it's already set can be wrong, only in an acceptable direction. We only cared about which tuples were never matched, after the probe phase is finished. > I'm unsure if we've got any infrastructure that'd allow setting the > tuple's match bit in a more atomic fashion. Interesting problem. My first drive-by thought is that we would need: some new smaller atomics (that's a 16 bit member), and a decision that it is OK to use the atomics API by casting (should be OK, since we no longer have emulated atomics in smaller sizes, so now it's just a fancy way of accessing memory with "boxed" types that have the same size and alignment as the underlying type, and plain stores should work for initialisation if careful about synchronisation). I guess the op we'd want is atomic_fetch_or with a check of the old value to see if you lost the race. I suppose the smallest code change would be to keep the existing code flow that was apparently almost working here (the window of optimism being just between the relaxed check at the top and the set at the bottom). If you fail to be the first to set the bit, you abandon the work done, but it should hopefully be rare enough and cheaper overall than something "pessimistic" that does something akin to locking the tuple while thinking about it, and the time it took for this to be discovered seems to support the "optimistic" idea (I guess it depends how much processing happens to determine that it's really a match?). I expect it'd be very important to skip the atomic op when not needed. I don't know if it'd be worth a new specialisation to avoid the branching... IIRC previously we've chosen to set the flag in cases that don't need it on the basis that the branch to decide might be worse than just setting it always (which wasn't very scientific... it's also not nice to dirty memory for no reason, hence the check to avoid setting when already set), but now that we're talking about expensive memory synchronising ops we'd be forced to reconsider that, with a branch and optionally a specialisation to constant-fold it away (a technique we only use today to strip out the branches in the serial vs parallel versions of the function, pending more research on what other combinations are worth spending code size on, but this one was already on the candidate list to look into...). -
Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Tom Lane <tgl@sss.pgh.pa.us> — 2025-10-30T04:11:39Z
Thomas Munro <thomas.munro@gmail.com> writes: > On Tue, Oct 28, 2025 at 9:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'm unsure if we've got any infrastructure that'd allow setting the >> tuple's match bit in a more atomic fashion. > Interesting problem. My first drive-by thought is that we would need: > some new smaller atomics (that's a 16 bit member), and a decision that > it is OK to use the atomics API by casting (should be OK, since we no > longer have emulated atomics in smaller sizes, so now it's just a > fancy way of accessing memory with "boxed" types that have the same > size and alignment as the underlying type, and plain stores should > work for initialisation if careful about synchronisation). Right. I wasn't excited about building out 16-bit atomics, not least because I'm unsure that those exist on every supported architecture. Instead I spent a little time thinking about how we could use a 32-bit atomic op here. Clearly, that should theoretically work, but you'd have to identify where is the start of the 32-bit word (t_infomask2 sadly is not at a 4-byte boundary) and which bit within that word is the target bit (that's gonna vary depending on endianness at least). Seems like a pain in the rear, but probably still less work than creating 16-bit atomic ops. A vaguer thought is that we're not bound to represent the match bit in exactly the way it's done now, if there's some other choice that would be easier to fit into these concerns. The only hard limit I'd lay down is that making the struct bigger isn't OK. > I guess > the op we'd want is atomic_fetch_or with a check of the old value to > see if you lost the race. I was thinking about CAS, but at some level these are all equivalent. > I suppose the smallest code change would be > to keep the existing code flow that was apparently almost working here > (the window of optimism being just between the relaxed check at the > top and the set at the bottom). If you fail to be the first to set > the bit, you abandon the work done, but it should hopefully be rare > enough and cheaper overall than something "pessimistic" that does > something akin to locking the tuple while thinking about it, Right. If we encounter the race condition and lose it, we'd have wasted the time for an extra evaluation of the joinqual, but that should be fine given that we know this is rare. (And we'd not be using this plan shape if the joinqual were volatile.) regards, tom lane
-
RE: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Lori Corbani <lori.corbani@jax.org> — 2025-10-30T09:57:43Z
Hi All, Does this mean that a v18 patch will be available that we can download? https://www.postgresql.org/ Many thanks for addressing this so quickly. Thank you. Lori -----Original Message----- From: Tom Lane <tgl@sss.pgh.pa.us> Sent: Thursday, October 30, 2025 12:12 AM To: Thomas Munro <thomas.munro@gmail.com> Cc: Richard Guo <guofenglinux@gmail.com>; Lori Corbani <Lori.Corbani@jax.org>; pgsql-bugs@lists.postgresql.org Subject: Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results Thomas Munro <thomas.munro@gmail.com> writes: > On Tue, Oct 28, 2025 at 9:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'm unsure if we've got any infrastructure that'd allow setting the >> tuple's match bit in a more atomic fashion. > Interesting problem. My first drive-by thought is that we would need: > some new smaller atomics (that's a 16 bit member), and a decision that > it is OK to use the atomics API by casting (should be OK, since we no > longer have emulated atomics in smaller sizes, so now it's just a > fancy way of accessing memory with "boxed" types that have the same > size and alignment as the underlying type, and plain stores should > work for initialisation if careful about synchronisation). Right. I wasn't excited about building out 16-bit atomics, not least because I'm unsure that those exist on every supported architecture. Instead I spent a little time thinking about how we could use a 32-bit atomic op here. Clearly, that should theoretically work, but you'd have to identify where is the start of the 32-bit word (t_infomask2 sadly is not at a 4-byte boundary) and which bit within that word is the target bit (that's gonna vary depending on endianness at least). Seems like a pain in the rear, but probably still less work than creating 16-bit atomic ops. A vaguer thought is that we're not bound to represent the match bit in exactly the way it's done now, if there's some other choice that would be easier to fit into these concerns. The only hard limit I'd lay down is that making the struct bigger isn't OK. > I guess > the op we'd want is atomic_fetch_or with a check of the old value to > see if you lost the race. I was thinking about CAS, but at some level these are all equivalent. > I suppose the smallest code change would be to keep the existing code > flow that was apparently almost working here (the window of optimism > being just between the relaxed check at the top and the set at the > bottom). If you fail to be the first to set the bit, you abandon the > work done, but it should hopefully be rare enough and cheaper overall > than something "pessimistic" that does something akin to locking the > tuple while thinking about it, Right. If we encounter the race condition and lose it, we'd have wasted the time for an extra evaluation of the joinqual, but that should be fine given that we know this is rare. (And we'd not be using this plan shape if the joinqual were volatile.) regards, tom lane --- The information in this email, including attachments, may be confidential and is intended solely for the addressee(s). If you believe you received this email by mistake, please notify the sender by return email as soon as possible.
-
Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
John Naylor <johncnaylorls@gmail.com> — 2025-10-30T10:11:54Z
On Thu, Oct 30, 2025 at 4:57 PM Lori Corbani <Lori.Corbani@jax.org> wrote: > > Hi All, > > Does this mean that a v18 patch will be available that we can download? 18.1 is scheduled for November 13th. https://www.postgresql.org/developer/roadmap/ -- John Naylor Amazon Web Services
-
RE: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Lori Corbani <lori.corbani@jax.org> — 2025-10-30T10:13:26Z
Thank you! Lori -----Original Message----- From: John Naylor <johncnaylorls@gmail.com> Sent: Thursday, October 30, 2025 6:12 AM To: Lori Corbani <Lori.Corbani@jax.org> Cc: Tom Lane <tgl@sss.pgh.pa.us>; Thomas Munro <thomas.munro@gmail.com>; Richard Guo <guofenglinux@gmail.com>; pgsql-bugs@lists.postgresql.org Subject: Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results On Thu, Oct 30, 2025 at 4:57 PM Lori Corbani <Lori.Corbani@jax.org> wrote: > > Hi All, > > Does this mean that a v18 patch will be available that we can download? 18.1 is scheduled for November 13th. https://www.postgresql.org/developer/roadmap/ -- John Naylor Amazon Web Services --- The information in this email, including attachments, may be confidential and is intended solely for the addressee(s). If you believe you received this email by mistake, please notify the sender by return email as soon as possible.
-
Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Thomas Munro <thomas.munro@gmail.com> — 2025-10-30T10:35:50Z
On Thu, Oct 30, 2025 at 5:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Right. I wasn't excited about building out 16-bit atomics, not least > because I'm unsure that those exist on every supported architecture. > Instead I spent a little time thinking about how we could use a 32-bit > atomic op here. Clearly, that should theoretically work, but you'd > have to identify where is the start of the 32-bit word (t_infomask2 > sadly is not at a 4-byte boundary) and which bit within that word is > the target bit (that's gonna vary depending on endianness at least). > Seems like a pain in the rear, but probably still less work than > creating 16-bit atomic ops. I read that GCC and Clang will do exactly that for us automatically if we use builtins (generic-gcc.h) or eventually <stdatomic.h> on hardware without small atomics. AFAIK that's only RISC-V, which I don't have, so I tried cross-compiling void f(atomic_char *x) { atomic_fetch_or(x, 1); } and GCC happily generated a lr.w.aqrl, sc.w.rl sequence with a bunch of bitswizzling. With the right magic switches I think it should be able to spit out a single amoor.b instruction instead (looks like it's the "zaamo" extension that added sub-word atomics quite recently), but I couldn't immediately figure out how to turn it on. I think every other ISA on our list can do it in hardware except MIPS (RIP?). > A vaguer thought is that we're not bound to represent the match > bit in exactly the way it's done now, if there's some other choice > that would be easier to fit into these concerns. The only hard > limit I'd lay down is that making the struct bigger isn't OK. Yeah. Some alternative match flag storage sketches came up in the PHJ right join thread. I've wondered a few times about tagged pointers. What if we stole the lower two bits of the pointers to tuples to mean "matched" and "every tuple that follows me in the chain is also matched"? In right join unmatched scans, and clearly in this case too, we'd often avoid following pointers to matched tuples we're not interested in, but we'd have to mask the tagged bits before dereferencing. Otherwise I think it should be much like the earlier description, eg specialisation could remove all the match-tracking, masking and/or atomic ops depending on the plan. -
Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
David Rowley <dgrowleyml@gmail.com> — 2025-10-30T10:40:09Z
On Thu, 30 Oct 2025 at 22:57, Lori Corbani <Lori.Corbani@jax.org> wrote: > Does this mean that a v18 patch will be available that we can download? The commit is in [1] if you're desperate for it sooner and don't mind patching/building yourself. The next batch of minor version releases is due out on the 13th of November [2]. David [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=257ee78341f2657d5c19cdaf0888f843e9bb0c33 [2] https://www.postgresql.org/developer/roadmap/
-
RE: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Lori Corbani <lori.corbani@jax.org> — 2025-10-30T10:43:15Z
David, Thank you, but we can wait for the version release on Nov 13. We are fine running Postgres 17 until then. No rush. Many thanks for your quick response and fix. You can remove me from this thread, if you like. Lori -----Original Message----- From: David Rowley <dgrowleyml@gmail.com> Sent: Thursday, October 30, 2025 6:40 AM To: Lori Corbani <Lori.Corbani@jax.org> Cc: Tom Lane <tgl@sss.pgh.pa.us>; Thomas Munro <thomas.munro@gmail.com>; Richard Guo <guofenglinux@gmail.com>; pgsql-bugs@lists.postgresql.org Subject: Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results On Thu, 30 Oct 2025 at 22:57, Lori Corbani <Lori.Corbani@jax.org> wrote: > Does this mean that a v18 patch will be available that we can download? The commit is in [1] if you're desperate for it sooner and don't mind patching/building yourself. The next batch of minor version releases is due out on the 13th of November [2]. David [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=257ee78341f2657d5c19cdaf0888f843e9bb0c33 [2] https://www.postgresql.org/developer/roadmap/ --- The information in this email, including attachments, may be confidential and is intended solely for the addressee(s). If you believe you received this email by mistake, please notify the sender by return email as soon as possible.
-
Re: [EXTERNAL]Re: BUG #19094: select statement on postgres 17 vs postgres 18 is returning different/duplicate results
Thomas Munro <thomas.munro@gmail.com> — 2025-11-11T03:39:17Z
On Thu, Oct 30, 2025 at 5:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Right. I wasn't excited about building out 16-bit atomics, not least > because I'm unsure that those exist on every supported architecture. > Instead I spent a little time thinking about how we could use a 32-bit > atomic op here. Clearly, that should theoretically work, but you'd > have to identify where is the start of the 32-bit word (t_infomask2 > sadly is not at a 4-byte boundary) and which bit within that word is > the target bit (that's gonna vary depending on endianness at least). > Seems like a pain in the rear, but probably still less work than > creating 16-bit atomic ops. Here's a quick patch to experiment with that idea. It applies on top of the <stdatomic.h> patches I posted yesterday[1]. The following looks quite nice to my eye, but there might be other ways that make fewer assumptions (see XXX comments in patch): /* * Atomically set the match flag and report whether it was already set. False * means that the caller was the first to set it. */ static inline bool HeapTupleHeaderTestAndSetMatch(MinimalTupleData *tup) { return atomic_fetch_or(pg_atomic_cast(&tup->t_infomask2), HEAP_TUPLE_HAS_MATCH) & HEAP_TUPLE_HAS_MATCH; } [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGKFvu3zyvv3aaj5hHs9VtWcjFAmisOwOc7aOZNc5AF3NA%40mail.gmail.com