Re: Pathify RHS unique-ification for semijoin planning

Richard Guo <guofenglinux@gmail.com>

From: Richard Guo <guofenglinux@gmail.com>
To: Álvaro Herrera <alvherre@kurilemu.de>
Cc: PostgreSQL-development <pgsql-hackers@postgresql.org>, Tom Lane <tgl@sss.pgh.pa.us>, Andy Fan <zhihuifan1213@163.com>, wenhui qiu <qiuwenhuifx@gmail.com>
Date: 2025-09-02T10:10:23Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Simplify relation_has_unique_index_for()

  2. Pathify RHS unique-ification for semijoin planning

  3. Convert varatt.h access macros to static inline functions.

  4. Re-export a few of createplan.c's make_xxx() functions.

On Wed, Jul 23, 2025 at 5:11 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> As a very trivial test on this patch, I ran the query in your opening
> email, both with and without the patch, scaling up the size of the table
> a little bit.

> This is a really nice improvement.  I think we could find queries that
> are arbitrarily faster, by feeding enough tuples to the unnecessary Sort
> nodes.

FWIW, I'm looking for a query to better showcase the performance
improvement from this patch.  Here is one I found.

create table t (a int, b int);
insert into t select i%10, i%10 from generate_series(1,50000) i;
create index on t (a, b);
analyze t;

explain (analyze, costs on)
select * from t t1, t t2 where (t1.a, t2.b) in (select a, b from t t3)
order by t1.a, t2.b;

Here are the planning and execution time on my snail-paced machine
(best of 3), without and with this patch.

-- without this patch
 Planning Time: 0.850 ms
 Execution Time: 108149.907 ms

-- with this patch
 Planning Time: 0.728 ms
 Execution Time: 29229.748 ms

So this specific case runs about 3.7 times faster, which is really
nice.

- Richard