Thread

Re: Removing unneeded self joins

Jaime Casanova <jcasanov@systemguards.com.ec> — 2022-03-01T00:09:49Z
On Thu, Jul 15, 2021 at 05:49:11PM +0300, Andrey Lepikhov wrote:
> On 6/7/21 13:49, Hywel Carver wrote:
> > On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov
> > <a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:
> > Looking through the email chain, a previous version of this patch added
> > ~0.6% to planning time in the worst case tested - does that meet the
> > "essentially free" requirement?
> I think these tests weren't full coverage of possible use cases. It will
> depend on a number of relations in the query. For the JOIN of partitioned
> tables, for example, the overhead could grow. But in the context of overall
> planning time this overhead will be small till the large number of
> relations.
> Also, we made this feature optional to solve possible problems.
> Rebased on 768ea9bcf9
> 

I made some tests in a machine with 16 cores and 32GB of RAM.
So we can see if this is an improvement.

This is what I found:

+-----------------------+----------+-----------+-----------+-------+-----------+-------+
|         test          |   mode   |  master   |  enabled  |   %   | disabled  |   %   |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| pgbench read only     | standard |  64418.13 |  63942.94 | -0.74 |  62231.38 | -3.39 |
| pgbench read only     | prepared | 108463.51 | 107002.13 | -1.35 | 100960.83 | -6.92 |
| pgbench read only     | extended |  55409.65 |  56427.63 |  1.84 |  55927.62 |  0.93 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| pgbench read/write    | standard |   9374.91 |   9135.21 | -2.56 |   8840.68 | -5.70 |
| pgbench read/write    | prepared |  11849.86 |  11672.23 | -1.50 |  11393.39 | -3.85 |
| pgbench read/write    | extended |   7976.80 |   7947.07 | -0.37 |   7788.99 | -2.35 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| select non optimize 1 | standard |     80.97 |     81.29 |  0.40 |     81.30 |  0.41 |
| select non optimize 1 | prepared |     81.29 |     81.28 | -0.01 |     80.89 | -0.49 |
| select non optimize 1 | extended |     81.07 |     80.81 | -0.32 |     80.98 | -0.11 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| select optimized 1    | standard |     15.84 |     13.90 |-12.25 |     15.80 | -0.25 |
| select optimized 1    | prepared |     15.24 |     13.82 | -9.32 |     15.55 |  2.03 |
| select optimized 1    | extended |     15.38 |     13.89 | -9.69 |     15.59 |  1.37 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| select optimized 2    | standard |  10204.91 |  10818.39 |  6.01 |  10261.07 |  0.55 |
| select optimized 2    | prepared |  13284.06 |  15579.33 | 17.28 |  13116.22 | -1.26 |
| select optimized 2    | extended |  10143.43 |  10645.23 |  4.95 |  10142.77 | -0.01 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+
| select shoe           | standard |   5645.28 |   5661.71 |  0.29 |   6180.60 |  9.48 |
| select shoe           | prepared |   9660.45 |   9602.37 | -0.60 |   9894.82 |  2.43 |
| select shoe           | extended |   5666.47 |   5634.10 | -0.57 |   5757.26 |  1.60 |
+-----------------------+----------+-----------+-----------+-------+-----------+-------+

Obviously the pgbench runs are from the standard script. The numbers are
not clear for me, I can see improvementes with the patch only in one
case and, for some reason, if I disable the patch
(enable_self_join_removal='off') I still see a regression in normal
cases and curiosly an improvement in one case.

I'm attaching the queries. I used the users table that is down-thread
and loaded with ~200k rows using:

insert into users 
select seq, case when random() < 0.2 then null else random() * 1000 end, 
       random() * 10000 
  from generate_series(1, 1000000) seq 
  on conflict (nullable_int) do nothing;

for master I just dumped the data from the table and loaded it. I'm also
attaching the queries I used.

After this tests, I'm not convinced this is actually providing something
performance-wise. At least not in its current state.

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL