Thread

  1. Re: Removing unneeded self joins

    Jaime Casanova <jcasanov@systemguards.com.ec> — 2022-03-01T00:09:49Z

    On Thu, Jul 15, 2021 at 05:49:11PM +0300, Andrey Lepikhov wrote:
    > On 6/7/21 13:49, Hywel Carver wrote:
    > > On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov
    > > <a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:
    > > Looking through the email chain, a previous version of this patch added
    > > ~0.6% to planning time in the worst case tested - does that meet the
    > > "essentially free" requirement?
    > I think these tests weren't full coverage of possible use cases. It will
    > depend on a number of relations in the query. For the JOIN of partitioned
    > tables, for example, the overhead could grow. But in the context of overall
    > planning time this overhead will be small till the large number of
    > relations.
    > Also, we made this feature optional to solve possible problems.
    > Rebased on 768ea9bcf9
    > 
    
    I made some tests in a machine with 16 cores and 32GB of RAM.
    So we can see if this is an improvement.
    
    This is what I found:
    
    +-----------------------+----------+-----------+-----------+-------+-----------+-------+
    |         test          |   mode   |  master   |  enabled  |   %   | disabled  |   %   |
    +-----------------------+----------+-----------+-----------+-------+-----------+-------+
    | pgbench read only     | standard |  64418.13 |  63942.94 | -0.74 |  62231.38 | -3.39 |
    | pgbench read only     | prepared | 108463.51 | 107002.13 | -1.35 | 100960.83 | -6.92 |
    | pgbench read only     | extended |  55409.65 |  56427.63 |  1.84 |  55927.62 |  0.93 |
    +-----------------------+----------+-----------+-----------+-------+-----------+-------+
    | pgbench read/write    | standard |   9374.91 |   9135.21 | -2.56 |   8840.68 | -5.70 |
    | pgbench read/write    | prepared |  11849.86 |  11672.23 | -1.50 |  11393.39 | -3.85 |
    | pgbench read/write    | extended |   7976.80 |   7947.07 | -0.37 |   7788.99 | -2.35 |
    +-----------------------+----------+-----------+-----------+-------+-----------+-------+
    | select non optimize 1 | standard |     80.97 |     81.29 |  0.40 |     81.30 |  0.41 |
    | select non optimize 1 | prepared |     81.29 |     81.28 | -0.01 |     80.89 | -0.49 |
    | select non optimize 1 | extended |     81.07 |     80.81 | -0.32 |     80.98 | -0.11 |
    +-----------------------+----------+-----------+-----------+-------+-----------+-------+
    | select optimized 1    | standard |     15.84 |     13.90 |-12.25 |     15.80 | -0.25 |
    | select optimized 1    | prepared |     15.24 |     13.82 | -9.32 |     15.55 |  2.03 |
    | select optimized 1    | extended |     15.38 |     13.89 | -9.69 |     15.59 |  1.37 |
    +-----------------------+----------+-----------+-----------+-------+-----------+-------+
    | select optimized 2    | standard |  10204.91 |  10818.39 |  6.01 |  10261.07 |  0.55 |
    | select optimized 2    | prepared |  13284.06 |  15579.33 | 17.28 |  13116.22 | -1.26 |
    | select optimized 2    | extended |  10143.43 |  10645.23 |  4.95 |  10142.77 | -0.01 |
    +-----------------------+----------+-----------+-----------+-------+-----------+-------+
    | select shoe           | standard |   5645.28 |   5661.71 |  0.29 |   6180.60 |  9.48 |
    | select shoe           | prepared |   9660.45 |   9602.37 | -0.60 |   9894.82 |  2.43 |
    | select shoe           | extended |   5666.47 |   5634.10 | -0.57 |   5757.26 |  1.60 |
    +-----------------------+----------+-----------+-----------+-------+-----------+-------+
    
    Obviously the pgbench runs are from the standard script. The numbers are
    not clear for me, I can see improvementes with the patch only in one
    case and, for some reason, if I disable the patch
    (enable_self_join_removal='off') I still see a regression in normal
    cases and curiosly an improvement in one case.
    
    I'm attaching the queries. I used the users table that is down-thread
    and loaded with ~200k rows using:
    
    insert into users 
    select seq, case when random() < 0.2 then null else random() * 1000 end, 
           random() * 10000 
      from generate_series(1, 1000000) seq 
      on conflict (nullable_int) do nothing;
    
    for master I just dumped the data from the table and loaded it. I'm also
    attaching the queries I used.
    
    After this tests, I'm not convinced this is actually providing something
    performance-wise. At least not in its current state.
    
    -- 
    Jaime Casanova
    Director de Servicios Profesionales
    SystemGuards - Consultores de PostgreSQL