Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Move into separate file all the SQL queries used in pg_upgrade tests

  2. Add table to regression tests for binary-compatibility checks in pg_upgrade

  3. Fix tests of pg_upgrade across different major versions

  4. Multirange datatypes

  5. Work around cross-version-upgrade issues created by commit 9e38c2bb5.

  6. Declare assorted array functions using anycompatible not anyelement.

  7. Remove factorial operators, leaving only the factorial() function.

  8. Create by default sql/ and expected/ for output directory in pg_regress

  9. Add missing include to pg_upgrade/version.c

  10. Improve the check for pg_catalog.line data type in pg_upgrade

  11. Improve the check for pg_catalog.unknown data type in pg_upgrade

  12. Check for tables with sql_identifier during pg_upgrade

  13. pg_upgrade: clarify the database names in error files

  14. In the pg_upgrade test suite, don't write to src/test/regress.

  15. Allow group access on PGDATA

  16. Refactor dir/file permissions

  17. Remove unused functions in regress.c.

  18. Make WAL segment size configurable at initdb time.

  19. Fix bit-rot in pg_upgrade's test.sh, and improve documentation.

  1. BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    PG Bug reporting form <noreply@postgresql.org> — 2019-10-08T17:08:53Z

    The following bug has been logged on the website:
    
    Bug reference:      16045
    Logged by:          Hans Buschmann
    Email address:      buschmann@nidsa.net
    PostgreSQL version: 12.0
    Operating system:   Windows 10 64bit
    Description:        
    
    I just did a pg_upgrade from pg 11.5 to pg 12.0 on my development machine
    under Windows 64bit (both distributions from EDB).
    
    cpsdb=# select version ();
                              version
    ------------------------------------------------------------
     PostgreSQL 12.0, compiled by Visual C++ build 1914, 64-bit
    (1 row)
    
    The pg_upgrade with --link went flawlessly, I started (only!) the new server
    12.0 and could connect and access individual databases.
    
    As recommended by the resulting analyze_new_cluster.bat I tried a full
    vacuumdb with:
    
    "N:/pgsql/bin/vacuumdb" -U postgres --all --analyze-only
    
    which crashed with
    vacuumdb: vacuuming database "cpsdb"
    vacuumdb: error: vacuuming of table "admin.q_tbl_archiv" in database "cpsdb"
    failed: ERROR:  compressed data is corrupted
    
    I connected to the database through pgsql and looked at the table
    "admin.q_tbl_archiv"
    
    cpsdb=# \d+ q_tbl_archiv;
                                                       Table
    "admin.q_tbl_archiv"
          Column      |                Type                | Collation |
    Nullable | Default | Storage  | Stats target | Description
    ------------------+------------------------------------+-----------+----------+---------+----------+--------------+-------------
     table_name       | information_schema.sql_identifier  |           |        
     |         | plain    |              |
     column_name      | information_schema.sql_identifier  |           |        
     |         | plain    |              |
     ordinal_position | information_schema.cardinal_number |           |        
     |         | plain    |              |
     col_qualifier    | text                               |           |        
     |         | extended |              |
     id_column        | information_schema.sql_identifier  |           |        
     |         | plain    |              |
     id_default       | information_schema.character_data  |           |        
     |         | extended |              |
    Access method: heap
    
    When trying to select * from q_tbl_archiv I got:
    
    cpsdb=# select * from q_tbl_archiv;
    ERROR:  invalid memory alloc request size 18446744073709551613
    
    This table was created a long time back under 9.5 or 9.6 with the (here
    truncated) following command:
    
    
    create table q_tbl_archiv as
    with
    qseason as (
    select table_name,column_name, ordinal_position 
    ,replace(column_name,'_season','') as col_qualifier
    -- ,'id_'||replace(column_name,'_season','') as id_column
    from information_schema.columns 
    where 
    column_name like '%_season'
    and ordinal_position < 10
    and table_name in (
     'table1'
    ,'table2'
    -- here truncated:
    -- ... (here where all table of mine having columns like xxx_season)
    -- to reproduce, change to your own tablenames in a test database
    )
    order by table_name
    )
    select qs.*,c.column_name as id_column, c.column_default as id_default
    from 
    	qseason qs
    	left join information_schema.columns c on c.table_name=qs.table_name and
    c.column_name like 'id_%'
    ;
    
    Until now this table was always restored without error by migrating to a new
    major version through pg_dump/initdb/pr_restore.
    
    To verify the integrity of the table I restored the dump taken under pg_dump
    from pg 11.5 just before the pg_upgrade to another machine.
    
    The restore and analyze went OK and select * from q_tbl_archiv showed all
    tuples, eg (edited):
    
    cpsdb_dev=# select * from q_tbl_archiv;
            table_name        | column_name  | ordinal_position | col_qualifier
    | id_column |                        id_default
    --------------------------+--------------+------------------+---------------+-----------+----------------------------------------------------------
     table1                   | chm_season   |                2 | chm          
    |           |
     table2                   | cs_season    |                2 | cs           
    | id_cs     | nextval('table2_id_cs_seq'::regclass)
    ...
    
    In conclusion, this seems to me like an error/omission of pg_upgrade.
    
    It seems to handle these specially derived tables from information_schema
    not correctly, resulting in failures of the upgraded database.
    
    For me, this error is not so crucial, because this table is only used for
    administrative purposes and can easily be restored from backup.
    
    But I want to share my findings for the sake of other users of pg_upgrade.
    
    Thanks for investigating!
    
    Hans Buschmann
    
    
  2. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-09T13:24:36Z

    On Tue, Oct 08, 2019 at 05:08:53PM +0000, PG Bug reporting form wrote:
    >The following bug has been logged on the website:
    >
    >Bug reference:      16045
    >Logged by:          Hans Buschmann
    >Email address:      buschmann@nidsa.net
    >PostgreSQL version: 12.0
    >Operating system:   Windows 10 64bit
    >Description:
    >
    >I just did a pg_upgrade from pg 11.5 to pg 12.0 on my development machine
    >under Windows 64bit (both distributions from EDB).
    >
    >cpsdb=# select version ();
    >                          version
    >------------------------------------------------------------
    > PostgreSQL 12.0, compiled by Visual C++ build 1914, 64-bit
    >(1 row)
    >
    >The pg_upgrade with --link went flawlessly, I started (only!) the new server
    >12.0 and could connect and access individual databases.
    >
    >As recommended by the resulting analyze_new_cluster.bat I tried a full
    >vacuumdb with:
    >
    >"N:/pgsql/bin/vacuumdb" -U postgres --all --analyze-only
    >
    >which crashed with
    >vacuumdb: vacuuming database "cpsdb"
    >vacuumdb: error: vacuuming of table "admin.q_tbl_archiv" in database "cpsdb"
    >failed: ERROR:  compressed data is corrupted
    >
    >I connected to the database through pgsql and looked at the table
    >"admin.q_tbl_archiv"
    >
    >cpsdb=# \d+ q_tbl_archiv;
    >                                                   Table
    >"admin.q_tbl_archiv"
    >      Column      |                Type                | Collation |
    >Nullable | Default | Storage  | Stats target | Description
    >------------------+------------------------------------+-----------+----------+---------+----------+--------------+-------------
    > table_name       | information_schema.sql_identifier  |           |
    > |         | plain    |              |
    > column_name      | information_schema.sql_identifier  |           |
    > |         | plain    |              |
    > ordinal_position | information_schema.cardinal_number |           |
    > |         | plain    |              |
    > col_qualifier    | text                               |           |
    > |         | extended |              |
    > id_column        | information_schema.sql_identifier  |           |
    > |         | plain    |              |
    > id_default       | information_schema.character_data  |           |
    > |         | extended |              |
    >Access method: heap
    >
    >When trying to select * from q_tbl_archiv I got:
    >
    >cpsdb=# select * from q_tbl_archiv;
    >ERROR:  invalid memory alloc request size 18446744073709551613
    >
    >This table was created a long time back under 9.5 or 9.6 with the (here
    >truncated) following command:
    >
    >
    >create table q_tbl_archiv as
    >with
    >qseason as (
    >select table_name,column_name, ordinal_position
    >,replace(column_name,'_season','') as col_qualifier
    >-- ,'id_'||replace(column_name,'_season','') as id_column
    >from information_schema.columns
    >where
    >column_name like '%_season'
    >and ordinal_position < 10
    >and table_name in (
    > 'table1'
    >,'table2'
    >-- here truncated:
    >-- ... (here where all table of mine having columns like xxx_season)
    >-- to reproduce, change to your own tablenames in a test database
    >)
    >order by table_name
    >)
    >select qs.*,c.column_name as id_column, c.column_default as id_default
    >from
    >	qseason qs
    >	left join information_schema.columns c on c.table_name=qs.table_name and
    >c.column_name like 'id_%'
    >;
    >
    >Until now this table was always restored without error by migrating to a new
    >major version through pg_dump/initdb/pr_restore.
    >
    >To verify the integrity of the table I restored the dump taken under pg_dump
    >from pg 11.5 just before the pg_upgrade to another machine.
    >
    >The restore and analyze went OK and select * from q_tbl_archiv showed all
    >tuples, eg (edited):
    >
    >cpsdb_dev=# select * from q_tbl_archiv;
    >        table_name        | column_name  | ordinal_position | col_qualifier
    >| id_column |                        id_default
    >--------------------------+--------------+------------------+---------------+-----------+----------------------------------------------------------
    > table1                   | chm_season   |                2 | chm
    >|           |
    > table2                   | cs_season    |                2 | cs
    >| id_cs     | nextval('table2_id_cs_seq'::regclass)
    >...
    >
    >In conclusion, this seems to me like an error/omission of pg_upgrade.
    >
    
    There's clearly something bad happening. It's a bit strange, though. Had
    this been a data corruption issue, I'd expect the pg_dump to fail too,
    but it succeeds.
    
    >It seems to handle these specially derived tables from information_schema
    >not correctly, resulting in failures of the upgraded database.
    >
    
    Well, I don't see how that should make any difference. It's a CTAS and
    that should create a regular table, that's not an issue. I wonder if
    there were some changes to the data types involved, but that would be
    essentially a break in on-disk format and we're careful about not doing
    that ...
    
    >For me, this error is not so crucial, because this table is only used for
    >administrative purposes and can easily be restored from backup.
    >
    >But I want to share my findings for the sake of other users of pg_upgrade.
    >
    
    OK, thanks. Could you maybe set
    
     log_error_verbosity = verbose
    
    before invoking the vacuum (you can set that in that session)? That
    should give us more details about where exactly the error is triggered.
    Even better, if you could attach a debugger to the session, set
    breakpoints on locations triggering 'invalid memory alloc request size'
    and then show the backtrace (obviously, that's more complicated).
    
    
    regards
    
    -- 
    Tomas Vondra                  http://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
  3. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-09T13:59:07Z

    FWIW I can reproduce this - it's enough to do this on the 11 cluster
    
    create table q_tbl_archiv as
    with
    qseason as (
    select table_name,column_name, ordinal_position
    ,replace(column_name,'_season','') as col_qualifier
    -- ,'id_'||replace(column_name,'_season','') as id_column
    from information_schema.columns
    order by table_name
    )
    select qs.*,c.column_name as id_column, c.column_default as id_default
    from
            qseason qs
            left join information_schema.columns c on c.table_name=qs.table_name and
    c.column_name like 'id_%';
    
    
    and then
    
        analyze q_tbl_archiv
    
    which produces backtrace like this:
    
    No symbol "stats" in current context.
    (gdb) bt
    #0  0x0000746095262951 in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
    #1  0x0000000000890a8e in varstrfastcmp_locale (a1p=0x17716b4 "per_language\a", len1=<optimized out>, a2p=0x176af28 '\177' <repeats 136 times>, "\021\004", len2=-4, ssup=<optimized out>, ssup=<optimized out>) at varlena.c:2320
    #2  0x0000000000890cb1 in varlenafastcmp_locale (x=24581808, y=24555300, ssup=0x7ffc649463f0) at varlena.c:2219
    #3  0x00000000005b73b4 in ApplySortComparator (ssup=0x7ffc649463f0, isNull2=false, datum2=<optimized out>, isNull1=false, datum1=<optimized out>) at ../../../src/include/utils/sortsupport.h:224
    #4  compare_scalars (a=<optimized out>, b=<optimized out>, arg=0x7ffc649463e0) at analyze.c:2700
    #5  0x00000000008f9953 in qsort_arg (a=a@entry=0x178fdc0, n=<optimized out>, n@entry=2158, es=es@entry=16, cmp=cmp@entry=0x5b7390 <compare_scalars>, arg=arg@entry=0x7ffc649463e0) at qsort_arg.c:140
    #6  0x00000000005b86a6 in compute_scalar_stats (stats=0x176a208, fetchfunc=<optimized out>, samplerows=<optimized out>, totalrows=2158) at analyze.c:2273
    #7  0x00000000005b9d95 in do_analyze_rel (onerel=onerel@entry=0x74608c00d3e8, params=params@entry=0x7ffc64946970, va_cols=va_cols@entry=0x0, acquirefunc=<optimized out>, relpages=22, inh=inh@entry=false, in_outer_xact=false, elevel=13)
        at analyze.c:529
    #8  0x00000000005bb2c9 in analyze_rel (relid=<optimized out>, relation=<optimized out>, params=params@entry=0x7ffc64946970, va_cols=0x0, in_outer_xact=<optimized out>, bstrategy=<optimized out>) at analyze.c:260
    #9  0x000000000062c7b0 in vacuum (relations=0x1727120, params=params@entry=0x7ffc64946970, bstrategy=<optimized out>, bstrategy@entry=0x0, isTopLevel=isTopLevel@entry=true) at vacuum.c:413
    #10 0x000000000062cd49 in ExecVacuum (pstate=pstate@entry=0x16c9518, vacstmt=vacstmt@entry=0x16a82b8, isTopLevel=isTopLevel@entry=true) at vacuum.c:199
    #11 0x00000000007a6d64 in standard_ProcessUtility (pstmt=0x16a8618, queryString=0x16a77a8 "", context=<optimized out>, params=0x0, queryEnv=0x0, dest=0x16a8710, completionTag=0x7ffc64946cb0 "") at utility.c:670
    #12 0x00000000007a4006 in PortalRunUtility (portal=0x170f368, pstmt=0x16a8618, isTopLevel=<optimized out>, setHoldSnapshot=<optimized out>, dest=0x16a8710, completionTag=0x7ffc64946cb0 "") at pquery.c:1175
    #13 0x00000000007a4b61 in PortalRunMulti (portal=portal@entry=0x170f368, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x16a8710, altdest=altdest@entry=0x16a8710,
        completionTag=completionTag@entry=0x7ffc64946cb0 "") at pquery.c:1321
    #14 0x00000000007a5864 in PortalRun (portal=portal@entry=0x170f368, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x16a8710, altdest=altdest@entry=0x16a8710,
        completionTag=0x7ffc64946cb0 "") at pquery.c:796
    #15 0x00000000007a174e in exec_simple_query (query_string=0x16a77a8 "") at postgres.c:1215
    
    Looking at compute_scalar_stats, the "stats" parameter does not seem
    particularly healthy:
    
    (gdb) p *stats
    $3 = {attr = 0x10, attrtypid = 12, attrtypmod = 0, attrtype = 0x1762e00, attrcollid = 356, anl_context = 0x7f7f7f7e00000000, compute_stats = 0x100, minrows = 144, extra_data = 0x1762e00, stats_valid = false, stanullfrac = 0,
      stawidth = 0, stadistinct = 0, stakind = {0, 0, 0, 0, 0}, staop = {0, 0, 0, 0, 0}, stacoll = {0, 0, 0, 0, 0}, numnumbers = {0, 0, 0, 0, 0}, stanumbers = {0x0, 0x0, 0x0, 0x0, 0x0}, numvalues = {0, 0, 0, 0, 2139062142}, stavalues = {
        0x7f7f7f7f7f7f7f7f, 0x7f7f7f7f7f7f7f7f, 0x7f7f7f7f7f7f7f7f, 0x7f7f7f7f7f7f7f7f, 0x7f7f7f7f7f7f7f7f}, statypid = {2139062143, 2139062143, 2139062143, 2139062143, 2139062143}, statyplen = {32639, 32639, 32639, 32639, 32639},
      statypbyval = {127, 127, 127, 127, 127}, statypalign = "\177\177\177\177\177", tupattnum = 2139062143, rows = 0x7f7f7f7f7f7f7f7f, tupDesc = 0x7f7f7f7f7f7f7f7f, exprvals = 0x8, exprnulls = 0x4, rowstride = 24522240}
    
    Not sure about the root cause yet.
    
    regards
    
    -- 
    Tomas Vondra                  http://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
  4. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tom Lane <tgl@sss.pgh.pa.us> — 2019-10-09T14:07:01Z

    Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
    > FWIW I can reproduce this - it's enough to do this on the 11 cluster
    
    I failed to reproduce any problem from your example, but I was trying
    in C locale on a Linux machine.  What environment are you testing?
    
    			regards, tom lane
    
    
    
    
  5. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-09T14:18:41Z

    On Wed, Oct 09, 2019 at 03:59:07PM +0200, Tomas Vondra wrote:
    >FWIW I can reproduce this - it's enough to do this on the 11 cluster
    >
    >create table q_tbl_archiv as
    >with
    >qseason as (
    >select table_name,column_name, ordinal_position
    >,replace(column_name,'_season','') as col_qualifier
    >-- ,'id_'||replace(column_name,'_season','') as id_column
    >from information_schema.columns
    >order by table_name
    >)
    >select qs.*,c.column_name as id_column, c.column_default as id_default
    >from
    >       qseason qs
    >       left join information_schema.columns c on c.table_name=qs.table_name and
    >c.column_name like 'id_%';
    >
    >
    >and then
    >
    >   analyze q_tbl_archiv
    >
    >which produces backtrace like this:
    >
    >No symbol "stats" in current context.
    >(gdb) bt
    >#0  0x0000746095262951 in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
    >#1  0x0000000000890a8e in varstrfastcmp_locale (a1p=0x17716b4 "per_language\a", len1=<optimized out>, a2p=0x176af28 '\177' <repeats 136 times>, "\021\004", len2=-4, ssup=<optimized out>, ssup=<optimized out>) at varlena.c:2320
    >#2  0x0000000000890cb1 in varlenafastcmp_locale (x=24581808, y=24555300, ssup=0x7ffc649463f0) at varlena.c:2219
    >#3  0x00000000005b73b4 in ApplySortComparator (ssup=0x7ffc649463f0, isNull2=false, datum2=<optimized out>, isNull1=false, datum1=<optimized out>) at ../../../src/include/utils/sortsupport.h:224
    >#4  compare_scalars (a=<optimized out>, b=<optimized out>, arg=0x7ffc649463e0) at analyze.c:2700
    >#5  0x00000000008f9953 in qsort_arg (a=a@entry=0x178fdc0, n=<optimized out>, n@entry=2158, es=es@entry=16, cmp=cmp@entry=0x5b7390 <compare_scalars>, arg=arg@entry=0x7ffc649463e0) at qsort_arg.c:140
    >#6  0x00000000005b86a6 in compute_scalar_stats (stats=0x176a208, fetchfunc=<optimized out>, samplerows=<optimized out>, totalrows=2158) at analyze.c:2273
    >#7  0x00000000005b9d95 in do_analyze_rel (onerel=onerel@entry=0x74608c00d3e8, params=params@entry=0x7ffc64946970, va_cols=va_cols@entry=0x0, acquirefunc=<optimized out>, relpages=22, inh=inh@entry=false, in_outer_xact=false, elevel=13)
    >   at analyze.c:529
    >#8  0x00000000005bb2c9 in analyze_rel (relid=<optimized out>, relation=<optimized out>, params=params@entry=0x7ffc64946970, va_cols=0x0, in_outer_xact=<optimized out>, bstrategy=<optimized out>) at analyze.c:260
    >#9  0x000000000062c7b0 in vacuum (relations=0x1727120, params=params@entry=0x7ffc64946970, bstrategy=<optimized out>, bstrategy@entry=0x0, isTopLevel=isTopLevel@entry=true) at vacuum.c:413
    >#10 0x000000000062cd49 in ExecVacuum (pstate=pstate@entry=0x16c9518, vacstmt=vacstmt@entry=0x16a82b8, isTopLevel=isTopLevel@entry=true) at vacuum.c:199
    >#11 0x00000000007a6d64 in standard_ProcessUtility (pstmt=0x16a8618, queryString=0x16a77a8 "", context=<optimized out>, params=0x0, queryEnv=0x0, dest=0x16a8710, completionTag=0x7ffc64946cb0 "") at utility.c:670
    >#12 0x00000000007a4006 in PortalRunUtility (portal=0x170f368, pstmt=0x16a8618, isTopLevel=<optimized out>, setHoldSnapshot=<optimized out>, dest=0x16a8710, completionTag=0x7ffc64946cb0 "") at pquery.c:1175
    >#13 0x00000000007a4b61 in PortalRunMulti (portal=portal@entry=0x170f368, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x16a8710, altdest=altdest@entry=0x16a8710,
    >   completionTag=completionTag@entry=0x7ffc64946cb0 "") at pquery.c:1321
    >#14 0x00000000007a5864 in PortalRun (portal=portal@entry=0x170f368, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x16a8710, altdest=altdest@entry=0x16a8710,
    >   completionTag=0x7ffc64946cb0 "") at pquery.c:796
    >#15 0x00000000007a174e in exec_simple_query (query_string=0x16a77a8 "") at postgres.c:1215
    >
    >Looking at compute_scalar_stats, the "stats" parameter does not seem
    >particularly healthy:
    >
    >(gdb) p *stats
    >$3 = {attr = 0x10, attrtypid = 12, attrtypmod = 0, attrtype = 0x1762e00, attrcollid = 356, anl_context = 0x7f7f7f7e00000000, compute_stats = 0x100, minrows = 144, extra_data = 0x1762e00, stats_valid = false, stanullfrac = 0,
    > stawidth = 0, stadistinct = 0, stakind = {0, 0, 0, 0, 0}, staop = {0, 0, 0, 0, 0}, stacoll = {0, 0, 0, 0, 0}, numnumbers = {0, 0, 0, 0, 0}, stanumbers = {0x0, 0x0, 0x0, 0x0, 0x0}, numvalues = {0, 0, 0, 0, 2139062142}, stavalues = {
    >   0x7f7f7f7f7f7f7f7f, 0x7f7f7f7f7f7f7f7f, 0x7f7f7f7f7f7f7f7f, 0x7f7f7f7f7f7f7f7f, 0x7f7f7f7f7f7f7f7f}, statypid = {2139062143, 2139062143, 2139062143, 2139062143, 2139062143}, statyplen = {32639, 32639, 32639, 32639, 32639},
    > statypbyval = {127, 127, 127, 127, 127}, statypalign = "\177\177\177\177\177", tupattnum = 2139062143, rows = 0x7f7f7f7f7f7f7f7f, tupDesc = 0x7f7f7f7f7f7f7f7f, exprvals = 0x8, exprnulls = 0x4, rowstride = 24522240}
    >
    >Not sure about the root cause yet.
    >
    
    OK, a couple more observations - the table schema looks like this:
    
                                  Table "public.q_tbl_archiv"
          Column      |                Type                | Collation | Nullable | Default 
    ------------------+------------------------------------+-----------+----------+---------
     table_name       | information_schema.sql_identifier  |           |          | 
     column_name      | information_schema.sql_identifier  |           |          | 
     ordinal_position | information_schema.cardinal_number |           |          | 
     col_qualifier    | text                               |           |          | 
     id_column        | information_schema.sql_identifier  |           |          | 
     id_default       | information_schema.character_data  |           |          | 
    
    and I can succesfully do this:
    
      test=# analyze q_tbl_archiv (table_name, column_name, ordinal_position, id_column, id_default);
      ANALYZE
    
    but as soon as I include the col_qualifier column, it fails:
    
      test=# analyze q_tbl_archiv (table_name, column_name, ordinal_position, id_column, id_default, col_qualifier);
      ERROR:  compressed data is corrupted
    
    But it fails differently (with the segfault) when analyzing just the one
    column:
    
      test=# analyze q_tbl_archiv (col_qualifier);
      server closed the connection unexpectedly
      	This probably means the server terminated abnormally
      	before or while processing the request.
      The connection to the server was lost. Attempting reset: Succeeded.
    
    Moreover, there are some other interesting failures - I can do
    
      select max(table_name) from q_tbl_archiv;
      select max(column_name) from q_tbl_archiv;
      select max(ordinal_position) from q_tbl_archiv;
    
    but as soon as I try doing that with col_qualifier, it crashes and
    burns:
    
      select max(col_qualifier) from q_tbl_archiv;
    
    The backtrace is rather strange in this case (a lot of missing calls,
    etc.). However, when called for the next two columns, it still crashes,
    but the backtraces look somewhat saner:
    
      select max(id_column) from q_tbl_archiv;
    
    Program received signal SIGSEGV, Segmentation fault.
    0x00007db3186c6617 in __strlen_avx2 () from /lib64/libc.so.6
    (gdb) bt
    #0  0x00007db3186c6617 in __strlen_avx2 () from /lib64/libc.so.6
    #1  0x0000000000894ced in cstring_to_text (s=0x7db32ce38935 <error: Cannot access memory at address 0x7db32ce38935>) at varlena.c:173
    #2  name_text (fcinfo=<optimized out>) at varlena.c:3573
    #3  0x000000000063860d in ExecInterpExpr (state=0x1136900, econtext=0x1135128, isnull=<optimized out>) at execExprInterp.c:649
    #4  0x000000000064f699 in ExecEvalExprSwitchContext (isNull=0x7ffcfd8f3b2f, econtext=<optimized out>, state=<optimized out>) at ../../../src/include/executor/executor.h:307
    #5  advance_aggregates (aggstate=0x1134ef0, aggstate=0x1134ef0) at nodeAgg.c:679
    #6  agg_retrieve_direct (aggstate=0x1134ef0) at nodeAgg.c:1847
    #7  ExecAgg (pstate=0x1134ef0) at nodeAgg.c:1572
    #8  0x000000000063b58b in ExecProcNode (node=0x1134ef0) at ../../../src/include/executor/executor.h:239
    #9  ExecutePlan (execute_once=<optimized out>, dest=0x1144248, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x1134ef0, estate=0x1134c98)
        at execMain.c:1646
    #10 standard_ExecutorRun (queryDesc=0x1094f18, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364
    #11 0x00000000007a43cc in PortalRunSelect (portal=0x10da368, forward=<optimized out>, count=0, dest=<optimized out>) at pquery.c:929
    #12 0x00000000007a5958 in PortalRun (portal=portal@entry=0x10da368, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x1144248, altdest=altdest@entry=0x1144248,
        completionTag=0x7ffcfd8f3db0 "") at pquery.c:770
    #13 0x00000000007a177e in exec_simple_query (query_string=0x10727a8 "select max(id_column) from q_tbl_archiv ;") at postgres.c:1215
    #14 0x00000000007a2f3f in PostgresMain (argc=<optimized out>, argv=argv@entry=0x109e400, dbname=<optimized out>, username=<optimized out>) at postgres.c:4236
    #15 0x00000000007237ce in BackendRun (port=0x1097c30, port=0x1097c30) at postmaster.c:4437
    #16 BackendStartup (port=0x1097c30) at postmaster.c:4128
    #17 ServerLoop () at postmaster.c:1704
    #18 0x000000000072458e in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x106c350) at postmaster.c:1377
    #19 0x000000000047d101 in main (argc=3, argv=0x106c350) at main.c:228
    
      select max(id_default) from q_tbl_archiv;
    
    Program received signal SIGABRT, Aborted.
    0x00007db3185a1e35 in raise () from /lib64/libc.so.6
    (gdb) bt
    #0  0x00007db3185a1e35 in raise () from /lib64/libc.so.6
    #1  0x00007db31858c895 in abort () from /lib64/libc.so.6
    #2  0x00000000008b4470 in ExceptionalCondition (conditionName=conditionName@entry=0xabe49e "1", errorType=errorType@entry=0x907128 "unrecognized TOAST vartag", fileName=fileName@entry=0xa4965b "execTuples.c",
        lineNumber=lineNumber@entry=971) at assert.c:54
    #3  0x00000000006466d3 in slot_deform_heap_tuple (natts=6, offp=0x1135170, tuple=<optimized out>, slot=0x1135128) at execTuples.c:985
    #4  tts_buffer_heap_getsomeattrs (slot=0x1135128, natts=<optimized out>) at execTuples.c:676
    #5  0x00000000006489fc in slot_getsomeattrs_int (slot=slot@entry=0x1135128, attnum=6) at execTuples.c:1877
    #6  0x00000000006379a3 in slot_getsomeattrs (attnum=<optimized out>, slot=0x1135128) at ../../../src/include/executor/tuptable.h:345
    #7  ExecInterpExpr (state=0x11364b0, econtext=0x1134cd8, isnull=<optimized out>) at execExprInterp.c:441
    #8  0x000000000064f699 in ExecEvalExprSwitchContext (isNull=0x7ffcfd8f3b2f, econtext=<optimized out>, state=<optimized out>) at ../../../src/include/executor/executor.h:307
    #9  advance_aggregates (aggstate=0x1134aa0, aggstate=0x1134aa0) at nodeAgg.c:679
    #10 agg_retrieve_direct (aggstate=0x1134aa0) at nodeAgg.c:1847
    #11 ExecAgg (pstate=0x1134aa0) at nodeAgg.c:1572
    #12 0x000000000063b58b in ExecProcNode (node=0x1134aa0) at ../../../src/include/executor/executor.h:239
    #13 ExecutePlan (execute_once=<optimized out>, dest=0x11439d8, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x1134aa0, estate=0x1134848)
        at execMain.c:1646
    #14 standard_ExecutorRun (queryDesc=0x1094f18, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364
    #15 0x00000000007a43cc in PortalRunSelect (portal=0x10da368, forward=<optimized out>, count=0, dest=<optimized out>) at pquery.c:929
    #16 0x00000000007a5958 in PortalRun (portal=portal@entry=0x10da368, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x11439d8, altdest=altdest@entry=0x11439d8,
        completionTag=0x7ffcfd8f3db0 "") at pquery.c:770
    #17 0x00000000007a177e in exec_simple_query (query_string=0x10727a8 "select max(id_default) from q_tbl_archiv ;") at postgres.c:1215
    #18 0x00000000007a2f3f in PostgresMain (argc=<optimized out>, argv=argv@entry=0x109e4f0, dbname=<optimized out>, username=<optimized out>) at postgres.c:4236
    #19 0x00000000007237ce in BackendRun (port=0x10976f0, port=0x10976f0) at postmaster.c:4437
    #20 BackendStartup (port=0x10976f0) at postmaster.c:4128
    #21 ServerLoop () at postmaster.c:1704
    #22 0x000000000072458e in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x106c350) at postmaster.c:1377
    #23 0x000000000047d101 in main (argc=3, argv=0x106c350) at main.c:228
    
    
    It's quite puzzling, though. If I had to guess, I'd say it's some sort
    of memory management issue (either we're corrupting it somehow, or
    perhaps using it after pfree).
    
    -- 
    Tomas Vondra                  http://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
  6. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-09T14:28:27Z

    On Wed, Oct 09, 2019 at 10:07:01AM -0400, Tom Lane wrote:
    >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
    >> FWIW I can reproduce this - it's enough to do this on the 11 cluster
    >
    >I failed to reproduce any problem from your example, but I was trying
    >in C locale on a Linux machine.  What environment are you testing?
    >
    >			regards, tom lane
    
    test=# show lc_collate ;
     lc_collate 
    ------------
     C.UTF-8
    (1 row)
    
    
    I can reproduce this pretty easily like this:
    
    1) build 11
    
    git checkout REL_11_STABLE
    ./configure --prefix=/home/user/pg-11 --enable-debug --enable-cassert && make -s clean && make -s -j4 install
    
    2) build 12
    
    git checkout REL_12_STABLE
    ./configure --prefix=/home/user/pg-12 --enable-debug --enable-cassert && make -s clean && make -s -j4 install
    
    3) create the 11 cluster
    
    /home/user/pg-11/bin/pg_ctl -D /tmp/data-11 init
    /home/user/pg-11/bin/pg_ctl -D /tmp/data-11 -l /tmp/pg-11.log start
    /home/user/pg-11/bin/createdb test
    /home/user/pg-11/bin/psql test
    
    4) create the table
    
        create table q_tbl_archiv as
        with
        qseason as (
        select table_name,column_name, ordinal_position
        ,replace(column_name,'_season','') as col_qualifier
        -- ,'id_'||replace(column_name,'_season','') as id_column
        from information_schema.columns
        order by table_name
        )
        select qs.*,c.column_name as id_column, c.column_default as id_default
        from
                qseason qs
                left join information_schema.columns c on c.table_name=qs.table_name and
        c.column_name like 'id_%';
    
    5) shutdown the 11 cluster
    
      /home/user/pg-11/bin/pg_ctl -D /tmp/data-11 stop
    
    6) init 12 cluster
    
      /home/user/pg-12/bin/pg_ctl -D /tmp/data-12 init
    
    7) do the pg_upgrade thing
    
      /home/user/pg-12/bin/pg_upgrade -b /home/user/pg-11/bin -B /home/user/pg-12/bin -d /tmp/data-11 -D /tmp/data-12 -k
    
    8) start 12 cluster
    
      /home/user/pg-12/bin/pg_ctl -D /tmp/data-12 -l /tmp/pg-12.log start
    
    9) kabooom
    
      /home/user/pg-12/bin/psql test -c "analyze q_tbl_archiv"
    
    
    On my system (Fedora 30 in x86_64) this reliably results a crash (and
    various other crashes as demonstrated in my previous message).
    
    regards
    
    -- 
    Tomas Vondra                  http://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
  7. AW: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Hans Buschmann <buschmann@nidsa.net> — 2019-10-09T14:50:52Z

    Hi Tomas,
    
    Nice that you could reproduce it.
    This was just the way I followed.
    
    For your Info, here are my no-standard config params:
    
                    name                |         current_setting         
    ------------------------------------+---------------------------------
     application_name                   | psql                            
     auto_explain.log_analyze           | on                              
     auto_explain.log_min_duration      | 0                               
     auto_explain.log_nested_statements | on                              
     client_encoding                    | WIN1252                         
     cluster_name                       | HB_DEV                          
     data_checksums                     | on                              
     DateStyle                          | ISO, DMY                        
     default_text_search_config         | pg_catalog.german               
     dynamic_shared_memory_type         | windows                         
     effective_cache_size               | 8GB                             
     lc_collate                         | C                               
     lc_ctype                           | German_Germany.1252             
     lc_messages                        | C                               
     lc_monetary                        | German_Germany.1252             
     lc_numeric                         | German_Germany.1252             
     lc_time                            | German_Germany.1252             
     log_destination                    | stderr                          
     log_directory                      | N:/ZZ_log/pg_log_hbdev          
     log_error_verbosity                | verbose                         
     log_file_mode                      | 0640                            
     log_line_prefix                    | WHB %a %t %i %e %2l:>           
     log_statement                      | mod                             
     log_temp_files                     | 0                               
     log_timezone                       | CET                             
     logging_collector                  | on                              
     maintenance_work_mem               | 128MB                           
     max_connections                    | 100                             
     max_stack_depth                    | 2MB                             
     max_wal_size                       | 1GB                             
     min_wal_size                       | 80MB                            
     pg_stat_statements.max             | 5000                            
     pg_stat_statements.track           | all                             
     random_page_cost                   | 1                               
     search_path                        | public, archiv, ablage, admin   
     server_encoding                    | UTF8                            
     server_version                     | 12.0                            
     shared_buffers                     | 1GB                             
     shared_preload_libraries           | auto_explain,pg_stat_statements 
     temp_buffers                       | 32MB                            
     TimeZone                           | CET                             
     transaction_deferrable             | off                             
     transaction_isolation              | read committed                  
     transaction_read_only              | off                             
     update_process_title               | off                             
     wal_buffers                        | 16MB                            
     wal_segment_size                   | 16MB                            
     work_mem                           | 32MB                            
    (48 rows)
    
    Indeed, the database has UTF8 Encoding.
    
    The Extended error-log (i have set auto_explain):
    
    
    
    WHB psql 2019-10-09 15:45:03 CEST  XX000  7:> ERROR:  XX000: invalid memory alloc request size 18446744073709551613
    WHB psql 2019-10-09 15:45:03 CEST  XX000  8:> LOCATION:  palloc, d:\pginstaller_12.auto\postgres.windows-x64\src\backend\utils\mmgr\mcxt.c:934
    WHB psql 2019-10-09 15:45:03 CEST  XX000  9:> STATEMENT:  select * from q_tbl_archiv;
    WHB vacuumdb 2019-10-09 15:46:42 CEST  00000  1:> LOG:  00000: duration: 0.022 ms  plan:
    	Query Text: SELECT pg_catalog.set_config('search_path', '', false);
    	Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.014..0.015 rows=1 loops=1)
    WHB vacuumdb 2019-10-09 15:46:42 CEST  00000  2:> LOCATION:  explain_ExecutorEnd, d:\pginstaller_12.auto\postgres.windows-x64\contrib\auto_explain\auto_explain.c:415
    WHB vacuumdb 2019-10-09 15:46:42 CEST  00000  3:> LOG:  00000: duration: 0.072 ms  plan:
    	Query Text: SELECT datname FROM pg_database WHERE datallowconn ORDER BY 1;
    	Sort  (cost=1.16..1.16 rows=1 width=64) (actual time=0.063..0.064 rows=14 loops=1)
    	  Sort Key: datname
    	  Sort Method: quicksort  Memory: 26kB
    	  ->  Seq Scan on pg_database  (cost=0.00..1.15 rows=1 width=64) (actual time=0.018..0.022 rows=14 loops=1)
    	        Filter: datallowconn
    	        Rows Removed by Filter: 1
    WHB vacuumdb 2019-10-09 15:46:42 CEST  00000  4:> LOCATION:  explain_ExecutorEnd, d:\pginstaller_12.auto\postgres.windows-x64\contrib\auto_explain\auto_explain.c:415
    WHB vacuumdb 2019-10-09 15:46:43 CEST  00000  1:> LOG:  00000: duration: 0.027 ms  plan:
    	Query Text: SELECT pg_catalog.set_config('search_path', '', false);
    	Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.012..0.013 rows=1 loops=1)
    WHB vacuumdb 2019-10-09 15:46:43 CEST  00000  2:> LOCATION:  explain_ExecutorEnd, d:\pginstaller_12.auto\postgres.windows-x64\contrib\auto_explain\auto_explain.c:415
    WHB vacuumdb 2019-10-09 15:46:43 CEST  00000  3:> LOG:  00000: duration: 1.036 ms  plan:
    	Query Text: SELECT c.relname, ns.nspname FROM pg_catalog.pg_class c
    	 JOIN pg_catalog.pg_namespace ns ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid
    	 LEFT JOIN pg_catalog.pg_class t ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid
    	 WHERE c.relkind OPERATOR(pg_catalog.=) ANY (array['r', 'm'])
    	 ORDER BY c.relpages DESC;
    	Sort  (cost=56.56..56.59 rows=13 width=132) (actual time=0.843..0.854 rows=320 loops=1)
    	  Sort Key: c.relpages DESC
    	  Sort Method: quicksort  Memory: 110kB
    	  ->  Hash Join  (cost=1.23..56.32 rows=13 width=132) (actual time=0.082..0.649 rows=320 loops=1)
    	        Hash Cond: (c.relnamespace = ns.oid)
    	        ->  Seq Scan on pg_class c  (cost=0.00..55.05 rows=13 width=76) (actual time=0.034..0.545 rows=320 loops=1)
    	              Filter: ((relkind)::text = ANY ('{r,m}'::text[]))
    	              Rows Removed by Filter: 950
    	        ->  Hash  (cost=1.10..1.10 rows=10 width=68) (actual time=0.022..0.022 rows=10 loops=1)
    	              Buckets: 1024  Batches: 1  Memory Usage: 9kB
    	              ->  Seq Scan on pg_namespace ns  (cost=0.00..1.10 rows=10 width=68) (actual time=0.010..0.011 rows=10 loops=1)
    WHB vacuumdb 2019-10-09 15:46:43 CEST  00000  4:> LOCATION:  explain_ExecutorEnd, d:\pginstaller_12.auto\postgres.windows-x64\contrib\auto_explain\auto_explain.c:415
    WHB vacuumdb 2019-10-09 15:46:43 CEST  00000  5:> LOG:  00000: duration: 0.011 ms  plan:
    	Query Text: SELECT pg_catalog.set_config('search_path', '', false);
    	Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.008..0.008 rows=1 loops=1)
    WHB vacuumdb 2019-10-09 15:46:43 CEST  00000  6:> LOCATION:  explain_ExecutorEnd, d:\pginstaller_12.auto\postgres.windows-x64\contrib\auto_explain\auto_explain.c:415
    WHB  2019-10-09 15:47:01 CEST  00000 22:> LOG:  00000: server process (PID 4708) was terminated by exception 0xC0000005
    WHB  2019-10-09 15:47:01 CEST  00000 23:> DETAIL:  Failed process was running: ANALYZE admin.q_tbl_archiv;
    WHB  2019-10-09 15:47:01 CEST  00000 24:> HINT:  See C include file "ntstatus.h" for a description of the hexadecimal value.
    WHB  2019-10-09 15:47:01 CEST  00000 25:> LOCATION:  LogChildExit, d:\pginstaller_12.auto\postgres.windows-x64\src\backend\postmaster\postmaster.c:3670
    WHB  2019-10-09 15:47:01 CEST  00000 26:> LOG:  00000: terminating any other active server processes
    WHB  2019-10-09 15:47:01 CEST  00000 27:> LOCATION:  HandleChildCrash, d:\pginstaller_12.auto\postgres.windows-x64\src\backend\postmaster\postmaster.c:3400
    WHB psql 2019-10-09 15:47:01 CEST  57P02 10:> WARNING:  57P02: terminating connection because of crash of another server process
    WHB psql 2019-10-09 15:47:01 CEST  57P02 11:> DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    WHB psql 2019-10-09 15:47:01 CEST  57P02 12:> HINT:  In a moment you should be able to reconnect to the database and repeat your command.
    WHB psql 2019-10-09 15:47:01 CEST  57P02 13:> LOCATION:  quickdie, d:\pginstaller_12.auto\postgres.windows-x64\src\backend\tcop\postgres.c:2717
    WHB  2019-10-09 15:47:02 CEST  57P02  3:> WARNING:  57P02: terminating connection because of crash of another server process
    WHB  2019-10-09 15:47:02 CEST  57P02  4:> DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    WHB  2019-10-09 15:47:02 CEST  57P02  5:> HINT:  In a moment you should be able to reconnect to the database and repeat your command.
    WHB  2019-10-09 15:47:02 CEST  57P02  6:> LOCATION:  quickdie, d:\pginstaller_12.auto\postgres.windows-x64\src\backend\tcop\postgres.c:2717
    WHB  2019-10-09 15:47:02 CEST  00000 28:> LOG:  00000: all server processes terminated; reinitializing
    WHB  2019-10-09 15:47:02 CEST  00000 29:> LOCATION:  PostmasterStateMachine, d:\pginstaller_12.auto\postgres.windows-x64\src\backend\postmaster\postmaster.c:3912
    WHB  2019-10-09 15:47:02 CEST  00000  1:> LOG:  00000: database system was interrupted; last known up at 2019-10-09 15:46:03 CEST
    WHB  2019-10-09 15:47:02 CEST  00000  2:> LOCATION:  StartupXLOG, d:\pginstaller_12.auto\postgres.windows-x64\src\backend\access\transam\xlog.c:6277
    
    
    The table was imported successively by pg_dump/pg_restore from the previous versions into pg11.
    
    This was the same what I did on the other machine (pg 11.5). On this test machine I could successfully Export the table with pg_dump -t.
    
    On the erroneous PG12 Cluster I succeeded to recreate a similar  table with the original create table Statements: no Errors.
    
    Under PG12 upgraded, I tried to select only the first column (select table_name from q_tbl_archiv) and got erroneaus results (shown first 2 entries):
    
    cpsdb=# select table_name from q_tbl_archiv;
                     table_name
    ---------------------------------------------
     \x11chemmat\x17chm_season
     !collectionsheet\x15cs_season
    
    It seems that the length Bytes are present in the Output.
    
    Hope this Information helps.
    
    Hans Buschmann
    
    
    
    
    
  8. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-09T23:07:59Z

    Well, I think I found the root cause. It's because of 7c15cef86d, which
    changed the definition of sql_identifier so that it's a domain over name
    instead of varchar. So we now have this:
    
      SELECT typname, typlen FROM pg_type WHERE typname = 'sql_identifier':
    
      -[ RECORD 1 ]--+---------------
      typname        | sql_identifier
      typlen         | -1
    
    instead of this
    
      -[ RECORD 1 ]--+---------------
      typname        | sql_identifier
      typlen         | 64
    
    Unfortunately, that seems very much like a break of on-disk format, and
    after pg_upgrade any table containing sql_identifier columns is pretty
    much guaranteed to be badly mangled. For example, the first row from the
    table used in the original report looks like this on PostgreSQL 11:
    
      test=# select ctid, * from q_tbl_archiv limit 1;
      -[ RECORD 1 ]----+--------------------------
      ctid             | (0,1)
      table_name       | _pg_foreign_data_wrappers
      column_name      | foreign_data_wrapper_name
      ordinal_position | 5
      col_qualifier    | foreign_data_wrapper_name
      id_column        | 
      id_default       | 
    
    while on PostgreSQL 12 after pg_upgrade it looks like this
    
      test=# select ctid, table_name, column_name, ordinal_position from q_tbl_archiv limit 1;:
      -[ RECORD 1 ]----+---------------------------------------------------------
      ctid             | (0,1)
      table_name       | 5_pg_foreign_data_wrappers5foreign_data_wrapper_name\x05
      column_name      | _data_wrapper_name
      ordinal_position | 0
    
    Not sure what to do about this :-(
    
    
    regards
    
    -- 
    Tomas Vondra                  http://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
    
    
    
    
  9. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tom Lane <tgl@sss.pgh.pa.us> — 2019-10-09T23:18:45Z

    Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
    > Well, I think I found the root cause. It's because of 7c15cef86d, which
    > changed the definition of sql_identifier so that it's a domain over name
    > instead of varchar.
    
    Ah...
    
    > Not sure what to do about this :-(
    
    Fortunately, there should be close to zero people with user tables
    depending on sql_identifier.  I think we should just add a test in
    pg_upgrade that refuses to upgrade if there are any such columns.
    It won't be the first such restriction.
    
    			regards, tom lane
    
    
    
    
  10. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-09T23:28:36Z

    On Wed, Oct 09, 2019 at 07:18:45PM -0400, Tom Lane wrote:
    >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
    >> Well, I think I found the root cause. It's because of 7c15cef86d, which
    >> changed the definition of sql_identifier so that it's a domain over name
    >> instead of varchar.
    >
    >Ah...
    >
    >> Not sure what to do about this :-(
    >
    >Fortunately, there should be close to zero people with user tables
    >depending on sql_identifier.  I think we should just add a test in
    >pg_upgrade that refuses to upgrade if there are any such columns.
    >It won't be the first such restriction.
    >
    
    Hmmm, yeah.  I agree the number of people using sql_identifier in user
    tables is low, but OTOH we got this report within a week after release,
    so maybe it's higher than we think.
    
    Another option would be to teach pg_upgrade to switch the columns to
    'text' or 'varchar', not sure if that's possible or how much work would
    that be.
    
    regards
    
    -- 
    Tomas Vondra                  http://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
    
    
    
    
  11. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tom Lane <tgl@sss.pgh.pa.us> — 2019-10-09T23:41:54Z

    Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
    > On Wed, Oct 09, 2019 at 07:18:45PM -0400, Tom Lane wrote:
    >> Fortunately, there should be close to zero people with user tables
    >> depending on sql_identifier.  I think we should just add a test in
    >> pg_upgrade that refuses to upgrade if there are any such columns.
    >> It won't be the first such restriction.
    
    > Hmmm, yeah.  I agree the number of people using sql_identifier in user
    > tables is low, but OTOH we got this report within a week after release,
    > so maybe it's higher than we think.
    
    True.
    
    > Another option would be to teach pg_upgrade to switch the columns to
    > 'text' or 'varchar', not sure if that's possible or how much work would
    > that be.
    
    I think it'd be a mess --- the actual hacking would have to happen in
    pg_dump, I think, and it'd be a kluge because pg_dump doesn't normally
    understand what server version its output is going to.  So we'd more
    or less have to control it through a new pg_dump switch that pg_upgrade
    would use.  Ick.
    
    Also, even if we did try to silently convert such columns that way,
    I bet we'd get other bug reports about "why'd my columns suddenly
    change type?".  So I'd rather force the user to be involved.
    
    			regards, tom lane
    
    
    
    
  12. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Andres Freund <andres@anarazel.de> — 2019-10-10T01:48:13Z

    On 2019-10-09 19:41:54 -0400, Tom Lane wrote:
    > Also, even if we did try to silently convert such columns that way,
    > I bet we'd get other bug reports about "why'd my columns suddenly
    > change type?".  So I'd rather force the user to be involved.
    
    +1
    
    
    
    
  13. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-10T13:43:03Z

    On Wed, Oct 09, 2019 at 06:48:13PM -0700, Andres Freund wrote:
    >On 2019-10-09 19:41:54 -0400, Tom Lane wrote:
    >> Also, even if we did try to silently convert such columns that way,
    >> I bet we'd get other bug reports about "why'd my columns suddenly
    >> change type?".  So I'd rather force the user to be involved.
    >
    >+1
    
    Fair enough, attached is a patch doing that, I think. Maybe the file
    should be named differently, as it contains other objects than just
    tables.
    
    
    regards
    
    -- 
    Tomas Vondra                  http://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
  14. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tom Lane <tgl@sss.pgh.pa.us> — 2019-10-10T14:19:12Z

    Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
    > Fair enough, attached is a patch doing that, I think. Maybe the file
    > should be named differently, as it contains other objects than just
    > tables.
    
    Seems about right, though I notice it will not detect domains over
    sql_identifier.  How much do we care about that?
    
    To identify such domains, I think we'd need something like
    WHERE attypid IN (recursive-WITH-query), which makes me nervous.
    We did support those starting with 8.4, which is as far back as
    pg_upgrade will go, so in theory it should work.  But I think we
    had bugs with such cases in old releases.  Do we want to assume
    that the source server has been updated enough to avoid any such
    bugs?  The expense of such a query might be daunting, too.
    
    			regards, tom lane
    
    
    
    
  15. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-10T16:33:50Z

    On Thu, Oct 10, 2019 at 10:19:12AM -0400, Tom Lane wrote:
    >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
    >> Fair enough, attached is a patch doing that, I think. Maybe the file
    >> should be named differently, as it contains other objects than just
    >> tables.
    >
    >Seems about right, though I notice it will not detect domains over
    >sql_identifier.  How much do we care about that?
    >
    >To identify such domains, I think we'd need something like
    >WHERE attypid IN (recursive-WITH-query), which makes me nervous.
    >We did support those starting with 8.4, which is as far back as
    >pg_upgrade will go, so in theory it should work.  But I think we
    >had bugs with such cases in old releases.  Do we want to assume
    >that the source server has been updated enough to avoid any such
    >bugs?  The expense of such a query might be daunting, too.
    >
    
    Not sure.
    
    Regarding bugs, I think it's fine to assume the users are running
    sufficiently recent version - they may not, but then they're probably
    subject to various other bugs (data corruption, queries). If they're
    not, then they'll either get false positives (in which case they'll be
    forced to update) or false negatives (which is just as if we did
    nothing).
    
    For the query cost, I think we can assume the domain hierarchies are not
    particularly deep (in practice I'd expect just domains directly on the
    sql_identifier type). And I doubt people are using that very widely,
    it's probably more like this report - ad-hoc CTAS, so just a couple of
    items. So I wouldn't expect it to be a huge deal in most cases. But even
    if it takes a second or two, it's a one-time cost.
    
    regards
    
    -- 
    Tomas Vondra                  http://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
  16. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tom Lane <tgl@sss.pgh.pa.us> — 2019-10-10T20:14:20Z

    Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
    > On Thu, Oct 10, 2019 at 10:19:12AM -0400, Tom Lane wrote:
    >> To identify such domains, I think we'd need something like
    >> WHERE attypid IN (recursive-WITH-query), which makes me nervous.
    >> We did support those starting with 8.4, which is as far back as
    >> pg_upgrade will go, so in theory it should work.  But I think we
    >> had bugs with such cases in old releases.  Do we want to assume
    >> that the source server has been updated enough to avoid any such
    >> bugs?  The expense of such a query might be daunting, too.
    
    > For the query cost, I think we can assume the domain hierarchies are not
    > particularly deep (in practice I'd expect just domains directly on the
    > sql_identifier type). And I doubt people are using that very widely,
    > it's probably more like this report - ad-hoc CTAS, so just a couple of
    > items. So I wouldn't expect it to be a huge deal in most cases. But even
    > if it takes a second or two, it's a one-time cost.
    
    What I was worried about was the planner possibly trying to apply the
    atttypid restriction as a scan qual using a subplan, which might be rather
    awful.  But it doesn't look like that happens.  I get a hash semijoin to
    the CTE output, in all branches back to 8.4, on this trial query:
    
    explain
    with recursive sqlidoids(toid) as (
    select 'information_schema.sql_identifier'::pg_catalog.regtype as toid
    union
    select oid from pg_catalog.pg_type, sqlidoids
      where typtype = 'd' and typbasetype = sqlidoids.toid
    )                       
    SELECT n.nspname, c.relname, a.attname 
    FROM    pg_catalog.pg_class c, 
            pg_catalog.pg_namespace n, 
            pg_catalog.pg_attribute a 
    WHERE   c.oid = a.attrelid AND 
            NOT a.attisdropped AND 
            a.atttypid in (select toid from sqlidoids) AND
            c.relkind IN ('r','v','i') and
            c.relnamespace = n.oid AND 
            n.nspname !~ '^pg_temp_' AND 
            n.nspname !~ '^pg_toast_temp_' AND 
            n.nspname NOT IN ('pg_catalog', 'information_schema');
    
    			regards, tom lane
    
    
    
    
  17. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-10T20:40:22Z

    On Thu, Oct 10, 2019 at 04:14:20PM -0400, Tom Lane wrote:
    >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
    >> On Thu, Oct 10, 2019 at 10:19:12AM -0400, Tom Lane wrote:
    >>> To identify such domains, I think we'd need something like
    >>> WHERE attypid IN (recursive-WITH-query), which makes me nervous.
    >>> We did support those starting with 8.4, which is as far back as
    >>> pg_upgrade will go, so in theory it should work.  But I think we
    >>> had bugs with such cases in old releases.  Do we want to assume
    >>> that the source server has been updated enough to avoid any such
    >>> bugs?  The expense of such a query might be daunting, too.
    >
    >> For the query cost, I think we can assume the domain hierarchies are not
    >> particularly deep (in practice I'd expect just domains directly on the
    >> sql_identifier type). And I doubt people are using that very widely,
    >> it's probably more like this report - ad-hoc CTAS, so just a couple of
    >> items. So I wouldn't expect it to be a huge deal in most cases. But even
    >> if it takes a second or two, it's a one-time cost.
    >
    >What I was worried about was the planner possibly trying to apply the
    >atttypid restriction as a scan qual using a subplan, which might be rather
    >awful.  But it doesn't look like that happens. 
    
    OK.
    
    > I get a hash semijoin to
    >the CTE output, in all branches back to 8.4, on this trial query:
    >
    >explain
    >with recursive sqlidoids(toid) as (
    >select 'information_schema.sql_identifier'::pg_catalog.regtype as toid
    >union
    >select oid from pg_catalog.pg_type, sqlidoids
    >  where typtype = 'd' and typbasetype = sqlidoids.toid
    >)
    >SELECT n.nspname, c.relname, a.attname
    >FROM    pg_catalog.pg_class c,
    >        pg_catalog.pg_namespace n,
    >        pg_catalog.pg_attribute a
    >WHERE   c.oid = a.attrelid AND
    >        NOT a.attisdropped AND
    >        a.atttypid in (select toid from sqlidoids) AND
    >        c.relkind IN ('r','v','i') and
    >        c.relnamespace = n.oid AND
    >        n.nspname !~ '^pg_temp_' AND
    >        n.nspname !~ '^pg_toast_temp_' AND
    >        n.nspname NOT IN ('pg_catalog', 'information_schema');
    >
    
    I think that's not quite sufficient - the problem is that we can have
    domains and composite types on sql_identifier, in some arbitrary order.
    And the recursive CTE won't handle that the way it's written - it will
    miss domains on composite types containing sql_identifier. And we have
    quite a few of them in the information schema, so maybe someone created
    a domain on one of those (however unlikely it may seem).
    
    I think this recursive CTE does it correctly:
    
    WITH RECURSIVE oids AS (
        -- type itself
        SELECT 'information_schema.sql_identifier'::regtype AS oid
        UNION ALL
        SELECT * FROM (
            -- domains on the type
            WITH x AS (SELECT oid FROM oids)
            SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd'
            UNION
            -- composite types containing the type
            SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x
            WHERE t.typtype = 'c' AND
                  t.oid = c.reltype AND
                  c.oid = a.attrelid AND
                  a.atttypid = x.oid
        ) foo
    ) 
    
    I had to use CTE within CTE, because the 'oids' can be referenced only
    once, but we have two subqueries there. Maybe there's a better solution.
    
    
    regards
    
    -- 
    Tomas Vondra                  http://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
    
    
    
    
  18. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-13T00:10:32Z

    OK,
    
    here is an updated patch, with the recursive CTE. I've done a fair
    amount of testing on it on older versions (up to 9.4), and it seems to
    work just fine.
    
    Another thing that I noticed is that the query does not need to look at
    RELKIND_COMPOSITE_TYPE, because we only really care about cases when
    sql_identifier is stored on-disk. Composite type alone does not do that,
    and the CTE includes OIDs of composite types that we then check against
    relations and matviews.
    
    Barring objections, I'll push this early next week.
    
    
    BTW the query (including the RELKIND_COMPSITE_TYPE) was essentially just
    a lightly-massaged copy of old_9_6_check_for_unknown_data_type_usage, so
    that seems wrong too. The comment explicitly says:
    
     * Also check composite types, in case they are used for table columns.
    
    but even a simple "create type c as (a unknown, b int)" without any
    table using it enough to trigger the failure. But maybe it's
    intentional, not sure.
    
    regards
    
    -- 
    Tomas Vondra                  http://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
    
  19. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tom Lane <tgl@sss.pgh.pa.us> — 2019-10-13T18:26:48Z

    Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
    > here is an updated patch, with the recursive CTE. I've done a fair
    > amount of testing on it on older versions (up to 9.4), and it seems to
    > work just fine.
    
    Might be a good idea to exclude attisdropped columns in the part of the
    recursive query that's looking for sql_identifier columns of composite
    types.  I'm not sure if composites can have dropped columns today,
    but even if they can't it seems like a wise bit of future-proofing.
    (We'll no doubt have occasion to use this logic again...)
    
    Looks good other than that nit.
    
    > BTW the query (including the RELKIND_COMPSITE_TYPE) was essentially just
    > a lightly-massaged copy of old_9_6_check_for_unknown_data_type_usage, so
    > that seems wrong too.
    
    Yeah, we should back-port this logic into that check too, IMO.
    
    			regards, tom lane
    
    
    
    
  20. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-13T20:38:48Z

    On Sun, Oct 13, 2019 at 02:26:48PM -0400, Tom Lane wrote:
    >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
    >> here is an updated patch, with the recursive CTE. I've done a fair
    >> amount of testing on it on older versions (up to 9.4), and it seems to
    >> work just fine.
    >
    >Might be a good idea to exclude attisdropped columns in the part of the
    >recursive query that's looking for sql_identifier columns of composite
    >types.  I'm not sure if composites can have dropped columns today,
    >but even if they can't it seems like a wise bit of future-proofing.
    >(We'll no doubt have occasion to use this logic again...)
    >
    
    Hmm? How could that be safe? Let's say we have a composite type with a
    sql_identifier column, it's used in a table with data, and we drop the
    column. We need the pg_type information to parse the existing, so how
    could we skip attisdropped columns?
    
    >Looks good other than that nit.
    >
    >> BTW the query (including the RELKIND_COMPSITE_TYPE) was essentially just
    >> a lightly-massaged copy of old_9_6_check_for_unknown_data_type_usage, so
    >> that seems wrong too.
    >
    >Yeah, we should back-port this logic into that check too, IMO.
    >
    
    You mean the recursive CTE, removal of RELKIND_COMPOSITE_TYPE or the
    proposed change w.r.t. dropped columns?
    
    
    regards
    
    -- 
    Tomas Vondra                  http://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
    
    
    
    
  21. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tom Lane <tgl@sss.pgh.pa.us> — 2019-10-14T14:16:40Z

    Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
    > On Sun, Oct 13, 2019 at 02:26:48PM -0400, Tom Lane wrote:
    >> Might be a good idea to exclude attisdropped columns in the part of the
    >> recursive query that's looking for sql_identifier columns of composite
    >> types.  I'm not sure if composites can have dropped columns today,
    
    [ I checked this, they can ]
    
    >> but even if they can't it seems like a wise bit of future-proofing.
    >> (We'll no doubt have occasion to use this logic again...)
    
    > Hmm? How could that be safe? Let's say we have a composite type with a
    > sql_identifier column, it's used in a table with data, and we drop the
    > column. We need the pg_type information to parse the existing, so how
    > could we skip attisdropped columns?
    
    It works exactly like it does for table rowtypes.
    
    regression=# create type cfoo as (f1 int, f2 int, f3 int);
    CREATE TYPE
    regression=# alter type cfoo drop attribute f2;
    ALTER TYPE
    regression=# select attname,atttypid,attisdropped,attlen,attalign from pg_attribute where attrelid = 'cfoo'::regclass;
               attname            | atttypid | attisdropped | attlen | attalign 
    ------------------------------+----------+--------------+--------+----------
     f1                           |       23 | f            |      4 | i
     ........pg.dropped.2........ |        0 | t            |      4 | i
     f3                           |       23 | f            |      4 | i
    (3 rows)
    
    All we need to skip over the dead data is attlen/attalign, which are
    preserved in pg_attribute even if the pg_type row is gone.
    
    As this example shows, you don't really *have* to check attisdropped
    because atttypid will be set to zero.  But the latter is just a
    defense measure in case somebody forgets to check attisdropped;
    you're not supposed to forget that.
    
    			regards, tom lane
    
    
    
    
  22. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-14T16:35:38Z

    On Mon, Oct 14, 2019 at 10:16:40AM -0400, Tom Lane wrote:
    >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
    >> On Sun, Oct 13, 2019 at 02:26:48PM -0400, Tom Lane wrote:
    >>> Might be a good idea to exclude attisdropped columns in the part of the
    >>> recursive query that's looking for sql_identifier columns of composite
    >>> types.  I'm not sure if composites can have dropped columns today,
    >
    >[ I checked this, they can ]
    >
    >>> but even if they can't it seems like a wise bit of future-proofing.
    >>> (We'll no doubt have occasion to use this logic again...)
    >
    >> Hmm? How could that be safe? Let's say we have a composite type with a
    >> sql_identifier column, it's used in a table with data, and we drop the
    >> column. We need the pg_type information to parse the existing, so how
    >> could we skip attisdropped columns?
    >
    >It works exactly like it does for table rowtypes.
    >
    >regression=# create type cfoo as (f1 int, f2 int, f3 int);
    >CREATE TYPE
    >regression=# alter type cfoo drop attribute f2;
    >ALTER TYPE
    >regression=# select attname,atttypid,attisdropped,attlen,attalign from pg_attribute where attrelid = 'cfoo'::regclass;
    >           attname            | atttypid | attisdropped | attlen | attalign
    >------------------------------+----------+--------------+--------+----------
    > f1                           |       23 | f            |      4 | i
    > ........pg.dropped.2........ |        0 | t            |      4 | i
    > f3                           |       23 | f            |      4 | i
    >(3 rows)
    >
    >All we need to skip over the dead data is attlen/attalign, which are
    >preserved in pg_attribute even if the pg_type row is gone.
    >
    >As this example shows, you don't really *have* to check attisdropped
    >because atttypid will be set to zero.  But the latter is just a
    >defense measure in case somebody forgets to check attisdropped;
    >you're not supposed to forget that.
    >
    
    Aha! I forgot we copy the necessary stuff into pg_attribute. Thanks for
    clarifying, I'll polish and push the fix shortly.
    
    regards
    
    -- 
    Tomas Vondra                  http://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
  23. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-15T00:18:17Z

    On Mon, Oct 14, 2019 at 06:35:38PM +0200, Tomas Vondra wrote:
    > ...
    >
    >Aha! I forgot we copy the necessary stuff into pg_attribute. Thanks for
    >clarifying, I'll polish and push the fix shortly.
    >
    
    I've pushed and backpatched the fix. Attached are similar fixes for the
    existing pg_upgrade checks for pg_catalog.line and pg_catalog.unknown
    types, which have the same issues with composite types and domains.
    
    There are some additional details & examples in the commit messages.
    
    I've kept this in two patches primarily because of backpatching - the
    line fix should go back up to 9.4, the unknown is for 10.
    
    regards
    
    -- 
    Tomas Vondra                  http://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
  24. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Justin Pryzby <pryzby@telsasoft.com> — 2019-10-15T04:41:18Z

    On Tue, Oct 15, 2019 at 02:18:17AM +0200, Tomas Vondra wrote:
    > On Mon, Oct 14, 2019 at 06:35:38PM +0200, Tomas Vondra wrote:
    > >...
    > >
    > >Aha! I forgot we copy the necessary stuff into pg_attribute. Thanks for
    > >clarifying, I'll polish and push the fix shortly.
    
    Perhaps it'd be worth creating a test for on-disk format ?
    
    Like a table with a column for each core type, which is either SELECTed from
    after pg_upgrade, or pg_dump output compared before and after.
    
    Justin
    
    
    
    
  25. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-15T07:07:25Z

    On Mon, Oct 14, 2019 at 11:41:18PM -0500, Justin Pryzby wrote:
    >On Tue, Oct 15, 2019 at 02:18:17AM +0200, Tomas Vondra wrote:
    >> On Mon, Oct 14, 2019 at 06:35:38PM +0200, Tomas Vondra wrote:
    >> >...
    >> >
    >> >Aha! I forgot we copy the necessary stuff into pg_attribute. Thanks for
    >> >clarifying, I'll polish and push the fix shortly.
    >
    >Perhaps it'd be worth creating a test for on-disk format ?
    >
    >Like a table with a column for each core type, which is either SELECTed from
    >after pg_upgrade, or pg_dump output compared before and after.
    >
    
    IMO that would be useful - we now have a couple of these checks for
    different data types (line, unknown, sql_identifier), with a couple of
    combinations each. And I've been looking if we do similar pg_upgrade
    tests, but I haven't found anything. I mean, we do pg_upgrade the
    cluster used for regression tests, but here we need to test a number of
    cases that are meant to abort the pg_upgrade. So we'd need a number of
    pg_upgrade runs, to test that.
    
    regards
    
    -- 
    Tomas Vondra                  http://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
  26. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-16T11:33:44Z

    On Tue, Oct 15, 2019 at 02:18:17AM +0200, Tomas Vondra wrote:
    >On Mon, Oct 14, 2019 at 06:35:38PM +0200, Tomas Vondra wrote:
    >>...
    >>
    >>Aha! I forgot we copy the necessary stuff into pg_attribute. Thanks for
    >>clarifying, I'll polish and push the fix shortly.
    >>
    >
    >I've pushed and backpatched the fix. Attached are similar fixes for the
    >existing pg_upgrade checks for pg_catalog.line and pg_catalog.unknown
    >types, which have the same issues with composite types and domains.
    >
    >There are some additional details & examples in the commit messages.
    >
    >I've kept this in two patches primarily because of backpatching - the
    >line fix should go back up to 9.4, the unknown is for 10.
    >
    
    I've just committed and pushed both fixes after some minor corrections.
    
    
    regards
    
    -- 
    Tomas Vondra                  http://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
  27. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tom Lane <tgl@sss.pgh.pa.us> — 2019-10-16T12:27:57Z

    Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
    > I've just committed and pushed both fixes after some minor corrections.
    
    Not quite right in 9.6 and before, according to crake.  Looks like
    some issue with the CppAsString2'd constants?  Did we even have
    CppAsString2 that far back?
    
    			regards, tom lane
    
    
    
    
  28. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tom Lane <tgl@sss.pgh.pa.us> — 2019-10-16T13:26:42Z

    I wrote:
    > Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
    >> I've just committed and pushed both fixes after some minor corrections.
    
    > Not quite right in 9.6 and before, according to crake.  Looks like
    > some issue with the CppAsString2'd constants?  Did we even have
    > CppAsString2 that far back?
    
    Yeah, we did.  On closer inspection I suspect that we need to #include
    some other file to get the RELKIND_ constants in the old branches.
    
    			regards, tom lane
    
    
    
    
  29. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-16T13:41:17Z

    On Wed, Oct 16, 2019 at 03:26:42PM +0200, Tom Lane wrote:
    >I wrote:
    >> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
    >>> I've just committed and pushed both fixes after some minor corrections.
    >
    >> Not quite right in 9.6 and before, according to crake.  Looks like
    >> some issue with the CppAsString2'd constants?  Did we even have
    >> CppAsString2 that far back?
    >
    >Yeah, we did.  On closer inspection I suspect that we need to #include
    >some other file to get the RELKIND_ constants in the old branches.
    >
    
    Oh! Looking.
    
    regards
    
    -- 
    Tomas Vondra                  http://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
  30. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-16T14:33:43Z

    On Wed, Oct 16, 2019 at 03:26:42PM +0200, Tom Lane wrote:
    >I wrote:
    >> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
    >>> I've just committed and pushed both fixes after some minor corrections.
    >
    >> Not quite right in 9.6 and before, according to crake.  Looks like
    >> some issue with the CppAsString2'd constants?  Did we even have
    >> CppAsString2 that far back?
    >
    >Yeah, we did.  On closer inspection I suspect that we need to #include
    >some other file to get the RELKIND_ constants in the old branches.
    >
    
    Yeah, the pg_class.h catalog was missing on pre-10 relases. It compiled
    just fine, so I haven't noticed that during the backpatching :-(
    
    Fixed, let's see if the buildfarm is happy with that.
    
    regads
    
    -- 
    Tomas Vondra                  http://www.2ndQuadrant.com
    PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
    
    
    
    
  31. Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

    Bruce Momjian <bruce@momjian.us> — 2019-10-23T22:08:21Z

    On Tue, Oct 15, 2019 at 02:18:17AM +0200, Tomas Vondra wrote:
    > On Mon, Oct 14, 2019 at 06:35:38PM +0200, Tomas Vondra wrote:
    > > ...
    > > 
    > > Aha! I forgot we copy the necessary stuff into pg_attribute. Thanks for
    > > clarifying, I'll polish and push the fix shortly.
    > > 
    > 
    > I've pushed and backpatched the fix. Attached are similar fixes for the
    > existing pg_upgrade checks for pg_catalog.line and pg_catalog.unknown
    > types, which have the same issues with composite types and domains.
    
    This comit added old_11_check_for_sql_identifier_data_type_usage(), but
    it did not use the clearer database error list format added to the
    master branch in commit 1634d36157.  Attached is a patch to fix this,
    which I have committed.
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        http://momjian.us
      EnterpriseDB                             http://enterprisedb.com
    
    + As you are, so once was I.  As I am, so you will be. +
    +                      Ancient Roman grave inscription +
    
  32. pg_upgrade test for binary compatibility of core data types

    Justin Pryzby <pryzby@telsasoft.com> — 2020-12-06T18:02:48Z

    I'm finally returning to this 14 month old thread:
    (was: Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12)
    
    On Tue, Oct 15, 2019 at 09:07:25AM +0200, Tomas Vondra wrote:
    > On Mon, Oct 14, 2019 at 11:41:18PM -0500, Justin Pryzby wrote:
    > > 
    > > Perhaps it'd be worth creating a test for on-disk format ?
    > > 
    > > Like a table with a column for each core type, which is either SELECTed from
    > > after pg_upgrade, or pg_dump output compared before and after.
    > 
    > IMO that would be useful - we now have a couple of these checks for
    > different data types (line, unknown, sql_identifier), with a couple of
    > combinations each. And I've been looking if we do similar pg_upgrade
    > tests, but I haven't found anything. I mean, we do pg_upgrade the
    > cluster used for regression tests, but here we need to test a number of
    > cases that are meant to abort the pg_upgrade. So we'd need a number of
    > pg_upgrade runs, to test that.
    
    I meant to notice if the binary format is accidentally changed again, which was
    what happened here:
    7c15cef86 Base information_schema.sql_identifier domain on name, not varchar.
    
    I added a table to the regression tests so it's processed by pg_upgrade tests,
    run like:
    
    | time make -C src/bin/pg_upgrade check oldsrc=`pwd`/11 oldbindir=`pwd`/11/tmp_install/usr/local/pgsql/bin
    
    I checked that if I cherry-pick 0002 to v11, and comment out
    old_11_check_for_sql_identifier_data_type_usage(), then pg_upgrade/test.sh
    detects the original problem:
    pg_dump: error: Error message from server: ERROR:  invalid memory alloc request size 18446744073709551613
    
    I understand the buildfarm has its own cross-version-upgrade test, which I
    think would catch this on its own.
    
    These all seem to complicate use of pg_upgrade/test.sh, so 0001 is needed to
    allow testing upgrade from older releases.
    
    e78900afd217fa3eaa77c51e23a94c1466af421c Create by default sql/ and expected/ for output directory in pg_regress
    40b132c1afbb4b1494aa8e48cc35ec98d2b90777 In the pg_upgrade test suite, don't write to src/test/regress.
    fc49e24fa69a15efacd5b8958115ed9c43c48f9a Make WAL segment size configurable at initdb time.
    c37b3d08ca6873f9d4eaf24c72a90a550970cbb8 Allow group access on PGDATA
    da9b580d89903fee871cf54845ffa2b26bda2e11 Refactor dir/file permissions
    
    -- 
    Justin
    
  33. Re: pg_upgrade test for binary compatibility of core data types

    Justin Pryzby <pryzby@telsasoft.com> — 2020-12-16T17:22:23Z

    On Sun, Dec 06, 2020 at 12:02:48PM -0600, Justin Pryzby wrote:
    > I meant to notice if the binary format is accidentally changed again, which was
    > what happened here:
    > 7c15cef86 Base information_schema.sql_identifier domain on name, not varchar.
    > 
    > I added a table to the regression tests so it's processed by pg_upgrade tests,
    > run like:
    > 
    > | time make -C src/bin/pg_upgrade check oldsrc=`pwd`/11 oldbindir=`pwd`/11/tmp_install/usr/local/pgsql/bin
    
    Per cfbot, this avoids testing ::xml (support for which may not be enabled)
    And also now tests oid types.
    
    I think the per-version hacks should be grouped by logical change, rather than
    by version.  Which I've started doing here.
    
    -- 
    Justin
    
  34. Re: pg_upgrade test for binary compatibility of core data types

    Justin Pryzby <pryzby@telsasoft.com> — 2020-12-27T19:07:29Z

    On Wed, Dec 16, 2020 at 11:22:23AM -0600, Justin Pryzby wrote:
    > On Sun, Dec 06, 2020 at 12:02:48PM -0600, Justin Pryzby wrote:
    > > I meant to notice if the binary format is accidentally changed again, which was
    > > what happened here:
    > > 7c15cef86 Base information_schema.sql_identifier domain on name, not varchar.
    > > 
    > > I added a table to the regression tests so it's processed by pg_upgrade tests,
    > > run like:
    > > 
    > > | time make -C src/bin/pg_upgrade check oldsrc=`pwd`/11 oldbindir=`pwd`/11/tmp_install/usr/local/pgsql/bin
    > 
    > Per cfbot, this avoids testing ::xml (support for which may not be enabled)
    > And also now tests oid types.
    > 
    > I think the per-version hacks should be grouped by logical change, rather than
    > by version.  Which I've started doing here.
    
    rebased on 6df7a9698bb036610c1e8c6d375e1be38cb26d5f
    
    -- 
    Justin
    
  35. Re: pg_upgrade test for binary compatibility of core data types

    Peter Eisentraut <peter.eisentraut@enterprisedb.com> — 2021-01-11T14:28:08Z

    On 2020-12-27 20:07, Justin Pryzby wrote:
    > On Wed, Dec 16, 2020 at 11:22:23AM -0600, Justin Pryzby wrote:
    >> On Sun, Dec 06, 2020 at 12:02:48PM -0600, Justin Pryzby wrote:
    >>> I meant to notice if the binary format is accidentally changed again, which was
    >>> what happened here:
    >>> 7c15cef86 Base information_schema.sql_identifier domain on name, not varchar.
    >>>
    >>> I added a table to the regression tests so it's processed by pg_upgrade tests,
    >>> run like:
    >>>
    >>> | time make -C src/bin/pg_upgrade check oldsrc=`pwd`/11 oldbindir=`pwd`/11/tmp_install/usr/local/pgsql/bin
    >>
    >> Per cfbot, this avoids testing ::xml (support for which may not be enabled)
    >> And also now tests oid types.
    >>
    >> I think the per-version hacks should be grouped by logical change, rather than
    >> by version.  Which I've started doing here.
    > 
    > rebased on 6df7a9698bb036610c1e8c6d375e1be38cb26d5f
    
    I think these patches could use some in-place documentation of what they 
    are trying to achieve and how they do it.  The required information is 
    spread over a lengthy thread.  No one wants to read that.  Add commit 
    messages to the patches.
    
    
    
    
  36. Re: pg_upgrade test for binary compatibility of core data types

    Bruce Momjian <bruce@momjian.us> — 2021-01-11T15:21:36Z

    On Mon, Jan 11, 2021 at 03:28:08PM +0100, Peter Eisentraut wrote:
    > On 2020-12-27 20:07, Justin Pryzby wrote:
    > > rebased on 6df7a9698bb036610c1e8c6d375e1be38cb26d5f
    > 
    > I think these patches could use some in-place documentation of what they are
    > trying to achieve and how they do it.  The required information is spread
    > over a lengthy thread.  No one wants to read that.  Add commit messages to
    > the patches.
    
    Oh, I see that now, and agree that you need to explain each item with a
    comment.  pg_upgrade is doing some odd things, so documenting everything
    it does is a big win.
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        https://momjian.us
      EnterpriseDB                             https://enterprisedb.com
    
      The usefulness of a cup is in its emptiness, Bruce Lee
    
    
    
    
    
  37. Re: pg_upgrade test for binary compatibility of core data types

    Justin Pryzby <pryzby@telsasoft.com> — 2021-01-12T04:13:52Z

    On Mon, Jan 11, 2021 at 03:28:08PM +0100, Peter Eisentraut wrote:
    > On 2020-12-27 20:07, Justin Pryzby wrote:
    > > On Wed, Dec 16, 2020 at 11:22:23AM -0600, Justin Pryzby wrote:
    > > > On Sun, Dec 06, 2020 at 12:02:48PM -0600, Justin Pryzby wrote:
    > > > > I meant to notice if the binary format is accidentally changed again, which was
    > > > > what happened here:
    > > > > 7c15cef86 Base information_schema.sql_identifier domain on name, not varchar.
    > > > > 
    > > > > I added a table to the regression tests so it's processed by pg_upgrade tests,
    > > > > run like:
    > > > > 
    > > > > | time make -C src/bin/pg_upgrade check oldsrc=`pwd`/11 oldbindir=`pwd`/11/tmp_install/usr/local/pgsql/bin
    > > > 
    > > > Per cfbot, this avoids testing ::xml (support for which may not be enabled)
    > > > And also now tests oid types.
    > > > 
    > > > I think the per-version hacks should be grouped by logical change, rather than
    > > > by version.  Which I've started doing here.
    > > 
    > > rebased on 6df7a9698bb036610c1e8c6d375e1be38cb26d5f
    > 
    > I think these patches could use some in-place documentation of what they are
    > trying to achieve and how they do it.  The required information is spread
    > over a lengthy thread.  No one wants to read that.  Add commit messages to
    > the patches.
    
    0001 patch fixes pg_upgrade/test.sh, which was disfunctional.
    Portions of the first patch were independently handled by commits 52202bb39,
    fa744697c, 091866724.  So this is rebased on those.
    I guess updating this script should be a part of a beta-checklist somewhere,
    since I guess nobody will want to backpatch changes for testing older releases.
    
    0002 allows detecting the information_schema problem that was introduced at:
    7c15cef86 Base information_schema.sql_identifier domain on name, not varchar.
    
    +-- Create a table with different data types, to exercise binary compatibility
    +-- during pg_upgrade test
    
    If binary compatibility is changed I expect this will error, crash, at least
    return wrong data, and thereby fail tests.
    
    -- 
    Justin
    
    On Sun, Dec 06, 2020 at 12:02:48PM -0600, Justin Pryzby wrote:
    > I checked that if I cherry-pick 0002 to v11, and comment out
    > old_11_check_for_sql_identifier_data_type_usage(), then pg_upgrade/test.sh
    > detects the original problem:
    > pg_dump: error: Error message from server: ERROR:  invalid memory alloc request size 18446744073709551613
    > 
    > I understand the buildfarm has its own cross-version-upgrade test, which I
    > think would catch this on its own.
    > 
    > These all seem to complicate use of pg_upgrade/test.sh, so 0001 is needed to
    > allow testing upgrade from older releases.
    > 
    > e78900afd217fa3eaa77c51e23a94c1466af421c Create by default sql/ and expected/ for output directory in pg_regress
    > 40b132c1afbb4b1494aa8e48cc35ec98d2b90777 In the pg_upgrade test suite, don't write to src/test/regress.
    > fc49e24fa69a15efacd5b8958115ed9c43c48f9a Make WAL segment size configurable at initdb time.
    > c37b3d08ca6873f9d4eaf24c72a90a550970cbb8 Allow group access on PGDATA
    > da9b580d89903fee871cf54845ffa2b26bda2e11 Refactor dir/file permissions
    
    
  38. Re: pg_upgrade test for binary compatibility of core data types

    Bruce Momjian <bruce@momjian.us> — 2021-01-12T17:15:59Z

    On Mon, Jan 11, 2021 at 10:13:52PM -0600, Justin Pryzby wrote:
    > On Mon, Jan 11, 2021 at 03:28:08PM +0100, Peter Eisentraut wrote:
    > > I think these patches could use some in-place documentation of what they are
    > > trying to achieve and how they do it.  The required information is spread
    > > over a lengthy thread.  No one wants to read that.  Add commit messages to
    > > the patches.
    > 
    > 0001 patch fixes pg_upgrade/test.sh, which was disfunctional.
    > Portions of the first patch were independently handled by commits 52202bb39,
    > fa744697c, 091866724.  So this is rebased on those.
    > I guess updating this script should be a part of a beta-checklist somewhere,
    > since I guess nobody will want to backpatch changes for testing older releases.
    
    Uh, what exactly is missing from the beta checklist?  I read the patch
    and commit message but don't understand it.
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        https://momjian.us
      EnterpriseDB                             https://enterprisedb.com
    
      The usefulness of a cup is in its emptiness, Bruce Lee
    
    
    
    
    
  39. Re: pg_upgrade test for binary compatibility of core data types

    Justin Pryzby <pryzby@telsasoft.com> — 2021-01-12T17:27:53Z

    On Tue, Jan 12, 2021 at 12:15:59PM -0500, Bruce Momjian wrote:
    > On Mon, Jan 11, 2021 at 10:13:52PM -0600, Justin Pryzby wrote:
    > > On Mon, Jan 11, 2021 at 03:28:08PM +0100, Peter Eisentraut wrote:
    > > > I think these patches could use some in-place documentation of what they are
    > > > trying to achieve and how they do it.  The required information is spread
    > > > over a lengthy thread.  No one wants to read that.  Add commit messages to
    > > > the patches.
    > > 
    > > 0001 patch fixes pg_upgrade/test.sh, which was disfunctional.
    > > Portions of the first patch were independently handled by commits 52202bb39,
    > > fa744697c, 091866724.  So this is rebased on those.
    > > I guess updating this script should be a part of a beta-checklist somewhere,
    > > since I guess nobody will want to backpatch changes for testing older releases.
    > 
    > Uh, what exactly is missing from the beta checklist?  I read the patch
    > and commit message but don't understand it.
    
    Did you try to use test.sh to upgrade from a prior release ?
    
    Evidently it's frequently forgotten, as evidenced by all the "deferred
    maintenance" I had to do to allow testing the main patch (currently 0003).
    
    See also:
    
    commit 5bab1985dfc25eecf4b098145789955c0b246160
    Author: Tom Lane <tgl@sss.pgh.pa.us>
    Date:   Thu Jun 8 13:48:27 2017 -0400
    
        Fix bit-rot in pg_upgrade's test.sh, and improve documentation.
        
        Doing a cross-version upgrade test with test.sh evidently hasn't been
        tested since circa 9.2, because the script lacked case branches for
        old-version servers newer than 9.1.  Future-proof that a bit, and
        clean up breakage induced by our recent drop of V0 function call
        protocol (namely that oldstyle_length() isn't in the regression
        suite anymore).
    
    -- 
    Justin
    
    
    
    
  40. Re: pg_upgrade test for binary compatibility of core data types

    Bruce Momjian <bruce@momjian.us> — 2021-01-12T17:53:56Z

    On Tue, Jan 12, 2021 at 11:27:53AM -0600, Justin Pryzby wrote:
    > On Tue, Jan 12, 2021 at 12:15:59PM -0500, Bruce Momjian wrote:
    > > Uh, what exactly is missing from the beta checklist?  I read the patch
    > > and commit message but don't understand it.
    > 
    > Did you try to use test.sh to upgrade from a prior release ?
    > 
    > Evidently it's frequently forgotten, as evidenced by all the "deferred
    > maintenance" I had to do to allow testing the main patch (currently 0003).
    > 
    > See also:
    > 
    > commit 5bab1985dfc25eecf4b098145789955c0b246160
    > Author: Tom Lane <tgl@sss.pgh.pa.us>
    > Date:   Thu Jun 8 13:48:27 2017 -0400
    > 
    >     Fix bit-rot in pg_upgrade's test.sh, and improve documentation.
    >     
    >     Doing a cross-version upgrade test with test.sh evidently hasn't been
    >     tested since circa 9.2, because the script lacked case branches for
    >     old-version servers newer than 9.1.  Future-proof that a bit, and
    >     clean up breakage induced by our recent drop of V0 function call
    >     protocol (namely that oldstyle_length() isn't in the regression
    >     suite anymore).
    
    Oh, that is odd.  I thought that was regularly run.  I have my own test
    infrastructure that I run for every major release so I never have run
    the built-in one, except for make check-world.
    
    -- 
      Bruce Momjian  <bruce@momjian.us>        https://momjian.us
      EnterpriseDB                             https://enterprisedb.com
    
      The usefulness of a cup is in its emptiness, Bruce Lee
    
    
    
    
    
  41. Re: pg_upgrade test for binary compatibility of core data types

    Andrew Dunstan <andrew@dunslane.net> — 2021-01-12T21:44:28Z

    On 1/12/21 12:53 PM, Bruce Momjian wrote:
    > On Tue, Jan 12, 2021 at 11:27:53AM -0600, Justin Pryzby wrote:
    >> On Tue, Jan 12, 2021 at 12:15:59PM -0500, Bruce Momjian wrote:
    >>> Uh, what exactly is missing from the beta checklist?  I read the patch
    >>> and commit message but don't understand it.
    >> Did you try to use test.sh to upgrade from a prior release ?
    >>
    >> Evidently it's frequently forgotten, as evidenced by all the "deferred
    >> maintenance" I had to do to allow testing the main patch (currently 0003).
    >>
    >> See also:
    >>
    >> commit 5bab1985dfc25eecf4b098145789955c0b246160
    >> Author: Tom Lane <tgl@sss.pgh.pa.us>
    >> Date:   Thu Jun 8 13:48:27 2017 -0400
    >>
    >>     Fix bit-rot in pg_upgrade's test.sh, and improve documentation.
    >>     
    >>     Doing a cross-version upgrade test with test.sh evidently hasn't been
    >>     tested since circa 9.2, because the script lacked case branches for
    >>     old-version servers newer than 9.1.  Future-proof that a bit, and
    >>     clean up breakage induced by our recent drop of V0 function call
    >>     protocol (namely that oldstyle_length() isn't in the regression
    >>     suite anymore).
    > Oh, that is odd.  I thought that was regularly run.  I have my own test
    > infrastructure that I run for every major release so I never have run
    > the built-in one, except for make check-world.
    >
    
    Cross version pg_upgrade is tested regularly in the buildfarm, but not
    using test.sh. Instead it uses the saved data repository from a previous
    run of the buildfarm client for the source branch, and tries to upgrade
    that to the target branch.
    
    
    cheers
    
    
    andrew
    
    
    
    --
    Andrew Dunstan
    EDB: https://www.enterprisedb.com
    
    
    
    
    
  42. Re: pg_upgrade test for binary compatibility of core data types

    Peter Eisentraut <peter.eisentraut@enterprisedb.com> — 2021-01-15T08:00:31Z

    On 2021-01-12 22:44, Andrew Dunstan wrote:
    > Cross version pg_upgrade is tested regularly in the buildfarm, but not
    > using test.sh. Instead it uses the saved data repository from a previous
    > run of the buildfarm client for the source branch, and tries to upgrade
    > that to the target branch.
    
    Does it maintain a set of fixups similar to what is in test.sh?  Are 
    those two sets the same?
    
    
    
    
  43. Re: pg_upgrade test for binary compatibility of core data types

    Tom Lane <tgl@sss.pgh.pa.us> — 2021-03-06T20:01:43Z

    Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
    > On 2021-01-12 22:44, Andrew Dunstan wrote:
    >> Cross version pg_upgrade is tested regularly in the buildfarm, but not
    >> using test.sh. Instead it uses the saved data repository from a previous
    >> run of the buildfarm client for the source branch, and tries to upgrade
    >> that to the target branch.
    
    > Does it maintain a set of fixups similar to what is in test.sh?  Are 
    > those two sets the same?
    
    Responding to Peter: the first answer is yes, the second is I didn't
    check, but certainly Justin's patch makes them closer.
    
    I spent some time poking through this set of patches.  I agree that
    there's problem(s) here that we need to solve, but it feels like this
    isn't a great way to solve them.  What I see in the patchset is:
    
    v4-0001 mostly teaches test.sh about specific changes that have to be
    made to historic versions of the regression database to allow them
    to be reloaded into current servers.  As already discussed, this is
    really duplicative of knowledge that's been embedded into the buildfarm
    client over time.  It'd be better if we could refactor that so that
    the buildfarm shares a common database of these actions with test.sh.
    And said database ought to be in our git tree, so committers could
    fix problems without having to get Andrew involved every time.
    I think this could be represented as a psql script, at least in
    versions that have psql \if (but that came in in v10, so maybe
    we're there already).
    
    (Taking a step back, maybe the regression database isn't an ideal
    testbed for this in the first place.  But it does have the advantage of
    not being a narrow-minded test that is going to miss things we haven't
    explicitly thought of.)
    
    v4-0002 is a bunch of random changes that mostly seem to revert hacky
    adjustments previously made to improve test coverage.  I don't really
    agree with any of these, nor see why they're necessary.  If they
    are necessary then we need to restore the coverage somewhere else.
    Admittedly, the previous changes were a bit hacky, but deleting them
    (without even bothering to adjust the relevant comments) isn't the
    answer.
    
    v4-0003 is really the heart of the matter: it adds a table with some
    previously-not-covered datatypes plus a query that purports to make sure
    that we are covering all types of interest.  But I'm not sure I believe
    that query.  It's got hard-wired assumptions about which typtype values
    need to be covered.  Why is it okay to exclude range and multirange?
    Are we sure that all composites are okay to exclude?  Likewise, the
    restriction to pg_catalog and information_schema schemas seems likely to
    bite us someday.  There are some very random exclusions based on name
    patterns, which seem unsafe (let's list the specific type OIDs), and
    again the nearby comments don't match the code.  But the biggest issue
    is that this can only cover core datatypes, not any contrib stuff.
    
    I don't know what we could do about contrib types.  Maybe we should
    figure that covering core types is already a step forward, and be
    happy with getting that done.
    
    			regards, tom lane
    
    
    
    
  44. Re: pg_upgrade test for binary compatibility of core data types

    Justin Pryzby <pryzby@telsasoft.com> — 2021-04-30T18:33:48Z

    On Sat, Mar 06, 2021 at 03:01:43PM -0500, Tom Lane wrote:
    > Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
    > > On 2021-01-12 22:44, Andrew Dunstan wrote:
    > >> Cross version pg_upgrade is tested regularly in the buildfarm, but not
    > >> using test.sh. Instead it uses the saved data repository from a previous
    > >> run of the buildfarm client for the source branch, and tries to upgrade
    > >> that to the target branch.
    > 
    > > Does it maintain a set of fixups similar to what is in test.sh?  Are 
    > > those two sets the same?
    > 
    > Responding to Peter: the first answer is yes, the second is I didn't
    > check, but certainly Justin's patch makes them closer.
    
    Right - I had meant to send this.
    
    https://github.com/PGBuildFarm/client-code/blob/master/PGBuild/Modules/TestUpgradeXversion.pm
    
                    $opsql = 'drop operator if exists public.=> (bigint, NONE)';
    ..
                    my $missing_funcs = q{drop function if exists public.boxarea(box);
                                  drop function if exists public.funny_dup17();
    ..
                    my $prstmt = join(';',
                            'drop operator if exists #@# (bigint,NONE)',
                            'drop operator if exists #%# (bigint,NONE)',
                            'drop operator if exists !=- (bigint,NONE)',
    ..
                            $prstmt = join(';',
                                    'drop operator @#@ (NONE, bigint)',
    ..
                                    'drop aggregate if exists public.array_cat_accum(anyarray)',
    
    > I spent some time poking through this set of patches.  I agree that
    > there's problem(s) here that we need to solve, but it feels like this
    > isn't a great way to solve them.  What I see in the patchset is:
    
    For starters, is there a "release beta checklist" ?
    Testing test.sh should be on it.
    So should fuzz testing.
    
    > v4-0001 mostly teaches test.sh about specific changes that have to be
    > made to historic versions of the regression database to allow them
    > to be reloaded into current servers.  As already discussed, this is
    > really duplicative of knowledge that's been embedded into the buildfarm
    > client over time.  It'd be better if we could refactor that so that
    > the buildfarm shares a common database of these actions with test.sh.
    > And said database ought to be in our git tree, so committers could
    > fix problems without having to get Andrew involved every time.
    > I think this could be represented as a psql script, at least in
    > versions that have psql \if (but that came in in v10, so maybe
    > we're there already).
    
    I started this.  I don't know if it's compatible with the buildfarm client, but
    I think any issues maybe can be avoided by using "IF EXISTS".
    
    > v4-0002 is a bunch of random changes that mostly seem to revert hacky
    > adjustments previously made to improve test coverage.  I don't really
    > agree with any of these, nor see why they're necessary.  If they
    > are necessary then we need to restore the coverage somewhere else.
    > Admittedly, the previous changes were a bit hacky, but deleting them
    > (without even bothering to adjust the relevant comments) isn't the
    > answer.
    
    It was necessary to avoid --wal-segsize and -g to allow testing upgrades from
    versions which don't support those options.  I think test.sh should be portable
    back to all supported versions.
    
    When those options were added, it broke test.sh upgrading from old versions.
    I changed this to a shell conditional for the "new" features:
    | "$1" -N -A trust ${oldsrc:+--wal-segsize 1 -g}
    Ideally it would check the version.
    
    > v4-0003 is really the heart of the matter: it adds a table with some
    > previously-not-covered datatypes plus a query that purports to make sure
    > that we are covering all types of interest.
    
    Actually the 'manytypes' table intends to include *all* core datatypes itself,
    not just those that aren't included somewhere else.  I think "included
    somewhere else" depends on the order of the regression these, and type_sanity
    runs early, so the table might need to include many types that are created
    later, to avoid "false positives" in the associated test.
    
    > But I'm not sure I believe
    > that query.  It's got hard-wired assumptions about which typtype values
    > need to be covered.  Why is it okay to exclude range and multirange?
    > Are we sure that all composites are okay to exclude?  Likewise, the
    > restriction to pg_catalog and information_schema schemas seems likely to
    > bite us someday.  There are some very random exclusions based on name
    > patterns, which seem unsafe (let's list the specific type OIDs), and
    > again the nearby comments don't match the code.  But the biggest issue
    > is that this can only cover core datatypes, not any contrib stuff.
    
    I changed to use regtype/OIDs, included range/multirange and stopped including
    only pg_catalog/information_schema.  But didn't yet handle composites.
    
    > I don't know what we could do about contrib types.  Maybe we should
    > figure that covering core types is already a step forward, and be
    > happy with getting that done.
    
    Right .. this is meant to at least handle the lowest hanging fruit.
    
    -- 
    Justin
    
  45. Re: pg_upgrade test for binary compatibility of core data types

    Jacob Champion <pchampion@vmware.com> — 2021-07-16T16:21:07Z

    On Fri, 2021-04-30 at 13:33 -0500, Justin Pryzby wrote:
    > On Sat, Mar 06, 2021 at 03:01:43PM -0500, Tom Lane wrote:
    > > v4-0001 mostly teaches test.sh about specific changes that have to be
    > > made to historic versions of the regression database to allow them
    > > to be reloaded into current servers.  As already discussed, this is
    > > really duplicative of knowledge that's been embedded into the buildfarm
    > > client over time.  It'd be better if we could refactor that so that
    > > the buildfarm shares a common database of these actions with test.sh.
    > > And said database ought to be in our git tree, so committers could
    > > fix problems without having to get Andrew involved every time.
    > > I think this could be represented as a psql script, at least in
    > > versions that have psql \if (but that came in in v10, so maybe
    > > we're there already).
    > 
    > I started this.  I don't know if it's compatible with the buildfarm client, but
    > I think any issues maybe can be avoided by using "IF EXISTS".
    
    I'm going to try pulling this into a psql script today and see how far
    I get.
    
    > > But I'm not sure I believe
    > > that query.  It's got hard-wired assumptions about which typtype values
    > > need to be covered.  Why is it okay to exclude range and multirange?
    > > Are we sure that all composites are okay to exclude?  Likewise, the
    > > restriction to pg_catalog and information_schema schemas seems likely to
    > > bite us someday.  There are some very random exclusions based on name
    > > patterns, which seem unsafe (let's list the specific type OIDs), and
    > > again the nearby comments don't match the code.  But the biggest issue
    > > is that this can only cover core datatypes, not any contrib stuff.
    > 
    > I changed to use regtype/OIDs, included range/multirange and stopped including
    > only pg_catalog/information_schema.  But didn't yet handle composites.
    
    Per cfbot, this test needs to be taught about the new
    pg_brin_bloom_summary and pg_brin_minmax_multi_summary types.
    
    --Jacob
    
  46. Re: pg_upgrade test for binary compatibility of core data types

    Jacob Champion <pchampion@vmware.com> — 2021-07-16T16:40:34Z

    On Fri, 2021-07-16 at 16:21 +0000, Jacob Champion wrote:
    > On Fri, 2021-04-30 at 13:33 -0500, Justin Pryzby wrote:
    > > On Sat, Mar 06, 2021 at 03:01:43PM -0500, Tom Lane wrote:
    > > > v4-0001 mostly teaches test.sh about specific changes that have to be
    > > > made to historic versions of the regression database to allow them
    > > > to be reloaded into current servers.  As already discussed, this is
    > > > really duplicative of knowledge that's been embedded into the buildfarm
    > > > client over time.  It'd be better if we could refactor that so that
    > > > the buildfarm shares a common database of these actions with test.sh.
    > > > And said database ought to be in our git tree, so committers could
    > > > fix problems without having to get Andrew involved every time.
    > > > I think this could be represented as a psql script, at least in
    > > > versions that have psql \if (but that came in in v10, so maybe
    > > > we're there already).
    > > 
    > > I started this.  I don't know if it's compatible with the buildfarm client, but
    > > I think any issues maybe can be avoided by using "IF EXISTS".
    > 
    > I'm going to try pulling this into a psql script today and see how far
    > I get.
    
    I completely misread this exchange -- you already did this in 0004.
    Sorry for the noise.
    
    --Jacob
    
  47. Re: pg_upgrade test for binary compatibility of core data types

    Jacob Champion <pchampion@vmware.com> — 2021-07-16T18:02:18Z

    On Fri, 2021-04-30 at 13:33 -0500, Justin Pryzby wrote:
    > On Sat, Mar 06, 2021 at 03:01:43PM -0500, Tom Lane wrote:
    > > v4-0001 mostly teaches test.sh about specific changes that have to be
    > > made to historic versions of the regression database to allow them
    > > to be reloaded into current servers.  As already discussed, this is
    > > really duplicative of knowledge that's been embedded into the buildfarm
    > > client over time.  It'd be better if we could refactor that so that
    > > the buildfarm shares a common database of these actions with test.sh.
    > > And said database ought to be in our git tree, so committers could
    > > fix problems without having to get Andrew involved every time.
    > > I think this could be represented as a psql script, at least in
    > > versions that have psql \if (but that came in in v10, so maybe
    > > we're there already).
    > 
    > I started this.  I don't know if it's compatible with the buildfarm client, but
    > I think any issues maybe can be avoided by using "IF EXISTS".
    
    Here are the differences I see on a first pass (without putting too
    much thought into how significant the differences are). Buildfarm code
    I'm comparing against is at [1].
    
    - Both versions drop @#@ and array_cat_accum, but the buildfarm
    additionally replaces them with a new operator and aggregate,
    respectively.
    
    - The buildfarm's dropping of table OIDs is probably more resilient,
    since it loops over pg_class looking for relhasoids.
    
    - The buildfarm handles (or drops) several contrib databases in
    addition to the core regression DB.
    
    - The psql script drops the first_el_agg_any aggregate and a `TRANSFORM
    FOR integer`; I don't see any corresponding code in the buildfarm.
    
    - Some version ranges are different between the two. For example,
    abstime_/reltime_/tinterval_tbl are dropped by the buildfarm if the old
    version is < 9.3, while the psql script drops them for old versions <=
    10.
    
    - The buildfarm drops the public.=> operator for much older versions of
    Postgres. I assume we don't need that here.
    
    - The buildfarm adjusts pg_proc for the location of regress.so; I see
    there's a commented placeholder for this at the end of the psql script
    but it's not yet implemented.
    
    As an aside, I think the "fromv10" naming scheme for the "old version
    <= 10" condition is unintuitive. If the old version is e.g. 9.6, we're
    not upgrading "from 10".
    
    --Jacob
    
    [1] https://github.com/PGBuildFarm/client-code/blob/main/PGBuild/Modules/TestUpgradeXversion.pm
    
  48. Re: pg_upgrade test for binary compatibility of core data types

    Tom Lane <tgl@sss.pgh.pa.us> — 2021-09-11T18:19:04Z

    Jacob Champion <pchampion@vmware.com> writes:
    > On Fri, 2021-04-30 at 13:33 -0500, Justin Pryzby wrote:
    >> I started this.  I don't know if it's compatible with the buildfarm client, but
    >> I think any issues maybe can be avoided by using "IF EXISTS".
    
    > Here are the differences I see on a first pass (without putting too
    > much thought into how significant the differences are). Buildfarm code
    > I'm comparing against is at [1].
    
    I switched the CF entry for this to "Waiting on Author".  It's
    been failing in the cfbot for a couple of months, and Jacob's
    provided some review-ish comments here, so I think there's
    plenty of reason to deem the ball to be in Justin's court.
    
    			regards, tom lane
    
    
    
    
  49. Re: pg_upgrade test for binary compatibility of core data types

    Justin Pryzby <pryzby@telsasoft.com> — 2021-09-12T00:51:16Z

    On Fri, Jul 16, 2021 at 06:02:18PM +0000, Jacob Champion wrote:
    > On Fri, 2021-04-30 at 13:33 -0500, Justin Pryzby wrote:
    > > On Sat, Mar 06, 2021 at 03:01:43PM -0500, Tom Lane wrote:
    > > > v4-0001 mostly teaches test.sh about specific changes that have to be
    > > > made to historic versions of the regression database to allow them
    > > > to be reloaded into current servers.  As already discussed, this is
    > > > really duplicative of knowledge that's been embedded into the buildfarm
    > > > client over time.  It'd be better if we could refactor that so that
    > > > the buildfarm shares a common database of these actions with test.sh.
    > > > And said database ought to be in our git tree, so committers could
    > > > fix problems without having to get Andrew involved every time.
    > > > I think this could be represented as a psql script, at least in
    > > > versions that have psql \if (but that came in in v10, so maybe
    > > > we're there already).
    > > 
    > > I started this.  I don't know if it's compatible with the buildfarm client, but
    > > I think any issues maybe can be avoided by using "IF EXISTS".
    > 
    > Here are the differences I see on a first pass (without putting too
    > much thought into how significant the differences are). Buildfarm code
    > I'm comparing against is at [1].
    > 
    > - Both versions drop @#@ and array_cat_accum, but the buildfarm
    > additionally replaces them with a new operator and aggregate,
    > respectively.
    > 
    > - The buildfarm's dropping of table OIDs is probably more resilient,
    > since it loops over pg_class looking for relhasoids.
    
    These are all "translated" from test.sh, so follow its logic.
    Maybe it should be improved, but that's separate from this patch - which is
    already doing a few unrelated things.
    
    > - The buildfarm adjusts pg_proc for the location of regress.so; I see
    > there's a commented placeholder for this at the end of the psql script
    > but it's not yet implemented.
    
    I didn't understand why this was done here, but it turns out it has to be done
    *after* calling pg_dump.  So it has to stay where it is.
    
    > - Some version ranges are different between the two. For example,
    > abstime_/reltime_/tinterval_tbl are dropped by the buildfarm if the old
    > version is < 9.3, while the psql script drops them for old versions <=
    > 10.
    
    This was an error.  Thanks.
    
    > - The buildfarm drops the public.=> operator for much older versions of
    > Postgres. I assume we don't need that here.
    
    > As an aside, I think the "fromv10" naming scheme for the "old version
    > <= 10" condition is unintuitive. If the old version is e.g. 9.6, we're
    > not upgrading "from 10".
    
    I renamed the version vars - feel free to suggest something better.
    
    I'll solicit suggestions what else to do to progresss these.
    
    @Andrew: did you have any comment on this part ?
    
    |Subject: buildfarm xversion diff
    |Forking https://www.postgresql.org/message-id/20210328231433.GI15100@telsasoft.com
    |
    |I gave suggestion how to reduce the "lines of diff" metric almost to nothing,
    |allowing a very small "fudge factor", and which I think makes this a pretty
    |good metric rather than a passable one.
    
    -- 
    Justin
    
  50. Re: pg_upgrade test for binary compatibility of core data types

    Andrew Dunstan <andrew@dunslane.net> — 2021-09-12T18:41:23Z

    On 9/11/21 8:51 PM, Justin Pryzby wrote:
    >
    > @Andrew: did you have any comment on this part ?
    >
    > |Subject: buildfarm xversion diff
    > |Forking https://www.postgresql.org/message-id/20210328231433.GI15100@telsasoft.com
    > |
    > |I gave suggestion how to reduce the "lines of diff" metric almost to nothing,
    > |allowing a very small "fudge factor", and which I think makes this a pretty
    > |good metric rather than a passable one.
    >
    
    Somehow I missed that. Looks like some good suggestions. I'll
    experiment. (Note: we can't assume the presence of sed, especially on
    Windows).
    
    
    cheers
    
    
    andrew
    
    --
    Andrew Dunstan
    EDB: https://www.enterprisedb.com
    
    
    
    
    
  51. Re: pg_upgrade test for binary compatibility of core data types

    Andrew Dunstan <andrew@dunslane.net> — 2021-09-13T13:20:37Z

    On 9/12/21 2:41 PM, Andrew Dunstan wrote:
    > On 9/11/21 8:51 PM, Justin Pryzby wrote:
    >> @Andrew: did you have any comment on this part ?
    >>
    >> |Subject: buildfarm xversion diff
    >> |Forking https://www.postgresql.org/message-id/20210328231433.GI15100@telsasoft.com
    >> |
    >> |I gave suggestion how to reduce the "lines of diff" metric almost to nothing,
    >> |allowing a very small "fudge factor", and which I think makes this a pretty
    >> |good metric rather than a passable one.
    >>
    > Somehow I missed that. Looks like some good suggestions. I'll
    > experiment. (Note: we can't assume the presence of sed, especially on
    > Windows).
    >
    >
    
    I tried with the attached patch on crake, which tests back as far as
    9.2. Here are the diff counts from HEAD:
    
    
    andrew@emma:HEAD $ grep -c '^[+-]' dumpdiff-REL9_* dumpdiff-REL_1*
    dumpdiff-HEAD
    dumpdiff-REL9_2_STABLE:514
    dumpdiff-REL9_3_STABLE:169
    dumpdiff-REL9_4_STABLE:185
    dumpdiff-REL9_5_STABLE:221
    dumpdiff-REL9_6_STABLE:11
    dumpdiff-REL_10_STABLE:11
    dumpdiff-REL_11_STABLE:73
    dumpdiff-REL_12_STABLE:73
    dumpdiff-REL_13_STABLE:73
    dumpdiff-REL_14_STABLE:0
    dumpdiff-HEAD:0
    
    
    I've also attached those non-empty dumpdiff files for information, since
    they are quite small.
    
    
    There is still work to do, but this is promising. Next step: try it on
    Windows.
    
    
    cheers
    
    
    andrew
    
    
    -- 
    
    Andrew Dunstan
    EDB: https://www.enterprisedb.com
    
    
  52. Re: pg_upgrade test for binary compatibility of core data types

    Andrew Dunstan <andrew@dunslane.net> — 2021-09-15T19:28:54Z

    On 9/13/21 9:20 AM, Andrew Dunstan wrote:
    > On 9/12/21 2:41 PM, Andrew Dunstan wrote:
    >> On 9/11/21 8:51 PM, Justin Pryzby wrote:
    >>> @Andrew: did you have any comment on this part ?
    >>>
    >>> |Subject: buildfarm xversion diff
    >>> |Forking https://www.postgresql.org/message-id/20210328231433.GI15100@telsasoft.com
    >>> |
    >>> |I gave suggestion how to reduce the "lines of diff" metric almost to nothing,
    >>> |allowing a very small "fudge factor", and which I think makes this a pretty
    >>> |good metric rather than a passable one.
    >>>
    >> Somehow I missed that. Looks like some good suggestions. I'll
    >> experiment. (Note: we can't assume the presence of sed, especially on
    >> Windows).
    >>
    >>
    > I tried with the attached patch on crake, which tests back as far as
    > 9.2. Here are the diff counts from HEAD:
    >
    >
    > andrew@emma:HEAD $ grep -c '^[+-]' dumpdiff-REL9_* dumpdiff-REL_1*
    > dumpdiff-HEAD
    > dumpdiff-REL9_2_STABLE:514
    > dumpdiff-REL9_3_STABLE:169
    > dumpdiff-REL9_4_STABLE:185
    > dumpdiff-REL9_5_STABLE:221
    > dumpdiff-REL9_6_STABLE:11
    > dumpdiff-REL_10_STABLE:11
    > dumpdiff-REL_11_STABLE:73
    > dumpdiff-REL_12_STABLE:73
    > dumpdiff-REL_13_STABLE:73
    > dumpdiff-REL_14_STABLE:0
    > dumpdiff-HEAD:0
    >
    >
    > I've also attached those non-empty dumpdiff files for information, since
    > they are quite small.
    >
    >
    > There is still work to do, but this is promising. Next step: try it on
    > Windows.
    >
    >
    
    It appears to do the right thing on Windows. yay!
    
    
    We probably need to get smarter about the heuristics, though, e.g. by
    taking into account the buildfarm options and the platform. It would
    also help a lot if we could make vcregress.pl honor USE_MODULE_DB.
    That's on my TODO list, but it just got a lot higher priority.
    
    
    cheers
    
    
    andrew
    
    --
    Andrew Dunstan
    EDB: https://www.enterprisedb.com
    
    
    
    
    
  53. Re: pg_upgrade test for binary compatibility of core data types

    Andrew Dunstan <andrew@dunslane.net> — 2021-09-24T14:58:52Z

    On 9/15/21 3:28 PM, Andrew Dunstan wrote:
    > On 9/13/21 9:20 AM, Andrew Dunstan wrote:
    >> On 9/12/21 2:41 PM, Andrew Dunstan wrote:
    >>> On 9/11/21 8:51 PM, Justin Pryzby wrote:
    >>>> @Andrew: did you have any comment on this part ?
    >>>>
    >>>> |Subject: buildfarm xversion diff
    >>>> |Forking https://www.postgresql.org/message-id/20210328231433.GI15100@telsasoft.com
    >>>> |
    >>>> |I gave suggestion how to reduce the "lines of diff" metric almost to nothing,
    >>>> |allowing a very small "fudge factor", and which I think makes this a pretty
    >>>> |good metric rather than a passable one.
    >>>>
    >>> Somehow I missed that. Looks like some good suggestions. I'll
    >>> experiment. (Note: we can't assume the presence of sed, especially on
    >>> Windows).
    >>>
    >>>
    >> I tried with the attached patch on crake, which tests back as far as
    >> 9.2. Here are the diff counts from HEAD:
    >>
    >>
    >> andrew@emma:HEAD $ grep -c '^[+-]' dumpdiff-REL9_* dumpdiff-REL_1*
    >> dumpdiff-HEAD
    >> dumpdiff-REL9_2_STABLE:514
    >> dumpdiff-REL9_3_STABLE:169
    >> dumpdiff-REL9_4_STABLE:185
    >> dumpdiff-REL9_5_STABLE:221
    >> dumpdiff-REL9_6_STABLE:11
    >> dumpdiff-REL_10_STABLE:11
    >> dumpdiff-REL_11_STABLE:73
    >> dumpdiff-REL_12_STABLE:73
    >> dumpdiff-REL_13_STABLE:73
    >> dumpdiff-REL_14_STABLE:0
    >> dumpdiff-HEAD:0
    >>
    >>
    >> I've also attached those non-empty dumpdiff files for information, since
    >> they are quite small.
    >>
    >>
    >> There is still work to do, but this is promising. Next step: try it on
    >> Windows.
    >>
    >>
    > It appears to do the right thing on Windows. yay!
    >
    >
    > We probably need to get smarter about the heuristics, though, e.g. by
    > taking into account the buildfarm options and the platform. It would
    > also help a lot if we could make vcregress.pl honor USE_MODULE_DB.
    > That's on my TODO list, but it just got a lot higher priority.
    >
    >
    
    
    Here's what I've committed:
    <https://github.com/PGBuildFarm/client-code/commit/6317d82c0e897a29dabd57ed8159d13920401f96>
    
    
    cheers
    
    
    andrew
    
    --
    Andrew Dunstan
    EDB: https://www.enterprisedb.com
    
    
    
    
    
  54. Re: pg_upgrade test for binary compatibility of core data types

    Michael Paquier <michael@paquier.xyz> — 2021-10-01T07:58:41Z

    On Sat, Sep 11, 2021 at 07:51:16PM -0500, Justin Pryzby wrote:
    > These are all "translated" from test.sh, so follow its logic.
    > Maybe it should be improved, but that's separate from this patch - which is
    > already doing a few unrelated things.
    
    I was looking at this CF entry, and what you are doing in 0004 to move
    the tweaks from pg_upgrade's test.sh to a separate SQL script that
    uses psql's meta-commands like \if to check which version we are on is
    really interesting.  The patch does not apply anymore, so this needs a
    rebase.  The entry has been switched as waiting on author by Tom, but
    you did not update it after sending the new versions in [1].  I am
    wondering if we could have something cleaner than just a set booleans
    as you do here for each check, as that does not help with the
    readability of the tests.
    
    [1]: https://www.postgresql.org/message-id/20210912005116.GF26465@telsasoft.com
    --
    Michael
    
  55. Re: pg_upgrade test for binary compatibility of core data types

    Michael Paquier <michael@paquier.xyz> — 2021-10-11T05:38:12Z

    On Fri, Oct 01, 2021 at 04:58:41PM +0900, Michael Paquier wrote:
    > I was looking at this CF entry, and what you are doing in 0004 to move
    > the tweaks from pg_upgrade's test.sh to a separate SQL script that
    > uses psql's meta-commands like \if to check which version we are on is
    > really interesting.  The patch does not apply anymore, so this needs a
    > rebase.  The entry has been switched as waiting on author by Tom, but
    > you did not update it after sending the new versions in [1].  I am
    > wondering if we could have something cleaner than just a set booleans
    > as you do here for each check, as that does not help with the
    > readability of the tests.
    
    And so, I am back at this thread, looking at the set of patches
    proposed from 0001 to 0004.  The patches are rather messy and mix many
    things and concepts, but there are basically four things that stand
    out:
    - test.sh is completely broken when using PG >= 14 as new version
    because of the removal of the test tablespace.  Older versions of
    pg_regress don't support --make-tablespacedir so I am fine to stick a
    couple of extra mkdirs for testtablespace/, expected/ and sql/ to
    allow the script to work properly for major upgrades as a workaround,
    but only if we use an old version.  We need to do something here for
    HEAD and REL_14_STABLE.
    - The script would fail when using PG <= 11 as old version because of
    WITH OIDS relations.  We need to do something down to REL_12_STABLE.
    I did not like much the approach taken to stick 4 ALTER TABLE queries
    though (the patch was actually failing here for me), so instead I have
    borrowed what the buildfarm has been doing with a DO block.  That
    works fine, and that's more portable.
    - Not using --extra-float-digits with PG <= 11 as older version causes
    a bunch of diffs in the dumps, making the whole unreadable.  The patch
    was doing that unconditionally for *all version*, which is not good.
    We should only do that on the versions that need it, and we know the
    old version number before taking any dumps so that's easy to check.
    - The addition of --wal-segsize and --allow-group-access breaks the
    script when using PG < 10 at initdb time as these got added in 11.
    With 10 getting EOL'd next year and per the lack of complaints, I am
    not excited to do anything here and I'd rather leave this out so as we
    keep coverage for those options across *all* major versions upgraded
    from 11~.  The buildfarm has tests down to 9.2, but for devs my take
    is that this is enough for now.
    
    This is for the basics in terms of fixing test.sh and what should be
    backpatched.  In this aspect patches 0001 and 0002 were a bit
    incorrect.  I am not sure that 0003 is that interesting as designed as
    we would miss any new core types introduced.
    
    0004 is something I'd like to get done on HEAD to ease the move of the
    pg_upgrade tests to TAP, but it could be made a bit easier to read by
    not having all those oldpgversion_XX_YY flags grouped together for
    one.  So I am going to rewrite portions of it once done with the
    above.
    
    For now, attached is a patch to address the issues with test.sh that I
    am planning to backpatch.  This fixes the facility on HEAD, while
    minimizing the diffs between the dumps.  We could do more, like a
    s/PROCEDURE/FUNCTION/ but that does not make the diffs really
    unreadable either.  I have only tested that on HEAD as new version
    down to 11 as the oldest version per the business with --wal-segsize.
    This still needs tests with 12~ as new version though, which is boring
    but not complicated at all :)
    --
    Michael
    
  56. Re: pg_upgrade test for binary compatibility of core data types

    Michael Paquier <michael@paquier.xyz> — 2021-10-13T01:36:08Z

    On Mon, Oct 11, 2021 at 02:38:12PM +0900, Michael Paquier wrote:
    > For now, attached is a patch to address the issues with test.sh that I
    > am planning to backpatch.  This fixes the facility on HEAD, while
    > minimizing the diffs between the dumps.  We could do more, like a
    > s/PROCEDURE/FUNCTION/ but that does not make the diffs really
    > unreadable either.  I have only tested that on HEAD as new version
    > down to 11 as the oldest version per the business with --wal-segsize.
    > This still needs tests with 12~ as new version though, which is boring
    > but not complicated at all :)
    
    Okay, tested and done as of fa66b6d.
    --
    Michael
    
  57. Re: pg_upgrade test for binary compatibility of core data types

    Justin Pryzby <pryzby@telsasoft.com> — 2021-11-07T19:22:00Z

    On Mon, Oct 11, 2021 at 02:38:12PM +0900, Michael Paquier wrote:
    > On Fri, Oct 01, 2021 at 04:58:41PM +0900, Michael Paquier wrote:
    > > I was looking at this CF entry, and what you are doing in 0004 to move
    > > the tweaks from pg_upgrade's test.sh to a separate SQL script that
    > > uses psql's meta-commands like \if to check which version we are on is
    > > really interesting.  The patch does not apply anymore, so this needs a
    > > rebase.  The entry has been switched as waiting on author by Tom, but
    > > you did not update it after sending the new versions in [1].  I am
    > > wondering if we could have something cleaner than just a set booleans
    > > as you do here for each check, as that does not help with the
    > > readability of the tests.
    > 
    > And so, I am back at this thread, looking at the set of patches
    > proposed from 0001 to 0004.  The patches are rather messy and mix many
    > things and concepts, but there are basically four things that stand
    > out:
    > - test.sh is completely broken when using PG >= 14 as new version
    > because of the removal of the test tablespace.  Older versions of
    > pg_regress don't support --make-tablespacedir so I am fine to stick a
    > couple of extra mkdirs for testtablespace/, expected/ and sql/ to
    > allow the script to work properly for major upgrades as a workaround,
    > but only if we use an old version.  We need to do something here for
    > HEAD and REL_14_STABLE.
    > - The script would fail when using PG <= 11 as old version because of
    > WITH OIDS relations.  We need to do something down to REL_12_STABLE.
    > I did not like much the approach taken to stick 4 ALTER TABLE queries
    > though (the patch was actually failing here for me), so instead I have
    > borrowed what the buildfarm has been doing with a DO block.  That
    > works fine, and that's more portable.
    > - Not using --extra-float-digits with PG <= 11 as older version causes
    > a bunch of diffs in the dumps, making the whole unreadable.  The patch
    > was doing that unconditionally for *all version*, which is not good.
    > We should only do that on the versions that need it, and we know the
    > old version number before taking any dumps so that's easy to check.
    > - The addition of --wal-segsize and --allow-group-access breaks the
    > script when using PG < 10 at initdb time as these got added in 11.
    > With 10 getting EOL'd next year and per the lack of complaints, I am
    > not excited to do anything here and I'd rather leave this out so as we
    > keep coverage for those options across *all* major versions upgraded
    > from 11~.  The buildfarm has tests down to 9.2, but for devs my take
    > is that this is enough for now.
    
    Michael handled those in fa66b6d.
    Note that the patch assumes that the "old version" being pg_upgraded has
    commit 97f73a978: "Work around cross-version-upgrade issues created by commit 9e38c2bb5."
    
    That may be good enough for test.sh, but if the kludges were moved to a .sql
    script which was also run by the buildfarm (in stead of its hardcoded kludges), then
    it might be necessary to handle the additional stuff my patch did, like:
    
    +                                       DROP TRANSFORM FOR integer LANGUAGE sql CASCADE;"
    +                                       DROP FUNCTION boxarea(box);"
    +                                       DROP FUNCTION funny_dup17();"
    +                                       DROP TABLE abstime_tbl;"
    +                                       DROP TABLE reltime_tbl;"
    +                                       DROP TABLE tinterval_tbl;"
    +                                       DROP AGGREGATE first_el_agg_any(anyelement);"
    +                                       DROP AGGREGATE array_cat_accum(anyarray);"
    +                                       DROP OPERATOR @#@(NONE,bigint);"
    
    Or, maybe it's guaranteed that the animals all run latest version of old
    branches, in which case I think some of the BF's existing logic could be
    dropped, which would help to reconcile these two scripts:
    
                    my $missing_funcs = q{drop function if exists public.boxarea(box);                                                                                                                             
                                  drop function if exists public.funny_dup17();                                                                                                                                    
    ..                                                                                                                                                                                                             
                            $prstmt = join(';',                                                                                                                                                                    
                                    'drop operator @#@ (NONE, bigint)',                                                                                                                                            
    ..                                                                                                                                                                                                             
                                    'drop aggregate if exists public.array_cat_accum(anyarray)',                                                                                                                   
    
    > This is for the basics in terms of fixing test.sh and what should be
    > backpatched.  In this aspect patches 0001 and 0002 were a bit
    > incorrect.  I am not sure that 0003 is that interesting as designed as
    > we would miss any new core types introduced.
    
    We wouldn't miss new core types, because of the 2nd part of type_sanity which
    tests that each core type was included in the "manytypes" table.
    
    +-- And now a test on the previous test, checking that all core types are                                                                                                                                      
    +-- included in this table                                                                                                                                                                                     
    +-- XXX or some other non-catalog table processed by pg_upgrade                                                                                                                                                
    +SELECT oid, typname, typtype, typelem, typarray, typarray FROM pg_type t                                                                                                                                      
    +WHERE typtype NOT IN ('p', 'c')                                                                                                                                                                               
    +-- reg* which cannot be pg_upgraded                                                                                                                                                                           
    +AND oid != ALL(ARRAY['regproc', 'regprocedure', 'regoper', 'regoperator', 'regconfig', 'regdictionary', 'regnamespace', 'regcollation']::regtype[])                                                           
    +-- XML might be disabled at compile-time                                                                                                                                                                      
    +AND oid != ALL(ARRAY['xml', 'gtsvector', 'pg_node_tree', 'pg_ndistinct', 'pg_dependencies', 'pg_mcv_list', 'pg_brin_bloom_summary', 'pg_brin_minmax_multi_summary']::regtype[])                               
    +AND NOT EXISTS (SELECT 1 FROM pg_type u WHERE u.typarray=t.oid) -- exclude arrays                                                                                                                             
    +AND NOT EXISTS (SELECT 1 FROM pg_attribute a WHERE a.atttypid=t.oid AND a.attnum>0 AND a.attrelid='manytypes'::regclass);                                                                                     
    
    > 0004 is something I'd like to get done on HEAD to ease the move of the
    > pg_upgrade tests to TAP, but it could be made a bit easier to read by
    > not having all those oldpgversion_XX_YY flags grouped together for
    > one.  So I am going to rewrite portions of it once done with the
    > above.
    
    
    -- 
    Justin
    
  58. Re: pg_upgrade test for binary compatibility of core data types

    Michael Paquier <michael@paquier.xyz> — 2021-11-08T03:53:58Z

    On Sun, Nov 07, 2021 at 01:22:00PM -0600, Justin Pryzby wrote:
    > That may be good enough for test.sh, but if the kludges were moved to a .sql
    > script which was also run by the buildfarm (in stead of its hardcoded kludges), then
    > it might be necessary to handle the additional stuff my patch did, like:
    
    > [...]
    >
    > Or, maybe it's guaranteed that the animals all run latest version of old
    > branches, in which case I think some of the BF's existing logic could be
    > dropped, which would help to reconcile these two scripts:
    
    I am pretty sure that it is safe to assume that all buildfarm animals
    run the top of the stable branch they are testing, at least on the
    community side.  An advantage of moving all those SQLs to a script
    that can be process with psql thanks to the \if metacommands you have
    added is that buildfarm clients are not required to immediately update
    their code to work properly.  Considering as well that we should
    minimize the amount of duplication between all those things, I'd like
    to think that we'd better apply 0002 and consider a backpatch to allow
    the buildfarm to catch up on it.  It should at least allow to remove a
    good chunk of the object cleanup done directly by the buildfarm.
    
    >> This is for the basics in terms of fixing test.sh and what should be
    >> backpatched.  In this aspect patches 0001 and 0002 were a bit
    >> incorrect.  I am not sure that 0003 is that interesting as designed as
    >> we would miss any new core types introduced.
    > 
    > We wouldn't miss new core types, because of the 2nd part of type_sanity which
    > tests that each core type was included in the "manytypes" table.
    
    +-- XML might be disabled at compile-time
    +AND oid != ALL(ARRAY['xml', 'gtsvector', 'pg_node_tree',
    'pg_ndistinct', 'pg_dependencies', 'pg_mcv_list',
    'pg_brin_bloom_summary', 'pg_brin_minmax_multi_summary']::regtype[])
    
    I believe that this comment is incomplete, applying only to the first
    element listed in this array.  I guess that this had better document
    why those catalogs are part of the list?  Good to see that adding a
    reg* in core would immediately be noticed though, as far as I
    understand this SQL.
    --
    Michael
    
  59. Re: pg_upgrade test for binary compatibility of core data types

    Michael Paquier <michael@paquier.xyz> — 2021-11-17T07:01:19Z

    On Sun, Nov 07, 2021 at 01:22:00PM -0600, Justin Pryzby wrote:
    > That may be good enough for test.sh, but if the kludges were moved to a .sql
    > script which was also run by the buildfarm (in stead of its hardcoded kludges), then
    > it might be necessary to handle the additional stuff my patch did, like:
    >
    > +                                       DROP TRANSFORM FOR integer LANGUAGE sql CASCADE;"
    > +                                       DROP FUNCTION boxarea(box);"
    > +                                       DROP FUNCTION funny_dup17();"
    
    These apply for an old version <= v10.
    
    > +                                       DROP TABLE abstime_tbl;"
    > +                                       DROP TABLE reltime_tbl;"
    > +                                       DROP TABLE tinterval_tbl;"
    
    old version <= 9.3.
    
    > +                                       DROP AGGREGATE first_el_agg_any(anyelement);"
    
    Not sure about this one.
    
    > +                                       DROP AGGREGATE array_cat_accum(anyarray);"
    > +                                       DROP OPERATOR @#@(NONE,bigint);"
    
    These are on 9.4.  It is worth noting that TestUpgradeXversion.pm
    recreates those objects.  I'd agree to close the gap completely rather
    than just moving what test.sh does to wipe out a maximum client code
    for the buildfarm.
    
    > Or, maybe it's guaranteed that the animals all run latest version of old
    > branches, in which case I think some of the BF's existing logic could be
    > dropped, which would help to reconcile these two scripts:
    
    That seems like a worthy goal to reduce the amount of duplication with
    the buildfarm code, while allowing tests from upgrades with older
    versions (the WAL segment size and group permission issue in test.sh
    had better be addressed in a better way, perhaps once the pg_upgrade
    tests are moved to TAP).  There are also things specific to contrib/
    modules with older versions, but that may be too specific for this
    exercise.
    
    +\if :oldpgversion_le84
    +DROP FUNCTION public.myfunc(integer);
    +\endif
    
    The oldest version tested by the buildfarm is 9.2, so we could ignore
    this part I guess?
    
    Andrew, what do you think about this part?  Based on my read of this
    thread, there is an agreement that this approach makes the buildfarm
    code more manageable so as committers would not need to patch the
    buildfarm code if their test fail.  I agree with this conclusion, but
    I wanted to double-check with you first.  This would need a backpatch
    down to 10 so as we could clean up a maximum of code in
    TestUpgradeXversion.pm without waiting for an extra 5 years.  Please
    note that I am fine to send a patch for the buildfarm client.
    
    > We wouldn't miss new core types, because of the 2nd part of type_sanity which
    > tests that each core type was included in the "manytypes" table.
    
    Thanks, I see your point now after a closer read.
    
    There is still a pending question for contrib modules, but I think
    that we need to think larger here with a better integration of
    contrib/ modules in the upgrade testing process.  Making that cheap
    would require running the set of regression tests on the instance
    to-be-upgraded first.  I think that one step in this direction would
    be to have unique databases for each contrib/ modules, so as there is
    no overlap with objects dropped?
    
    Having some checks with code types looks fine as a first step, so
    let's do that.  I have reviewed 0001, rewrote a couple of comments.
    All the comments from upthread seem to be covered with that.  So I'd
    like to get that applied on HEAD.  We could as well be less
    conservative and backpatch that down to 12 to follow on 7c15cef so we
    would be more careful with 15~ already (a backpatch down to 14 would
    be enough for this purpose, actually thanks to the 14->15 upgrade
    path).
    --
    Michael
    
  60. Re: pg_upgrade test for binary compatibility of core data types

    Andrew Dunstan <andrew@dunslane.net> — 2021-11-17T15:07:17Z

    On 11/17/21 02:01, Michael Paquier wrote:
    >
    > The oldest version tested by the buildfarm is 9.2, so we could ignore
    > this part I guess?
    >
    > Andrew, what do you think about this part?  Based on my read of this
    > thread, there is an agreement that this approach makes the buildfarm
    > code more manageable so as committers would not need to patch the
    > buildfarm code if their test fail.  I agree with this conclusion, but
    > I wanted to double-check with you first.  This would need a backpatch
    > down to 10 so as we could clean up a maximum of code in
    > TestUpgradeXversion.pm without waiting for an extra 5 years.  Please
    > note that I am fine to send a patch for the buildfarm client.
    >
    >
    
    In general I'm in agreement with the direction here. If we can have a
    script that applies to back branches to make them suitable for upgrade
    testing instead of embedding this in the buildfarm client, so much the
    better.
    
    
    cheers
    
    
    andrew
    
    --
    Andrew Dunstan
    EDB: https://www.enterprisedb.com
    
    
    
    
    
  61. Re: pg_upgrade test for binary compatibility of core data types

    Michael Paquier <michael@paquier.xyz> — 2021-11-18T04:36:50Z

    On Wed, Nov 17, 2021 at 10:07:17AM -0500, Andrew Dunstan wrote:
    > In general I'm in agreement with the direction here. If we can have a
    > script that applies to back branches to make them suitable for upgrade
    > testing instead of embedding this in the buildfarm client, so much the
    > better.
    
    Okay.  I have worked on 0001 to add the table to check after the
    binary compatibilities and applied it.  What remains on this thread is
    0002 to move all the SQL queries into a psql-able file with the set of
    \if clauses to control which query is run depending on the backend
    version.  Justin, could you send a rebased version of that with all
    the changes from the buildfarm client included?
    --
    Michael
    
  62. Re: pg_upgrade test for binary compatibility of core data types

    Justin Pryzby <pryzby@telsasoft.com> — 2021-11-18T04:47:28Z

    On Wed, Nov 17, 2021 at 04:01:19PM +0900, Michael Paquier wrote:
    > On Sun, Nov 07, 2021 at 01:22:00PM -0600, Justin Pryzby wrote:
    > > That may be good enough for test.sh, but if the kludges were moved to a .sql
    > > script which was also run by the buildfarm (in stead of its hardcoded kludges), then
    > > it might be necessary to handle the additional stuff my patch did, like:
    > >
    > > +                                       DROP TRANSFORM FOR integer LANGUAGE sql CASCADE;"
    > > +                                       DROP FUNCTION boxarea(box);"
    > > +                                       DROP FUNCTION funny_dup17();"
    > 
    > These apply for an old version <= v10.
    > 
    > > +                                       DROP TABLE abstime_tbl;"
    > > +                                       DROP TABLE reltime_tbl;"
    > > +                                       DROP TABLE tinterval_tbl;"
    > 
    > old version <= 9.3.
    > 
    > > +                                       DROP AGGREGATE first_el_agg_any(anyelement);"
    > 
    > Not sure about this one.
    
    See 97f73a978fc1aca59c6ad765548ce0096d95a923
    
    > These are on 9.4.  It is worth noting that TestUpgradeXversion.pm
    > recreates those objects.  I'd agree to close the gap completely rather
    > than just moving what test.sh does to wipe out a maximum client code
    > for the buildfarm.
    
    >>Or, maybe it's guaranteed that the animals all run latest version of old
    >>branches, in which case I think some of the BF's existing logic could be
    >>dropped, which would help to reconcile these two scripts:
    >>
    >>                my $missing_funcs = q{drop function if exists public.boxarea(box);
    >>                              drop function if exists public.funny_dup17();
    >>..
    >>                        $prstmt = join(';',
    >>                                'drop operator @#@ (NONE, bigint)',
    >>..
    >>                                'drop aggregate if exists public.array_cat_accum(anyarray)',
    >>
    
    I'm not sure if everything the buildfarm does is needed anymore, or if any of
    it could be removed now, rather than being implemented in test.sh.
    
    boxarea, funny_dup - see also db3af9feb19f39827e916145f88fa5eca3130cb2
    https://github.com/PGBuildFarm/client-code/commit/9ca42ac1783a8cf99c73b4f7c52bd05a6024669d
    
    array_larger_accum/array_cat_accum - see also 97f73a978fc1aca59c6ad765548ce0096d95a923
    https://github.com/PGBuildFarm/client-code/commit/a55c89869f30db894ab823df472e739cee2e8c91
    
    @#@ 76f412ab310554acb970a0b73c8d1f37f35548c6 ??
    https://github.com/PGBuildFarm/client-code/commit/b3fdb743d89dc91fcea47bd9651776c503f774ff
    https://github.com/PGBuildFarm/client-code/commit/b44e9390e2d8d904ff8cabd906a2d4b5c8bd300a
    https://github.com/PGBuildFarm/client-code/commit/3844503c8fde134f7cc29b3fb147d590b6d2fcc1
    
    abstime:
    https://github.com/PGBuildFarm/client-code/commit/f027d991d197036028ffa9070f4c9193076ed5ed
    
    putenv
    https://github.com/PGBuildFarm/client-code/commit/fa86d0b1bc7a8d7b9f15b1da8b8e43f4d3a08e2b
    
  63. Re: pg_upgrade test for binary compatibility of core data types

    Tom Lane <tgl@sss.pgh.pa.us> — 2021-11-18T04:57:51Z

    Michael Paquier <michael@paquier.xyz> writes:
    > Okay.  I have worked on 0001 to add the table to check after the
    > binary compatibilities and applied it.
    
    Something funny about that on prion:
    
    https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2021-11-18%2001%3A55%3A38
    
    @@ -747,6 +747,8 @@
       '{(2020-01-02 03:04:05, 2021-02-03 06:07:08)}'::tstzmultirange,
       arrayrange(ARRAY[1,2], ARRAY[2,1]),
       arraymultirange(arrayrange(ARRAY[1,2], ARRAY[2,1]));
    +ERROR:  unrecognized key word: "ec2"
    +HINT:  ACL key word must be "group" or "user".
     -- Sanity check on the previous table, checking that all core types are
     -- included in this table.
     SELECT oid, typname, typtype, typelem, typarray, typarray
    
    Not sure what's going on there.
    
    			regards, tom lane
    
    
    
    
  64. Re: pg_upgrade test for binary compatibility of core data types

    Michael Paquier <michael@paquier.xyz> — 2021-11-18T05:49:35Z

    On Wed, Nov 17, 2021 at 11:57:51PM -0500, Tom Lane wrote:
    > Something funny about that on prion:
    > 
    > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2021-11-18%2001%3A55%3A38
    > Not sure what's going on there.
    
    Yes, that was just some missing quoting in the aclitem of this new
    table.  prion uses a specific user name, "ec2-user", that caused the
    failure.
    --
    Michael
    
  65. Re: pg_upgrade test for binary compatibility of core data types

    Michael Paquier <michael@paquier.xyz> — 2021-11-18T06:58:18Z

    On Wed, Nov 17, 2021 at 10:47:28PM -0600, Justin Pryzby wrote:
    > I'm not sure if everything the buildfarm does is needed anymore, or if any of
    > it could be removed now, rather than being implemented in test.sh.
    
    +-- This file has a bunch of kludges needed for testing upgrades
    across major versions
    +-- It supports testing the most recent version of an old release (not
    any arbitrary minor version).
    
    This could be better-worded.  Here is an idea:
    --
    -- SQL queries for major upgrade tests
    --
    -- This file includes a set of SQL queries to make a cluster to-be-upgraded
    -- compatible with the version this file is on.  This requires psql,
    -- as per-version queries are controlled with a set of \if clauses.
    
    +\if :oldpgversion_le84
    +DROP FUNCTION public.myfunc(integer);
    +\endif
    We could retire this part for <= 8.4.  The oldest version tested by
    the buildfarm is 9.2.
    
    +       psql -X -d regression -f "test-upgrade.sql" || psql_fix_sql_status=$?
    Shouldn't we use an absolute path here?  I was testing a VPATH build
    and that was not working properly.
    
    +-- commit 9e38c2bb5 and 97f73a978
    +-- DROP AGGREGATE array_larger_accum(anyarray);
    +DROP AGGREGATE array_cat_accum(anyarray);
    +
    +-- commit 76f412ab3
    +-- DROP OPERATOR @#@(bigint,NONE);
    +DROP OPERATOR @#@(NONE,bigint);
    +\endif
    The buildfarm does "CREATE OPERATOR @#@" and "CREATE AGGREGATE
    array_larger_accum" when dealing with an old version between 9.5 and
    13.  Shouldn't we do the same and create those objects rather than a
    plain DROP?  What you are doing is not wrong, and it should allow
    upgrades to work, but that's a bit inconsistent with the buildfarm in
    terms of coverage.
    
    +   ver >= 905 AND ver <= 1300 AS oldpgversion_95_13,
    +   ver >= 906 AND ver <= 1300 AS oldpgversion_96_13,
    +   ver >= 906 AND ver <= 1000 AS oldpgversion_96_10,
    So here, we have the choice between conditions that play with version
    ranges or we could make those checks simpler but compensate with a set
    of IF EXISTS queries.  I think that your choice is right.  The
    buildfarm mixes both styles to compensate with the cases where the
    objects are created after a drop.
    
    The list of objects and the version ranges look correct to me.
    --
    Michael
    
  66. Re: pg_upgrade test for binary compatibility of core data types

    Michael Paquier <michael@paquier.xyz> — 2021-12-01T07:19:44Z

    On Thu, Nov 18, 2021 at 03:58:18PM +0900, Michael Paquier wrote:
    > +   ver >= 905 AND ver <= 1300 AS oldpgversion_95_13,
    > +   ver >= 906 AND ver <= 1300 AS oldpgversion_96_13,
    > +   ver >= 906 AND ver <= 1000 AS oldpgversion_96_10,
    > So here, we have the choice between conditions that play with version
    > ranges or we could make those checks simpler but compensate with a set
    > of IF EXISTS queries.  I think that your choice is right.  The
    > buildfarm mixes both styles to compensate with the cases where the
    > objects are created after a drop.
    
    So, I have come back to this part of the patch set, that moves the SQL
    queries doing the pre-upgrade cleanups in the old version we upgrade
    from, and decided to go with what looks like the simplest approach,
    relying on some IFEs depending on the object types if they don't
    exist for some cases.
    
    While checking the whole thing, I have noticed that some of the
    operations were not really necessary.  The result is rather clean now,
    with a linear organization of the version logic, so as it is a
    no-brainer to get that done in back-branches per the
    backward-compatibility argument.
    
    I'll get that done down to 10 to maximize its influence, then I'll
    move on with the buildfarm code and send a patch to plug this and
    reduce the dependencies between core and the buildfarm code.
    --
    Michael
    
  67. Re: pg_upgrade test for binary compatibility of core data types

    Michael Paquier <michael@paquier.xyz> — 2021-12-02T01:49:22Z

    On Wed, Dec 01, 2021 at 04:19:44PM +0900, Michael Paquier wrote:
    > I'll get that done down to 10 to maximize its influence, then I'll
    > move on with the buildfarm code and send a patch to plug this and
    > reduce the dependencies between core and the buildfarm code.
    
    Okay, I have checked this one this morning, and applied the split down
    to 10, so as we have a way to fix objects from the main regression
    test suite.  The buildfarm client gets a bit cleaned up after that (I
    have a patch for that, but I am not 100% sure that it is right).
    
    Still, the global picture is larger than that because there is still
    nothing done for contrib/ modules included in cross-version checks of
    pg_upgrade by the buildfarm.  The core code tests don't do this much,
    but if we were to do the same things as the buildfarm, then we would
    need to run installcheck-world (roughly) on a deployed instance, then
    pg_upgrade it.  That's not going to be cheap, for sure.
    
    One thing that we could do is to use unique names for the databases of
    the contrib/ modules when running an installcheck, so as these are
    preserved for upgrades (the buildfarm client does that).  This has as
    effect to increase the number of databases for an instance
    installcheck'ed, so this had better be optional, at least.
    --
    Michael