Thread
Commits
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Move into separate file all the SQL queries used in pg_upgrade tests
- 1924d508c335 10.20 landed
- 0e603b75c434 11.15 landed
- b6e525648d72 12.10 landed
- fae5f08e1719 13.6 landed
- b6dac98b0561 14.2 landed
- 0df9641d3905 15.0 landed
-
Add table to regression tests for binary-compatibility checks in pg_upgrade
- a9993416f80f 12.10 landed
- 755f04c72ef1 13.6 landed
- cf3d79aa31f2 14.2 landed
- 835bcba8b8d7 15.0 landed
-
Fix tests of pg_upgrade across different major versions
- afa09e4a9af6 12.9 landed
- 2a8dee6a67cc 13.5 landed
- f4e1c8892b9e 14.1 landed
- fa66b6dee084 15.0 landed
-
Multirange datatypes
- 6df7a9698bb0 14.0 cited
-
Work around cross-version-upgrade issues created by commit 9e38c2bb5.
- 97f73a978fc1 14.0 cited
-
Declare assorted array functions using anycompatible not anyelement.
- 9e38c2bb5093 14.0 cited
-
Remove factorial operators, leaving only the factorial() function.
- 76f412ab3105 14.0 cited
-
Create by default sql/ and expected/ for output directory in pg_regress
- e78900afd217 14.0 cited
-
Add missing include to pg_upgrade/version.c
- bc3a94dc0005 9.4.25 landed
- 984aa0ede1d2 9.5.20 landed
- e09ab32a2205 9.6.16 landed
-
Improve the check for pg_catalog.line data type in pg_upgrade
- 235a52ca0f26 9.4.25 landed
- f57b01dd75ee 9.5.20 landed
- 0a643de08715 9.6.16 landed
- 2218fdca496b 10.11 landed
- a970b6cdebd1 11.6 landed
- ebb4caa9120d 12.1 landed
- 8d48e6a7240c 13.0 landed
-
Improve the check for pg_catalog.unknown data type in pg_upgrade
- e86ece22114d 10.11 landed
- d071a2539ff4 11.6 landed
- a8e49ae0c381 12.1 landed
- a524f50d0fc6 13.0 landed
-
Check for tables with sql_identifier during pg_upgrade
- eaf900e842ab 12.1 landed
- 0ccfc2822366 13.0 landed
-
pg_upgrade: clarify the database names in error files
- 1634d361577a 13.0 cited
-
In the pg_upgrade test suite, don't write to src/test/regress.
- 40b132c1afbb 12.0 cited
-
Allow group access on PGDATA
- c37b3d08ca68 11.0 cited
-
Refactor dir/file permissions
- da9b580d8990 11.0 cited
-
Remove unused functions in regress.c.
- db3af9feb19f 11.0 cited
-
Make WAL segment size configurable at initdb time.
- fc49e24fa69a 11.0 cited
-
Fix bit-rot in pg_upgrade's test.sh, and improve documentation.
- 5bab1985dfc2 10.0 cited
-
BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
PG Bug reporting form <noreply@postgresql.org> — 2019-10-08T17:08:53Z
The following bug has been logged on the website: Bug reference: 16045 Logged by: Hans Buschmann Email address: buschmann@nidsa.net PostgreSQL version: 12.0 Operating system: Windows 10 64bit Description: I just did a pg_upgrade from pg 11.5 to pg 12.0 on my development machine under Windows 64bit (both distributions from EDB). cpsdb=# select version (); version ------------------------------------------------------------ PostgreSQL 12.0, compiled by Visual C++ build 1914, 64-bit (1 row) The pg_upgrade with --link went flawlessly, I started (only!) the new server 12.0 and could connect and access individual databases. As recommended by the resulting analyze_new_cluster.bat I tried a full vacuumdb with: "N:/pgsql/bin/vacuumdb" -U postgres --all --analyze-only which crashed with vacuumdb: vacuuming database "cpsdb" vacuumdb: error: vacuuming of table "admin.q_tbl_archiv" in database "cpsdb" failed: ERROR: compressed data is corrupted I connected to the database through pgsql and looked at the table "admin.q_tbl_archiv" cpsdb=# \d+ q_tbl_archiv; Table "admin.q_tbl_archiv" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ------------------+------------------------------------+-----------+----------+---------+----------+--------------+------------- table_name | information_schema.sql_identifier | | | | plain | | column_name | information_schema.sql_identifier | | | | plain | | ordinal_position | information_schema.cardinal_number | | | | plain | | col_qualifier | text | | | | extended | | id_column | information_schema.sql_identifier | | | | plain | | id_default | information_schema.character_data | | | | extended | | Access method: heap When trying to select * from q_tbl_archiv I got: cpsdb=# select * from q_tbl_archiv; ERROR: invalid memory alloc request size 18446744073709551613 This table was created a long time back under 9.5 or 9.6 with the (here truncated) following command: create table q_tbl_archiv as with qseason as ( select table_name,column_name, ordinal_position ,replace(column_name,'_season','') as col_qualifier -- ,'id_'||replace(column_name,'_season','') as id_column from information_schema.columns where column_name like '%_season' and ordinal_position < 10 and table_name in ( 'table1' ,'table2' -- here truncated: -- ... (here where all table of mine having columns like xxx_season) -- to reproduce, change to your own tablenames in a test database ) order by table_name ) select qs.*,c.column_name as id_column, c.column_default as id_default from qseason qs left join information_schema.columns c on c.table_name=qs.table_name and c.column_name like 'id_%' ; Until now this table was always restored without error by migrating to a new major version through pg_dump/initdb/pr_restore. To verify the integrity of the table I restored the dump taken under pg_dump from pg 11.5 just before the pg_upgrade to another machine. The restore and analyze went OK and select * from q_tbl_archiv showed all tuples, eg (edited): cpsdb_dev=# select * from q_tbl_archiv; table_name | column_name | ordinal_position | col_qualifier | id_column | id_default --------------------------+--------------+------------------+---------------+-----------+---------------------------------------------------------- table1 | chm_season | 2 | chm | | table2 | cs_season | 2 | cs | id_cs | nextval('table2_id_cs_seq'::regclass) ... In conclusion, this seems to me like an error/omission of pg_upgrade. It seems to handle these specially derived tables from information_schema not correctly, resulting in failures of the upgraded database. For me, this error is not so crucial, because this table is only used for administrative purposes and can easily be restored from backup. But I want to share my findings for the sake of other users of pg_upgrade. Thanks for investigating! Hans Buschmann -
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-09T13:24:36Z
On Tue, Oct 08, 2019 at 05:08:53PM +0000, PG Bug reporting form wrote: >The following bug has been logged on the website: > >Bug reference: 16045 >Logged by: Hans Buschmann >Email address: buschmann@nidsa.net >PostgreSQL version: 12.0 >Operating system: Windows 10 64bit >Description: > >I just did a pg_upgrade from pg 11.5 to pg 12.0 on my development machine >under Windows 64bit (both distributions from EDB). > >cpsdb=# select version (); > version >------------------------------------------------------------ > PostgreSQL 12.0, compiled by Visual C++ build 1914, 64-bit >(1 row) > >The pg_upgrade with --link went flawlessly, I started (only!) the new server >12.0 and could connect and access individual databases. > >As recommended by the resulting analyze_new_cluster.bat I tried a full >vacuumdb with: > >"N:/pgsql/bin/vacuumdb" -U postgres --all --analyze-only > >which crashed with >vacuumdb: vacuuming database "cpsdb" >vacuumdb: error: vacuuming of table "admin.q_tbl_archiv" in database "cpsdb" >failed: ERROR: compressed data is corrupted > >I connected to the database through pgsql and looked at the table >"admin.q_tbl_archiv" > >cpsdb=# \d+ q_tbl_archiv; > Table >"admin.q_tbl_archiv" > Column | Type | Collation | >Nullable | Default | Storage | Stats target | Description >------------------+------------------------------------+-----------+----------+---------+----------+--------------+------------- > table_name | information_schema.sql_identifier | | > | | plain | | > column_name | information_schema.sql_identifier | | > | | plain | | > ordinal_position | information_schema.cardinal_number | | > | | plain | | > col_qualifier | text | | > | | extended | | > id_column | information_schema.sql_identifier | | > | | plain | | > id_default | information_schema.character_data | | > | | extended | | >Access method: heap > >When trying to select * from q_tbl_archiv I got: > >cpsdb=# select * from q_tbl_archiv; >ERROR: invalid memory alloc request size 18446744073709551613 > >This table was created a long time back under 9.5 or 9.6 with the (here >truncated) following command: > > >create table q_tbl_archiv as >with >qseason as ( >select table_name,column_name, ordinal_position >,replace(column_name,'_season','') as col_qualifier >-- ,'id_'||replace(column_name,'_season','') as id_column >from information_schema.columns >where >column_name like '%_season' >and ordinal_position < 10 >and table_name in ( > 'table1' >,'table2' >-- here truncated: >-- ... (here where all table of mine having columns like xxx_season) >-- to reproduce, change to your own tablenames in a test database >) >order by table_name >) >select qs.*,c.column_name as id_column, c.column_default as id_default >from > qseason qs > left join information_schema.columns c on c.table_name=qs.table_name and >c.column_name like 'id_%' >; > >Until now this table was always restored without error by migrating to a new >major version through pg_dump/initdb/pr_restore. > >To verify the integrity of the table I restored the dump taken under pg_dump >from pg 11.5 just before the pg_upgrade to another machine. > >The restore and analyze went OK and select * from q_tbl_archiv showed all >tuples, eg (edited): > >cpsdb_dev=# select * from q_tbl_archiv; > table_name | column_name | ordinal_position | col_qualifier >| id_column | id_default >--------------------------+--------------+------------------+---------------+-----------+---------------------------------------------------------- > table1 | chm_season | 2 | chm >| | > table2 | cs_season | 2 | cs >| id_cs | nextval('table2_id_cs_seq'::regclass) >... > >In conclusion, this seems to me like an error/omission of pg_upgrade. > There's clearly something bad happening. It's a bit strange, though. Had this been a data corruption issue, I'd expect the pg_dump to fail too, but it succeeds. >It seems to handle these specially derived tables from information_schema >not correctly, resulting in failures of the upgraded database. > Well, I don't see how that should make any difference. It's a CTAS and that should create a regular table, that's not an issue. I wonder if there were some changes to the data types involved, but that would be essentially a break in on-disk format and we're careful about not doing that ... >For me, this error is not so crucial, because this table is only used for >administrative purposes and can easily be restored from backup. > >But I want to share my findings for the sake of other users of pg_upgrade. > OK, thanks. Could you maybe set log_error_verbosity = verbose before invoking the vacuum (you can set that in that session)? That should give us more details about where exactly the error is triggered. Even better, if you could attach a debugger to the session, set breakpoints on locations triggering 'invalid memory alloc request size' and then show the backtrace (obviously, that's more complicated). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-09T13:59:07Z
FWIW I can reproduce this - it's enough to do this on the 11 cluster create table q_tbl_archiv as with qseason as ( select table_name,column_name, ordinal_position ,replace(column_name,'_season','') as col_qualifier -- ,'id_'||replace(column_name,'_season','') as id_column from information_schema.columns order by table_name ) select qs.*,c.column_name as id_column, c.column_default as id_default from qseason qs left join information_schema.columns c on c.table_name=qs.table_name and c.column_name like 'id_%'; and then analyze q_tbl_archiv which produces backtrace like this: No symbol "stats" in current context. (gdb) bt #0 0x0000746095262951 in __memmove_avx_unaligned_erms () from /lib64/libc.so.6 #1 0x0000000000890a8e in varstrfastcmp_locale (a1p=0x17716b4 "per_language\a", len1=<optimized out>, a2p=0x176af28 '\177' <repeats 136 times>, "\021\004", len2=-4, ssup=<optimized out>, ssup=<optimized out>) at varlena.c:2320 #2 0x0000000000890cb1 in varlenafastcmp_locale (x=24581808, y=24555300, ssup=0x7ffc649463f0) at varlena.c:2219 #3 0x00000000005b73b4 in ApplySortComparator (ssup=0x7ffc649463f0, isNull2=false, datum2=<optimized out>, isNull1=false, datum1=<optimized out>) at ../../../src/include/utils/sortsupport.h:224 #4 compare_scalars (a=<optimized out>, b=<optimized out>, arg=0x7ffc649463e0) at analyze.c:2700 #5 0x00000000008f9953 in qsort_arg (a=a@entry=0x178fdc0, n=<optimized out>, n@entry=2158, es=es@entry=16, cmp=cmp@entry=0x5b7390 <compare_scalars>, arg=arg@entry=0x7ffc649463e0) at qsort_arg.c:140 #6 0x00000000005b86a6 in compute_scalar_stats (stats=0x176a208, fetchfunc=<optimized out>, samplerows=<optimized out>, totalrows=2158) at analyze.c:2273 #7 0x00000000005b9d95 in do_analyze_rel (onerel=onerel@entry=0x74608c00d3e8, params=params@entry=0x7ffc64946970, va_cols=va_cols@entry=0x0, acquirefunc=<optimized out>, relpages=22, inh=inh@entry=false, in_outer_xact=false, elevel=13) at analyze.c:529 #8 0x00000000005bb2c9 in analyze_rel (relid=<optimized out>, relation=<optimized out>, params=params@entry=0x7ffc64946970, va_cols=0x0, in_outer_xact=<optimized out>, bstrategy=<optimized out>) at analyze.c:260 #9 0x000000000062c7b0 in vacuum (relations=0x1727120, params=params@entry=0x7ffc64946970, bstrategy=<optimized out>, bstrategy@entry=0x0, isTopLevel=isTopLevel@entry=true) at vacuum.c:413 #10 0x000000000062cd49 in ExecVacuum (pstate=pstate@entry=0x16c9518, vacstmt=vacstmt@entry=0x16a82b8, isTopLevel=isTopLevel@entry=true) at vacuum.c:199 #11 0x00000000007a6d64 in standard_ProcessUtility (pstmt=0x16a8618, queryString=0x16a77a8 "", context=<optimized out>, params=0x0, queryEnv=0x0, dest=0x16a8710, completionTag=0x7ffc64946cb0 "") at utility.c:670 #12 0x00000000007a4006 in PortalRunUtility (portal=0x170f368, pstmt=0x16a8618, isTopLevel=<optimized out>, setHoldSnapshot=<optimized out>, dest=0x16a8710, completionTag=0x7ffc64946cb0 "") at pquery.c:1175 #13 0x00000000007a4b61 in PortalRunMulti (portal=portal@entry=0x170f368, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x16a8710, altdest=altdest@entry=0x16a8710, completionTag=completionTag@entry=0x7ffc64946cb0 "") at pquery.c:1321 #14 0x00000000007a5864 in PortalRun (portal=portal@entry=0x170f368, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x16a8710, altdest=altdest@entry=0x16a8710, completionTag=0x7ffc64946cb0 "") at pquery.c:796 #15 0x00000000007a174e in exec_simple_query (query_string=0x16a77a8 "") at postgres.c:1215 Looking at compute_scalar_stats, the "stats" parameter does not seem particularly healthy: (gdb) p *stats $3 = {attr = 0x10, attrtypid = 12, attrtypmod = 0, attrtype = 0x1762e00, attrcollid = 356, anl_context = 0x7f7f7f7e00000000, compute_stats = 0x100, minrows = 144, extra_data = 0x1762e00, stats_valid = false, stanullfrac = 0, stawidth = 0, stadistinct = 0, stakind = {0, 0, 0, 0, 0}, staop = {0, 0, 0, 0, 0}, stacoll = {0, 0, 0, 0, 0}, numnumbers = {0, 0, 0, 0, 0}, stanumbers = {0x0, 0x0, 0x0, 0x0, 0x0}, numvalues = {0, 0, 0, 0, 2139062142}, stavalues = { 0x7f7f7f7f7f7f7f7f, 0x7f7f7f7f7f7f7f7f, 0x7f7f7f7f7f7f7f7f, 0x7f7f7f7f7f7f7f7f, 0x7f7f7f7f7f7f7f7f}, statypid = {2139062143, 2139062143, 2139062143, 2139062143, 2139062143}, statyplen = {32639, 32639, 32639, 32639, 32639}, statypbyval = {127, 127, 127, 127, 127}, statypalign = "\177\177\177\177\177", tupattnum = 2139062143, rows = 0x7f7f7f7f7f7f7f7f, tupDesc = 0x7f7f7f7f7f7f7f7f, exprvals = 0x8, exprnulls = 0x4, rowstride = 24522240} Not sure about the root cause yet. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tom Lane <tgl@sss.pgh.pa.us> — 2019-10-09T14:07:01Z
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > FWIW I can reproduce this - it's enough to do this on the 11 cluster I failed to reproduce any problem from your example, but I was trying in C locale on a Linux machine. What environment are you testing? regards, tom lane
-
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-09T14:18:41Z
On Wed, Oct 09, 2019 at 03:59:07PM +0200, Tomas Vondra wrote: >FWIW I can reproduce this - it's enough to do this on the 11 cluster > >create table q_tbl_archiv as >with >qseason as ( >select table_name,column_name, ordinal_position >,replace(column_name,'_season','') as col_qualifier >-- ,'id_'||replace(column_name,'_season','') as id_column >from information_schema.columns >order by table_name >) >select qs.*,c.column_name as id_column, c.column_default as id_default >from > qseason qs > left join information_schema.columns c on c.table_name=qs.table_name and >c.column_name like 'id_%'; > > >and then > > analyze q_tbl_archiv > >which produces backtrace like this: > >No symbol "stats" in current context. >(gdb) bt >#0 0x0000746095262951 in __memmove_avx_unaligned_erms () from /lib64/libc.so.6 >#1 0x0000000000890a8e in varstrfastcmp_locale (a1p=0x17716b4 "per_language\a", len1=<optimized out>, a2p=0x176af28 '\177' <repeats 136 times>, "\021\004", len2=-4, ssup=<optimized out>, ssup=<optimized out>) at varlena.c:2320 >#2 0x0000000000890cb1 in varlenafastcmp_locale (x=24581808, y=24555300, ssup=0x7ffc649463f0) at varlena.c:2219 >#3 0x00000000005b73b4 in ApplySortComparator (ssup=0x7ffc649463f0, isNull2=false, datum2=<optimized out>, isNull1=false, datum1=<optimized out>) at ../../../src/include/utils/sortsupport.h:224 >#4 compare_scalars (a=<optimized out>, b=<optimized out>, arg=0x7ffc649463e0) at analyze.c:2700 >#5 0x00000000008f9953 in qsort_arg (a=a@entry=0x178fdc0, n=<optimized out>, n@entry=2158, es=es@entry=16, cmp=cmp@entry=0x5b7390 <compare_scalars>, arg=arg@entry=0x7ffc649463e0) at qsort_arg.c:140 >#6 0x00000000005b86a6 in compute_scalar_stats (stats=0x176a208, fetchfunc=<optimized out>, samplerows=<optimized out>, totalrows=2158) at analyze.c:2273 >#7 0x00000000005b9d95 in do_analyze_rel (onerel=onerel@entry=0x74608c00d3e8, params=params@entry=0x7ffc64946970, va_cols=va_cols@entry=0x0, acquirefunc=<optimized out>, relpages=22, inh=inh@entry=false, in_outer_xact=false, elevel=13) > at analyze.c:529 >#8 0x00000000005bb2c9 in analyze_rel (relid=<optimized out>, relation=<optimized out>, params=params@entry=0x7ffc64946970, va_cols=0x0, in_outer_xact=<optimized out>, bstrategy=<optimized out>) at analyze.c:260 >#9 0x000000000062c7b0 in vacuum (relations=0x1727120, params=params@entry=0x7ffc64946970, bstrategy=<optimized out>, bstrategy@entry=0x0, isTopLevel=isTopLevel@entry=true) at vacuum.c:413 >#10 0x000000000062cd49 in ExecVacuum (pstate=pstate@entry=0x16c9518, vacstmt=vacstmt@entry=0x16a82b8, isTopLevel=isTopLevel@entry=true) at vacuum.c:199 >#11 0x00000000007a6d64 in standard_ProcessUtility (pstmt=0x16a8618, queryString=0x16a77a8 "", context=<optimized out>, params=0x0, queryEnv=0x0, dest=0x16a8710, completionTag=0x7ffc64946cb0 "") at utility.c:670 >#12 0x00000000007a4006 in PortalRunUtility (portal=0x170f368, pstmt=0x16a8618, isTopLevel=<optimized out>, setHoldSnapshot=<optimized out>, dest=0x16a8710, completionTag=0x7ffc64946cb0 "") at pquery.c:1175 >#13 0x00000000007a4b61 in PortalRunMulti (portal=portal@entry=0x170f368, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x16a8710, altdest=altdest@entry=0x16a8710, > completionTag=completionTag@entry=0x7ffc64946cb0 "") at pquery.c:1321 >#14 0x00000000007a5864 in PortalRun (portal=portal@entry=0x170f368, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x16a8710, altdest=altdest@entry=0x16a8710, > completionTag=0x7ffc64946cb0 "") at pquery.c:796 >#15 0x00000000007a174e in exec_simple_query (query_string=0x16a77a8 "") at postgres.c:1215 > >Looking at compute_scalar_stats, the "stats" parameter does not seem >particularly healthy: > >(gdb) p *stats >$3 = {attr = 0x10, attrtypid = 12, attrtypmod = 0, attrtype = 0x1762e00, attrcollid = 356, anl_context = 0x7f7f7f7e00000000, compute_stats = 0x100, minrows = 144, extra_data = 0x1762e00, stats_valid = false, stanullfrac = 0, > stawidth = 0, stadistinct = 0, stakind = {0, 0, 0, 0, 0}, staop = {0, 0, 0, 0, 0}, stacoll = {0, 0, 0, 0, 0}, numnumbers = {0, 0, 0, 0, 0}, stanumbers = {0x0, 0x0, 0x0, 0x0, 0x0}, numvalues = {0, 0, 0, 0, 2139062142}, stavalues = { > 0x7f7f7f7f7f7f7f7f, 0x7f7f7f7f7f7f7f7f, 0x7f7f7f7f7f7f7f7f, 0x7f7f7f7f7f7f7f7f, 0x7f7f7f7f7f7f7f7f}, statypid = {2139062143, 2139062143, 2139062143, 2139062143, 2139062143}, statyplen = {32639, 32639, 32639, 32639, 32639}, > statypbyval = {127, 127, 127, 127, 127}, statypalign = "\177\177\177\177\177", tupattnum = 2139062143, rows = 0x7f7f7f7f7f7f7f7f, tupDesc = 0x7f7f7f7f7f7f7f7f, exprvals = 0x8, exprnulls = 0x4, rowstride = 24522240} > >Not sure about the root cause yet. > OK, a couple more observations - the table schema looks like this: Table "public.q_tbl_archiv" Column | Type | Collation | Nullable | Default ------------------+------------------------------------+-----------+----------+--------- table_name | information_schema.sql_identifier | | | column_name | information_schema.sql_identifier | | | ordinal_position | information_schema.cardinal_number | | | col_qualifier | text | | | id_column | information_schema.sql_identifier | | | id_default | information_schema.character_data | | | and I can succesfully do this: test=# analyze q_tbl_archiv (table_name, column_name, ordinal_position, id_column, id_default); ANALYZE but as soon as I include the col_qualifier column, it fails: test=# analyze q_tbl_archiv (table_name, column_name, ordinal_position, id_column, id_default, col_qualifier); ERROR: compressed data is corrupted But it fails differently (with the segfault) when analyzing just the one column: test=# analyze q_tbl_archiv (col_qualifier); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded. Moreover, there are some other interesting failures - I can do select max(table_name) from q_tbl_archiv; select max(column_name) from q_tbl_archiv; select max(ordinal_position) from q_tbl_archiv; but as soon as I try doing that with col_qualifier, it crashes and burns: select max(col_qualifier) from q_tbl_archiv; The backtrace is rather strange in this case (a lot of missing calls, etc.). However, when called for the next two columns, it still crashes, but the backtraces look somewhat saner: select max(id_column) from q_tbl_archiv; Program received signal SIGSEGV, Segmentation fault. 0x00007db3186c6617 in __strlen_avx2 () from /lib64/libc.so.6 (gdb) bt #0 0x00007db3186c6617 in __strlen_avx2 () from /lib64/libc.so.6 #1 0x0000000000894ced in cstring_to_text (s=0x7db32ce38935 <error: Cannot access memory at address 0x7db32ce38935>) at varlena.c:173 #2 name_text (fcinfo=<optimized out>) at varlena.c:3573 #3 0x000000000063860d in ExecInterpExpr (state=0x1136900, econtext=0x1135128, isnull=<optimized out>) at execExprInterp.c:649 #4 0x000000000064f699 in ExecEvalExprSwitchContext (isNull=0x7ffcfd8f3b2f, econtext=<optimized out>, state=<optimized out>) at ../../../src/include/executor/executor.h:307 #5 advance_aggregates (aggstate=0x1134ef0, aggstate=0x1134ef0) at nodeAgg.c:679 #6 agg_retrieve_direct (aggstate=0x1134ef0) at nodeAgg.c:1847 #7 ExecAgg (pstate=0x1134ef0) at nodeAgg.c:1572 #8 0x000000000063b58b in ExecProcNode (node=0x1134ef0) at ../../../src/include/executor/executor.h:239 #9 ExecutePlan (execute_once=<optimized out>, dest=0x1144248, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x1134ef0, estate=0x1134c98) at execMain.c:1646 #10 standard_ExecutorRun (queryDesc=0x1094f18, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364 #11 0x00000000007a43cc in PortalRunSelect (portal=0x10da368, forward=<optimized out>, count=0, dest=<optimized out>) at pquery.c:929 #12 0x00000000007a5958 in PortalRun (portal=portal@entry=0x10da368, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x1144248, altdest=altdest@entry=0x1144248, completionTag=0x7ffcfd8f3db0 "") at pquery.c:770 #13 0x00000000007a177e in exec_simple_query (query_string=0x10727a8 "select max(id_column) from q_tbl_archiv ;") at postgres.c:1215 #14 0x00000000007a2f3f in PostgresMain (argc=<optimized out>, argv=argv@entry=0x109e400, dbname=<optimized out>, username=<optimized out>) at postgres.c:4236 #15 0x00000000007237ce in BackendRun (port=0x1097c30, port=0x1097c30) at postmaster.c:4437 #16 BackendStartup (port=0x1097c30) at postmaster.c:4128 #17 ServerLoop () at postmaster.c:1704 #18 0x000000000072458e in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x106c350) at postmaster.c:1377 #19 0x000000000047d101 in main (argc=3, argv=0x106c350) at main.c:228 select max(id_default) from q_tbl_archiv; Program received signal SIGABRT, Aborted. 0x00007db3185a1e35 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x00007db3185a1e35 in raise () from /lib64/libc.so.6 #1 0x00007db31858c895 in abort () from /lib64/libc.so.6 #2 0x00000000008b4470 in ExceptionalCondition (conditionName=conditionName@entry=0xabe49e "1", errorType=errorType@entry=0x907128 "unrecognized TOAST vartag", fileName=fileName@entry=0xa4965b "execTuples.c", lineNumber=lineNumber@entry=971) at assert.c:54 #3 0x00000000006466d3 in slot_deform_heap_tuple (natts=6, offp=0x1135170, tuple=<optimized out>, slot=0x1135128) at execTuples.c:985 #4 tts_buffer_heap_getsomeattrs (slot=0x1135128, natts=<optimized out>) at execTuples.c:676 #5 0x00000000006489fc in slot_getsomeattrs_int (slot=slot@entry=0x1135128, attnum=6) at execTuples.c:1877 #6 0x00000000006379a3 in slot_getsomeattrs (attnum=<optimized out>, slot=0x1135128) at ../../../src/include/executor/tuptable.h:345 #7 ExecInterpExpr (state=0x11364b0, econtext=0x1134cd8, isnull=<optimized out>) at execExprInterp.c:441 #8 0x000000000064f699 in ExecEvalExprSwitchContext (isNull=0x7ffcfd8f3b2f, econtext=<optimized out>, state=<optimized out>) at ../../../src/include/executor/executor.h:307 #9 advance_aggregates (aggstate=0x1134aa0, aggstate=0x1134aa0) at nodeAgg.c:679 #10 agg_retrieve_direct (aggstate=0x1134aa0) at nodeAgg.c:1847 #11 ExecAgg (pstate=0x1134aa0) at nodeAgg.c:1572 #12 0x000000000063b58b in ExecProcNode (node=0x1134aa0) at ../../../src/include/executor/executor.h:239 #13 ExecutePlan (execute_once=<optimized out>, dest=0x11439d8, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x1134aa0, estate=0x1134848) at execMain.c:1646 #14 standard_ExecutorRun (queryDesc=0x1094f18, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364 #15 0x00000000007a43cc in PortalRunSelect (portal=0x10da368, forward=<optimized out>, count=0, dest=<optimized out>) at pquery.c:929 #16 0x00000000007a5958 in PortalRun (portal=portal@entry=0x10da368, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x11439d8, altdest=altdest@entry=0x11439d8, completionTag=0x7ffcfd8f3db0 "") at pquery.c:770 #17 0x00000000007a177e in exec_simple_query (query_string=0x10727a8 "select max(id_default) from q_tbl_archiv ;") at postgres.c:1215 #18 0x00000000007a2f3f in PostgresMain (argc=<optimized out>, argv=argv@entry=0x109e4f0, dbname=<optimized out>, username=<optimized out>) at postgres.c:4236 #19 0x00000000007237ce in BackendRun (port=0x10976f0, port=0x10976f0) at postmaster.c:4437 #20 BackendStartup (port=0x10976f0) at postmaster.c:4128 #21 ServerLoop () at postmaster.c:1704 #22 0x000000000072458e in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x106c350) at postmaster.c:1377 #23 0x000000000047d101 in main (argc=3, argv=0x106c350) at main.c:228 It's quite puzzling, though. If I had to guess, I'd say it's some sort of memory management issue (either we're corrupting it somehow, or perhaps using it after pfree). -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-09T14:28:27Z
On Wed, Oct 09, 2019 at 10:07:01AM -0400, Tom Lane wrote: >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> FWIW I can reproduce this - it's enough to do this on the 11 cluster > >I failed to reproduce any problem from your example, but I was trying >in C locale on a Linux machine. What environment are you testing? > > regards, tom lane test=# show lc_collate ; lc_collate ------------ C.UTF-8 (1 row) I can reproduce this pretty easily like this: 1) build 11 git checkout REL_11_STABLE ./configure --prefix=/home/user/pg-11 --enable-debug --enable-cassert && make -s clean && make -s -j4 install 2) build 12 git checkout REL_12_STABLE ./configure --prefix=/home/user/pg-12 --enable-debug --enable-cassert && make -s clean && make -s -j4 install 3) create the 11 cluster /home/user/pg-11/bin/pg_ctl -D /tmp/data-11 init /home/user/pg-11/bin/pg_ctl -D /tmp/data-11 -l /tmp/pg-11.log start /home/user/pg-11/bin/createdb test /home/user/pg-11/bin/psql test 4) create the table create table q_tbl_archiv as with qseason as ( select table_name,column_name, ordinal_position ,replace(column_name,'_season','') as col_qualifier -- ,'id_'||replace(column_name,'_season','') as id_column from information_schema.columns order by table_name ) select qs.*,c.column_name as id_column, c.column_default as id_default from qseason qs left join information_schema.columns c on c.table_name=qs.table_name and c.column_name like 'id_%'; 5) shutdown the 11 cluster /home/user/pg-11/bin/pg_ctl -D /tmp/data-11 stop 6) init 12 cluster /home/user/pg-12/bin/pg_ctl -D /tmp/data-12 init 7) do the pg_upgrade thing /home/user/pg-12/bin/pg_upgrade -b /home/user/pg-11/bin -B /home/user/pg-12/bin -d /tmp/data-11 -D /tmp/data-12 -k 8) start 12 cluster /home/user/pg-12/bin/pg_ctl -D /tmp/data-12 -l /tmp/pg-12.log start 9) kabooom /home/user/pg-12/bin/psql test -c "analyze q_tbl_archiv" On my system (Fedora 30 in x86_64) this reliably results a crash (and various other crashes as demonstrated in my previous message). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -
AW: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Hans Buschmann <buschmann@nidsa.net> — 2019-10-09T14:50:52Z
Hi Tomas, Nice that you could reproduce it. This was just the way I followed. For your Info, here are my no-standard config params: name | current_setting ------------------------------------+--------------------------------- application_name | psql auto_explain.log_analyze | on auto_explain.log_min_duration | 0 auto_explain.log_nested_statements | on client_encoding | WIN1252 cluster_name | HB_DEV data_checksums | on DateStyle | ISO, DMY default_text_search_config | pg_catalog.german dynamic_shared_memory_type | windows effective_cache_size | 8GB lc_collate | C lc_ctype | German_Germany.1252 lc_messages | C lc_monetary | German_Germany.1252 lc_numeric | German_Germany.1252 lc_time | German_Germany.1252 log_destination | stderr log_directory | N:/ZZ_log/pg_log_hbdev log_error_verbosity | verbose log_file_mode | 0640 log_line_prefix | WHB %a %t %i %e %2l:> log_statement | mod log_temp_files | 0 log_timezone | CET logging_collector | on maintenance_work_mem | 128MB max_connections | 100 max_stack_depth | 2MB max_wal_size | 1GB min_wal_size | 80MB pg_stat_statements.max | 5000 pg_stat_statements.track | all random_page_cost | 1 search_path | public, archiv, ablage, admin server_encoding | UTF8 server_version | 12.0 shared_buffers | 1GB shared_preload_libraries | auto_explain,pg_stat_statements temp_buffers | 32MB TimeZone | CET transaction_deferrable | off transaction_isolation | read committed transaction_read_only | off update_process_title | off wal_buffers | 16MB wal_segment_size | 16MB work_mem | 32MB (48 rows) Indeed, the database has UTF8 Encoding. The Extended error-log (i have set auto_explain): WHB psql 2019-10-09 15:45:03 CEST XX000 7:> ERROR: XX000: invalid memory alloc request size 18446744073709551613 WHB psql 2019-10-09 15:45:03 CEST XX000 8:> LOCATION: palloc, d:\pginstaller_12.auto\postgres.windows-x64\src\backend\utils\mmgr\mcxt.c:934 WHB psql 2019-10-09 15:45:03 CEST XX000 9:> STATEMENT: select * from q_tbl_archiv; WHB vacuumdb 2019-10-09 15:46:42 CEST 00000 1:> LOG: 00000: duration: 0.022 ms plan: Query Text: SELECT pg_catalog.set_config('search_path', '', false); Result (cost=0.00..0.01 rows=1 width=32) (actual time=0.014..0.015 rows=1 loops=1) WHB vacuumdb 2019-10-09 15:46:42 CEST 00000 2:> LOCATION: explain_ExecutorEnd, d:\pginstaller_12.auto\postgres.windows-x64\contrib\auto_explain\auto_explain.c:415 WHB vacuumdb 2019-10-09 15:46:42 CEST 00000 3:> LOG: 00000: duration: 0.072 ms plan: Query Text: SELECT datname FROM pg_database WHERE datallowconn ORDER BY 1; Sort (cost=1.16..1.16 rows=1 width=64) (actual time=0.063..0.064 rows=14 loops=1) Sort Key: datname Sort Method: quicksort Memory: 26kB -> Seq Scan on pg_database (cost=0.00..1.15 rows=1 width=64) (actual time=0.018..0.022 rows=14 loops=1) Filter: datallowconn Rows Removed by Filter: 1 WHB vacuumdb 2019-10-09 15:46:42 CEST 00000 4:> LOCATION: explain_ExecutorEnd, d:\pginstaller_12.auto\postgres.windows-x64\contrib\auto_explain\auto_explain.c:415 WHB vacuumdb 2019-10-09 15:46:43 CEST 00000 1:> LOG: 00000: duration: 0.027 ms plan: Query Text: SELECT pg_catalog.set_config('search_path', '', false); Result (cost=0.00..0.01 rows=1 width=32) (actual time=0.012..0.013 rows=1 loops=1) WHB vacuumdb 2019-10-09 15:46:43 CEST 00000 2:> LOCATION: explain_ExecutorEnd, d:\pginstaller_12.auto\postgres.windows-x64\contrib\auto_explain\auto_explain.c:415 WHB vacuumdb 2019-10-09 15:46:43 CEST 00000 3:> LOG: 00000: duration: 1.036 ms plan: Query Text: SELECT c.relname, ns.nspname FROM pg_catalog.pg_class c JOIN pg_catalog.pg_namespace ns ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid LEFT JOIN pg_catalog.pg_class t ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid WHERE c.relkind OPERATOR(pg_catalog.=) ANY (array['r', 'm']) ORDER BY c.relpages DESC; Sort (cost=56.56..56.59 rows=13 width=132) (actual time=0.843..0.854 rows=320 loops=1) Sort Key: c.relpages DESC Sort Method: quicksort Memory: 110kB -> Hash Join (cost=1.23..56.32 rows=13 width=132) (actual time=0.082..0.649 rows=320 loops=1) Hash Cond: (c.relnamespace = ns.oid) -> Seq Scan on pg_class c (cost=0.00..55.05 rows=13 width=76) (actual time=0.034..0.545 rows=320 loops=1) Filter: ((relkind)::text = ANY ('{r,m}'::text[])) Rows Removed by Filter: 950 -> Hash (cost=1.10..1.10 rows=10 width=68) (actual time=0.022..0.022 rows=10 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on pg_namespace ns (cost=0.00..1.10 rows=10 width=68) (actual time=0.010..0.011 rows=10 loops=1) WHB vacuumdb 2019-10-09 15:46:43 CEST 00000 4:> LOCATION: explain_ExecutorEnd, d:\pginstaller_12.auto\postgres.windows-x64\contrib\auto_explain\auto_explain.c:415 WHB vacuumdb 2019-10-09 15:46:43 CEST 00000 5:> LOG: 00000: duration: 0.011 ms plan: Query Text: SELECT pg_catalog.set_config('search_path', '', false); Result (cost=0.00..0.01 rows=1 width=32) (actual time=0.008..0.008 rows=1 loops=1) WHB vacuumdb 2019-10-09 15:46:43 CEST 00000 6:> LOCATION: explain_ExecutorEnd, d:\pginstaller_12.auto\postgres.windows-x64\contrib\auto_explain\auto_explain.c:415 WHB 2019-10-09 15:47:01 CEST 00000 22:> LOG: 00000: server process (PID 4708) was terminated by exception 0xC0000005 WHB 2019-10-09 15:47:01 CEST 00000 23:> DETAIL: Failed process was running: ANALYZE admin.q_tbl_archiv; WHB 2019-10-09 15:47:01 CEST 00000 24:> HINT: See C include file "ntstatus.h" for a description of the hexadecimal value. WHB 2019-10-09 15:47:01 CEST 00000 25:> LOCATION: LogChildExit, d:\pginstaller_12.auto\postgres.windows-x64\src\backend\postmaster\postmaster.c:3670 WHB 2019-10-09 15:47:01 CEST 00000 26:> LOG: 00000: terminating any other active server processes WHB 2019-10-09 15:47:01 CEST 00000 27:> LOCATION: HandleChildCrash, d:\pginstaller_12.auto\postgres.windows-x64\src\backend\postmaster\postmaster.c:3400 WHB psql 2019-10-09 15:47:01 CEST 57P02 10:> WARNING: 57P02: terminating connection because of crash of another server process WHB psql 2019-10-09 15:47:01 CEST 57P02 11:> DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. WHB psql 2019-10-09 15:47:01 CEST 57P02 12:> HINT: In a moment you should be able to reconnect to the database and repeat your command. WHB psql 2019-10-09 15:47:01 CEST 57P02 13:> LOCATION: quickdie, d:\pginstaller_12.auto\postgres.windows-x64\src\backend\tcop\postgres.c:2717 WHB 2019-10-09 15:47:02 CEST 57P02 3:> WARNING: 57P02: terminating connection because of crash of another server process WHB 2019-10-09 15:47:02 CEST 57P02 4:> DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. WHB 2019-10-09 15:47:02 CEST 57P02 5:> HINT: In a moment you should be able to reconnect to the database and repeat your command. WHB 2019-10-09 15:47:02 CEST 57P02 6:> LOCATION: quickdie, d:\pginstaller_12.auto\postgres.windows-x64\src\backend\tcop\postgres.c:2717 WHB 2019-10-09 15:47:02 CEST 00000 28:> LOG: 00000: all server processes terminated; reinitializing WHB 2019-10-09 15:47:02 CEST 00000 29:> LOCATION: PostmasterStateMachine, d:\pginstaller_12.auto\postgres.windows-x64\src\backend\postmaster\postmaster.c:3912 WHB 2019-10-09 15:47:02 CEST 00000 1:> LOG: 00000: database system was interrupted; last known up at 2019-10-09 15:46:03 CEST WHB 2019-10-09 15:47:02 CEST 00000 2:> LOCATION: StartupXLOG, d:\pginstaller_12.auto\postgres.windows-x64\src\backend\access\transam\xlog.c:6277 The table was imported successively by pg_dump/pg_restore from the previous versions into pg11. This was the same what I did on the other machine (pg 11.5). On this test machine I could successfully Export the table with pg_dump -t. On the erroneous PG12 Cluster I succeeded to recreate a similar table with the original create table Statements: no Errors. Under PG12 upgraded, I tried to select only the first column (select table_name from q_tbl_archiv) and got erroneaus results (shown first 2 entries): cpsdb=# select table_name from q_tbl_archiv; table_name --------------------------------------------- \x11chemmat\x17chm_season !collectionsheet\x15cs_season It seems that the length Bytes are present in the Output. Hope this Information helps. Hans Buschmann -
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-09T23:07:59Z
Well, I think I found the root cause. It's because of 7c15cef86d, which changed the definition of sql_identifier so that it's a domain over name instead of varchar. So we now have this: SELECT typname, typlen FROM pg_type WHERE typname = 'sql_identifier': -[ RECORD 1 ]--+--------------- typname | sql_identifier typlen | -1 instead of this -[ RECORD 1 ]--+--------------- typname | sql_identifier typlen | 64 Unfortunately, that seems very much like a break of on-disk format, and after pg_upgrade any table containing sql_identifier columns is pretty much guaranteed to be badly mangled. For example, the first row from the table used in the original report looks like this on PostgreSQL 11: test=# select ctid, * from q_tbl_archiv limit 1; -[ RECORD 1 ]----+-------------------------- ctid | (0,1) table_name | _pg_foreign_data_wrappers column_name | foreign_data_wrapper_name ordinal_position | 5 col_qualifier | foreign_data_wrapper_name id_column | id_default | while on PostgreSQL 12 after pg_upgrade it looks like this test=# select ctid, table_name, column_name, ordinal_position from q_tbl_archiv limit 1;: -[ RECORD 1 ]----+--------------------------------------------------------- ctid | (0,1) table_name | 5_pg_foreign_data_wrappers5foreign_data_wrapper_name\x05 column_name | _data_wrapper_name ordinal_position | 0 Not sure what to do about this :-( regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
-
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tom Lane <tgl@sss.pgh.pa.us> — 2019-10-09T23:18:45Z
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > Well, I think I found the root cause. It's because of 7c15cef86d, which > changed the definition of sql_identifier so that it's a domain over name > instead of varchar. Ah... > Not sure what to do about this :-( Fortunately, there should be close to zero people with user tables depending on sql_identifier. I think we should just add a test in pg_upgrade that refuses to upgrade if there are any such columns. It won't be the first such restriction. regards, tom lane
-
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-09T23:28:36Z
On Wed, Oct 09, 2019 at 07:18:45PM -0400, Tom Lane wrote: >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> Well, I think I found the root cause. It's because of 7c15cef86d, which >> changed the definition of sql_identifier so that it's a domain over name >> instead of varchar. > >Ah... > >> Not sure what to do about this :-( > >Fortunately, there should be close to zero people with user tables >depending on sql_identifier. I think we should just add a test in >pg_upgrade that refuses to upgrade if there are any such columns. >It won't be the first such restriction. > Hmmm, yeah. I agree the number of people using sql_identifier in user tables is low, but OTOH we got this report within a week after release, so maybe it's higher than we think. Another option would be to teach pg_upgrade to switch the columns to 'text' or 'varchar', not sure if that's possible or how much work would that be. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
-
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tom Lane <tgl@sss.pgh.pa.us> — 2019-10-09T23:41:54Z
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On Wed, Oct 09, 2019 at 07:18:45PM -0400, Tom Lane wrote: >> Fortunately, there should be close to zero people with user tables >> depending on sql_identifier. I think we should just add a test in >> pg_upgrade that refuses to upgrade if there are any such columns. >> It won't be the first such restriction. > Hmmm, yeah. I agree the number of people using sql_identifier in user > tables is low, but OTOH we got this report within a week after release, > so maybe it's higher than we think. True. > Another option would be to teach pg_upgrade to switch the columns to > 'text' or 'varchar', not sure if that's possible or how much work would > that be. I think it'd be a mess --- the actual hacking would have to happen in pg_dump, I think, and it'd be a kluge because pg_dump doesn't normally understand what server version its output is going to. So we'd more or less have to control it through a new pg_dump switch that pg_upgrade would use. Ick. Also, even if we did try to silently convert such columns that way, I bet we'd get other bug reports about "why'd my columns suddenly change type?". So I'd rather force the user to be involved. regards, tom lane
-
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Andres Freund <andres@anarazel.de> — 2019-10-10T01:48:13Z
On 2019-10-09 19:41:54 -0400, Tom Lane wrote: > Also, even if we did try to silently convert such columns that way, > I bet we'd get other bug reports about "why'd my columns suddenly > change type?". So I'd rather force the user to be involved. +1
-
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-10T13:43:03Z
On Wed, Oct 09, 2019 at 06:48:13PM -0700, Andres Freund wrote: >On 2019-10-09 19:41:54 -0400, Tom Lane wrote: >> Also, even if we did try to silently convert such columns that way, >> I bet we'd get other bug reports about "why'd my columns suddenly >> change type?". So I'd rather force the user to be involved. > >+1 Fair enough, attached is a patch doing that, I think. Maybe the file should be named differently, as it contains other objects than just tables. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
-
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tom Lane <tgl@sss.pgh.pa.us> — 2019-10-10T14:19:12Z
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > Fair enough, attached is a patch doing that, I think. Maybe the file > should be named differently, as it contains other objects than just > tables. Seems about right, though I notice it will not detect domains over sql_identifier. How much do we care about that? To identify such domains, I think we'd need something like WHERE attypid IN (recursive-WITH-query), which makes me nervous. We did support those starting with 8.4, which is as far back as pg_upgrade will go, so in theory it should work. But I think we had bugs with such cases in old releases. Do we want to assume that the source server has been updated enough to avoid any such bugs? The expense of such a query might be daunting, too. regards, tom lane
-
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-10T16:33:50Z
On Thu, Oct 10, 2019 at 10:19:12AM -0400, Tom Lane wrote: >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> Fair enough, attached is a patch doing that, I think. Maybe the file >> should be named differently, as it contains other objects than just >> tables. > >Seems about right, though I notice it will not detect domains over >sql_identifier. How much do we care about that? > >To identify such domains, I think we'd need something like >WHERE attypid IN (recursive-WITH-query), which makes me nervous. >We did support those starting with 8.4, which is as far back as >pg_upgrade will go, so in theory it should work. But I think we >had bugs with such cases in old releases. Do we want to assume >that the source server has been updated enough to avoid any such >bugs? The expense of such a query might be daunting, too. > Not sure. Regarding bugs, I think it's fine to assume the users are running sufficiently recent version - they may not, but then they're probably subject to various other bugs (data corruption, queries). If they're not, then they'll either get false positives (in which case they'll be forced to update) or false negatives (which is just as if we did nothing). For the query cost, I think we can assume the domain hierarchies are not particularly deep (in practice I'd expect just domains directly on the sql_identifier type). And I doubt people are using that very widely, it's probably more like this report - ad-hoc CTAS, so just a couple of items. So I wouldn't expect it to be a huge deal in most cases. But even if it takes a second or two, it's a one-time cost. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
-
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tom Lane <tgl@sss.pgh.pa.us> — 2019-10-10T20:14:20Z
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On Thu, Oct 10, 2019 at 10:19:12AM -0400, Tom Lane wrote: >> To identify such domains, I think we'd need something like >> WHERE attypid IN (recursive-WITH-query), which makes me nervous. >> We did support those starting with 8.4, which is as far back as >> pg_upgrade will go, so in theory it should work. But I think we >> had bugs with such cases in old releases. Do we want to assume >> that the source server has been updated enough to avoid any such >> bugs? The expense of such a query might be daunting, too. > For the query cost, I think we can assume the domain hierarchies are not > particularly deep (in practice I'd expect just domains directly on the > sql_identifier type). And I doubt people are using that very widely, > it's probably more like this report - ad-hoc CTAS, so just a couple of > items. So I wouldn't expect it to be a huge deal in most cases. But even > if it takes a second or two, it's a one-time cost. What I was worried about was the planner possibly trying to apply the atttypid restriction as a scan qual using a subplan, which might be rather awful. But it doesn't look like that happens. I get a hash semijoin to the CTE output, in all branches back to 8.4, on this trial query: explain with recursive sqlidoids(toid) as ( select 'information_schema.sql_identifier'::pg_catalog.regtype as toid union select oid from pg_catalog.pg_type, sqlidoids where typtype = 'd' and typbasetype = sqlidoids.toid ) SELECT n.nspname, c.relname, a.attname FROM pg_catalog.pg_class c, pg_catalog.pg_namespace n, pg_catalog.pg_attribute a WHERE c.oid = a.attrelid AND NOT a.attisdropped AND a.atttypid in (select toid from sqlidoids) AND c.relkind IN ('r','v','i') and c.relnamespace = n.oid AND n.nspname !~ '^pg_temp_' AND n.nspname !~ '^pg_toast_temp_' AND n.nspname NOT IN ('pg_catalog', 'information_schema'); regards, tom lane -
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-10T20:40:22Z
On Thu, Oct 10, 2019 at 04:14:20PM -0400, Tom Lane wrote: >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> On Thu, Oct 10, 2019 at 10:19:12AM -0400, Tom Lane wrote: >>> To identify such domains, I think we'd need something like >>> WHERE attypid IN (recursive-WITH-query), which makes me nervous. >>> We did support those starting with 8.4, which is as far back as >>> pg_upgrade will go, so in theory it should work. But I think we >>> had bugs with such cases in old releases. Do we want to assume >>> that the source server has been updated enough to avoid any such >>> bugs? The expense of such a query might be daunting, too. > >> For the query cost, I think we can assume the domain hierarchies are not >> particularly deep (in practice I'd expect just domains directly on the >> sql_identifier type). And I doubt people are using that very widely, >> it's probably more like this report - ad-hoc CTAS, so just a couple of >> items. So I wouldn't expect it to be a huge deal in most cases. But even >> if it takes a second or two, it's a one-time cost. > >What I was worried about was the planner possibly trying to apply the >atttypid restriction as a scan qual using a subplan, which might be rather >awful. But it doesn't look like that happens. OK. > I get a hash semijoin to >the CTE output, in all branches back to 8.4, on this trial query: > >explain >with recursive sqlidoids(toid) as ( >select 'information_schema.sql_identifier'::pg_catalog.regtype as toid >union >select oid from pg_catalog.pg_type, sqlidoids > where typtype = 'd' and typbasetype = sqlidoids.toid >) >SELECT n.nspname, c.relname, a.attname >FROM pg_catalog.pg_class c, > pg_catalog.pg_namespace n, > pg_catalog.pg_attribute a >WHERE c.oid = a.attrelid AND > NOT a.attisdropped AND > a.atttypid in (select toid from sqlidoids) AND > c.relkind IN ('r','v','i') and > c.relnamespace = n.oid AND > n.nspname !~ '^pg_temp_' AND > n.nspname !~ '^pg_toast_temp_' AND > n.nspname NOT IN ('pg_catalog', 'information_schema'); > I think that's not quite sufficient - the problem is that we can have domains and composite types on sql_identifier, in some arbitrary order. And the recursive CTE won't handle that the way it's written - it will miss domains on composite types containing sql_identifier. And we have quite a few of them in the information schema, so maybe someone created a domain on one of those (however unlikely it may seem). I think this recursive CTE does it correctly: WITH RECURSIVE oids AS ( -- type itself SELECT 'information_schema.sql_identifier'::regtype AS oid UNION ALL SELECT * FROM ( -- domains on the type WITH x AS (SELECT oid FROM oids) SELECT t.oid FROM pg_catalog.pg_type t, x WHERE typbasetype = x.oid AND typtype = 'd' UNION -- composite types containing the type SELECT t.oid FROM pg_catalog.pg_type t, pg_catalog.pg_class c, pg_catalog.pg_attribute a, x WHERE t.typtype = 'c' AND t.oid = c.reltype AND c.oid = a.attrelid AND a.atttypid = x.oid ) foo ) I had to use CTE within CTE, because the 'oids' can be referenced only once, but we have two subqueries there. Maybe there's a better solution. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-13T00:10:32Z
OK, here is an updated patch, with the recursive CTE. I've done a fair amount of testing on it on older versions (up to 9.4), and it seems to work just fine. Another thing that I noticed is that the query does not need to look at RELKIND_COMPOSITE_TYPE, because we only really care about cases when sql_identifier is stored on-disk. Composite type alone does not do that, and the CTE includes OIDs of composite types that we then check against relations and matviews. Barring objections, I'll push this early next week. BTW the query (including the RELKIND_COMPSITE_TYPE) was essentially just a lightly-massaged copy of old_9_6_check_for_unknown_data_type_usage, so that seems wrong too. The comment explicitly says: * Also check composite types, in case they are used for table columns. but even a simple "create type c as (a unknown, b int)" without any table using it enough to trigger the failure. But maybe it's intentional, not sure. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
-
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tom Lane <tgl@sss.pgh.pa.us> — 2019-10-13T18:26:48Z
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > here is an updated patch, with the recursive CTE. I've done a fair > amount of testing on it on older versions (up to 9.4), and it seems to > work just fine. Might be a good idea to exclude attisdropped columns in the part of the recursive query that's looking for sql_identifier columns of composite types. I'm not sure if composites can have dropped columns today, but even if they can't it seems like a wise bit of future-proofing. (We'll no doubt have occasion to use this logic again...) Looks good other than that nit. > BTW the query (including the RELKIND_COMPSITE_TYPE) was essentially just > a lightly-massaged copy of old_9_6_check_for_unknown_data_type_usage, so > that seems wrong too. Yeah, we should back-port this logic into that check too, IMO. regards, tom lane
-
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-13T20:38:48Z
On Sun, Oct 13, 2019 at 02:26:48PM -0400, Tom Lane wrote: >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> here is an updated patch, with the recursive CTE. I've done a fair >> amount of testing on it on older versions (up to 9.4), and it seems to >> work just fine. > >Might be a good idea to exclude attisdropped columns in the part of the >recursive query that's looking for sql_identifier columns of composite >types. I'm not sure if composites can have dropped columns today, >but even if they can't it seems like a wise bit of future-proofing. >(We'll no doubt have occasion to use this logic again...) > Hmm? How could that be safe? Let's say we have a composite type with a sql_identifier column, it's used in a table with data, and we drop the column. We need the pg_type information to parse the existing, so how could we skip attisdropped columns? >Looks good other than that nit. > >> BTW the query (including the RELKIND_COMPSITE_TYPE) was essentially just >> a lightly-massaged copy of old_9_6_check_for_unknown_data_type_usage, so >> that seems wrong too. > >Yeah, we should back-port this logic into that check too, IMO. > You mean the recursive CTE, removal of RELKIND_COMPOSITE_TYPE or the proposed change w.r.t. dropped columns? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
-
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tom Lane <tgl@sss.pgh.pa.us> — 2019-10-14T14:16:40Z
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On Sun, Oct 13, 2019 at 02:26:48PM -0400, Tom Lane wrote: >> Might be a good idea to exclude attisdropped columns in the part of the >> recursive query that's looking for sql_identifier columns of composite >> types. I'm not sure if composites can have dropped columns today, [ I checked this, they can ] >> but even if they can't it seems like a wise bit of future-proofing. >> (We'll no doubt have occasion to use this logic again...) > Hmm? How could that be safe? Let's say we have a composite type with a > sql_identifier column, it's used in a table with data, and we drop the > column. We need the pg_type information to parse the existing, so how > could we skip attisdropped columns? It works exactly like it does for table rowtypes. regression=# create type cfoo as (f1 int, f2 int, f3 int); CREATE TYPE regression=# alter type cfoo drop attribute f2; ALTER TYPE regression=# select attname,atttypid,attisdropped,attlen,attalign from pg_attribute where attrelid = 'cfoo'::regclass; attname | atttypid | attisdropped | attlen | attalign ------------------------------+----------+--------------+--------+---------- f1 | 23 | f | 4 | i ........pg.dropped.2........ | 0 | t | 4 | i f3 | 23 | f | 4 | i (3 rows) All we need to skip over the dead data is attlen/attalign, which are preserved in pg_attribute even if the pg_type row is gone. As this example shows, you don't really *have* to check attisdropped because atttypid will be set to zero. But the latter is just a defense measure in case somebody forgets to check attisdropped; you're not supposed to forget that. regards, tom lane -
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-14T16:35:38Z
On Mon, Oct 14, 2019 at 10:16:40AM -0400, Tom Lane wrote: >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> On Sun, Oct 13, 2019 at 02:26:48PM -0400, Tom Lane wrote: >>> Might be a good idea to exclude attisdropped columns in the part of the >>> recursive query that's looking for sql_identifier columns of composite >>> types. I'm not sure if composites can have dropped columns today, > >[ I checked this, they can ] > >>> but even if they can't it seems like a wise bit of future-proofing. >>> (We'll no doubt have occasion to use this logic again...) > >> Hmm? How could that be safe? Let's say we have a composite type with a >> sql_identifier column, it's used in a table with data, and we drop the >> column. We need the pg_type information to parse the existing, so how >> could we skip attisdropped columns? > >It works exactly like it does for table rowtypes. > >regression=# create type cfoo as (f1 int, f2 int, f3 int); >CREATE TYPE >regression=# alter type cfoo drop attribute f2; >ALTER TYPE >regression=# select attname,atttypid,attisdropped,attlen,attalign from pg_attribute where attrelid = 'cfoo'::regclass; > attname | atttypid | attisdropped | attlen | attalign >------------------------------+----------+--------------+--------+---------- > f1 | 23 | f | 4 | i > ........pg.dropped.2........ | 0 | t | 4 | i > f3 | 23 | f | 4 | i >(3 rows) > >All we need to skip over the dead data is attlen/attalign, which are >preserved in pg_attribute even if the pg_type row is gone. > >As this example shows, you don't really *have* to check attisdropped >because atttypid will be set to zero. But the latter is just a >defense measure in case somebody forgets to check attisdropped; >you're not supposed to forget that. > Aha! I forgot we copy the necessary stuff into pg_attribute. Thanks for clarifying, I'll polish and push the fix shortly. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
-
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-15T00:18:17Z
On Mon, Oct 14, 2019 at 06:35:38PM +0200, Tomas Vondra wrote: > ... > >Aha! I forgot we copy the necessary stuff into pg_attribute. Thanks for >clarifying, I'll polish and push the fix shortly. > I've pushed and backpatched the fix. Attached are similar fixes for the existing pg_upgrade checks for pg_catalog.line and pg_catalog.unknown types, which have the same issues with composite types and domains. There are some additional details & examples in the commit messages. I've kept this in two patches primarily because of backpatching - the line fix should go back up to 9.4, the unknown is for 10. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
-
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Justin Pryzby <pryzby@telsasoft.com> — 2019-10-15T04:41:18Z
On Tue, Oct 15, 2019 at 02:18:17AM +0200, Tomas Vondra wrote: > On Mon, Oct 14, 2019 at 06:35:38PM +0200, Tomas Vondra wrote: > >... > > > >Aha! I forgot we copy the necessary stuff into pg_attribute. Thanks for > >clarifying, I'll polish and push the fix shortly. Perhaps it'd be worth creating a test for on-disk format ? Like a table with a column for each core type, which is either SELECTed from after pg_upgrade, or pg_dump output compared before and after. Justin
-
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-15T07:07:25Z
On Mon, Oct 14, 2019 at 11:41:18PM -0500, Justin Pryzby wrote: >On Tue, Oct 15, 2019 at 02:18:17AM +0200, Tomas Vondra wrote: >> On Mon, Oct 14, 2019 at 06:35:38PM +0200, Tomas Vondra wrote: >> >... >> > >> >Aha! I forgot we copy the necessary stuff into pg_attribute. Thanks for >> >clarifying, I'll polish and push the fix shortly. > >Perhaps it'd be worth creating a test for on-disk format ? > >Like a table with a column for each core type, which is either SELECTed from >after pg_upgrade, or pg_dump output compared before and after. > IMO that would be useful - we now have a couple of these checks for different data types (line, unknown, sql_identifier), with a couple of combinations each. And I've been looking if we do similar pg_upgrade tests, but I haven't found anything. I mean, we do pg_upgrade the cluster used for regression tests, but here we need to test a number of cases that are meant to abort the pg_upgrade. So we'd need a number of pg_upgrade runs, to test that. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
-
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-16T11:33:44Z
On Tue, Oct 15, 2019 at 02:18:17AM +0200, Tomas Vondra wrote: >On Mon, Oct 14, 2019 at 06:35:38PM +0200, Tomas Vondra wrote: >>... >> >>Aha! I forgot we copy the necessary stuff into pg_attribute. Thanks for >>clarifying, I'll polish and push the fix shortly. >> > >I've pushed and backpatched the fix. Attached are similar fixes for the >existing pg_upgrade checks for pg_catalog.line and pg_catalog.unknown >types, which have the same issues with composite types and domains. > >There are some additional details & examples in the commit messages. > >I've kept this in two patches primarily because of backpatching - the >line fix should go back up to 9.4, the unknown is for 10. > I've just committed and pushed both fixes after some minor corrections. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
-
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tom Lane <tgl@sss.pgh.pa.us> — 2019-10-16T12:27:57Z
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > I've just committed and pushed both fixes after some minor corrections. Not quite right in 9.6 and before, according to crake. Looks like some issue with the CppAsString2'd constants? Did we even have CppAsString2 that far back? regards, tom lane
-
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tom Lane <tgl@sss.pgh.pa.us> — 2019-10-16T13:26:42Z
I wrote: > Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> I've just committed and pushed both fixes after some minor corrections. > Not quite right in 9.6 and before, according to crake. Looks like > some issue with the CppAsString2'd constants? Did we even have > CppAsString2 that far back? Yeah, we did. On closer inspection I suspect that we need to #include some other file to get the RELKIND_ constants in the old branches. regards, tom lane
-
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-16T13:41:17Z
On Wed, Oct 16, 2019 at 03:26:42PM +0200, Tom Lane wrote: >I wrote: >> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >>> I've just committed and pushed both fixes after some minor corrections. > >> Not quite right in 9.6 and before, according to crake. Looks like >> some issue with the CppAsString2'd constants? Did we even have >> CppAsString2 that far back? > >Yeah, we did. On closer inspection I suspect that we need to #include >some other file to get the RELKIND_ constants in the old branches. > Oh! Looking. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
-
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Tomas Vondra <tomas.vondra@2ndquadrant.com> — 2019-10-16T14:33:43Z
On Wed, Oct 16, 2019 at 03:26:42PM +0200, Tom Lane wrote: >I wrote: >> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >>> I've just committed and pushed both fixes after some minor corrections. > >> Not quite right in 9.6 and before, according to crake. Looks like >> some issue with the CppAsString2'd constants? Did we even have >> CppAsString2 that far back? > >Yeah, we did. On closer inspection I suspect that we need to #include >some other file to get the RELKIND_ constants in the old branches. > Yeah, the pg_class.h catalog was missing on pre-10 relases. It compiled just fine, so I haven't noticed that during the backpatching :-( Fixed, let's see if the buildfarm is happy with that. regads -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
-
Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12
Bruce Momjian <bruce@momjian.us> — 2019-10-23T22:08:21Z
On Tue, Oct 15, 2019 at 02:18:17AM +0200, Tomas Vondra wrote: > On Mon, Oct 14, 2019 at 06:35:38PM +0200, Tomas Vondra wrote: > > ... > > > > Aha! I forgot we copy the necessary stuff into pg_attribute. Thanks for > > clarifying, I'll polish and push the fix shortly. > > > > I've pushed and backpatched the fix. Attached are similar fixes for the > existing pg_upgrade checks for pg_catalog.line and pg_catalog.unknown > types, which have the same issues with composite types and domains. This comit added old_11_check_for_sql_identifier_data_type_usage(), but it did not use the clearer database error list format added to the master branch in commit 1634d36157. Attached is a patch to fix this, which I have committed. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
-
pg_upgrade test for binary compatibility of core data types
Justin Pryzby <pryzby@telsasoft.com> — 2020-12-06T18:02:48Z
I'm finally returning to this 14 month old thread: (was: Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12) On Tue, Oct 15, 2019 at 09:07:25AM +0200, Tomas Vondra wrote: > On Mon, Oct 14, 2019 at 11:41:18PM -0500, Justin Pryzby wrote: > > > > Perhaps it'd be worth creating a test for on-disk format ? > > > > Like a table with a column for each core type, which is either SELECTed from > > after pg_upgrade, or pg_dump output compared before and after. > > IMO that would be useful - we now have a couple of these checks for > different data types (line, unknown, sql_identifier), with a couple of > combinations each. And I've been looking if we do similar pg_upgrade > tests, but I haven't found anything. I mean, we do pg_upgrade the > cluster used for regression tests, but here we need to test a number of > cases that are meant to abort the pg_upgrade. So we'd need a number of > pg_upgrade runs, to test that. I meant to notice if the binary format is accidentally changed again, which was what happened here: 7c15cef86 Base information_schema.sql_identifier domain on name, not varchar. I added a table to the regression tests so it's processed by pg_upgrade tests, run like: | time make -C src/bin/pg_upgrade check oldsrc=`pwd`/11 oldbindir=`pwd`/11/tmp_install/usr/local/pgsql/bin I checked that if I cherry-pick 0002 to v11, and comment out old_11_check_for_sql_identifier_data_type_usage(), then pg_upgrade/test.sh detects the original problem: pg_dump: error: Error message from server: ERROR: invalid memory alloc request size 18446744073709551613 I understand the buildfarm has its own cross-version-upgrade test, which I think would catch this on its own. These all seem to complicate use of pg_upgrade/test.sh, so 0001 is needed to allow testing upgrade from older releases. e78900afd217fa3eaa77c51e23a94c1466af421c Create by default sql/ and expected/ for output directory in pg_regress 40b132c1afbb4b1494aa8e48cc35ec98d2b90777 In the pg_upgrade test suite, don't write to src/test/regress. fc49e24fa69a15efacd5b8958115ed9c43c48f9a Make WAL segment size configurable at initdb time. c37b3d08ca6873f9d4eaf24c72a90a550970cbb8 Allow group access on PGDATA da9b580d89903fee871cf54845ffa2b26bda2e11 Refactor dir/file permissions -- Justin
-
Re: pg_upgrade test for binary compatibility of core data types
Justin Pryzby <pryzby@telsasoft.com> — 2020-12-16T17:22:23Z
On Sun, Dec 06, 2020 at 12:02:48PM -0600, Justin Pryzby wrote: > I meant to notice if the binary format is accidentally changed again, which was > what happened here: > 7c15cef86 Base information_schema.sql_identifier domain on name, not varchar. > > I added a table to the regression tests so it's processed by pg_upgrade tests, > run like: > > | time make -C src/bin/pg_upgrade check oldsrc=`pwd`/11 oldbindir=`pwd`/11/tmp_install/usr/local/pgsql/bin Per cfbot, this avoids testing ::xml (support for which may not be enabled) And also now tests oid types. I think the per-version hacks should be grouped by logical change, rather than by version. Which I've started doing here. -- Justin
-
Re: pg_upgrade test for binary compatibility of core data types
Justin Pryzby <pryzby@telsasoft.com> — 2020-12-27T19:07:29Z
On Wed, Dec 16, 2020 at 11:22:23AM -0600, Justin Pryzby wrote: > On Sun, Dec 06, 2020 at 12:02:48PM -0600, Justin Pryzby wrote: > > I meant to notice if the binary format is accidentally changed again, which was > > what happened here: > > 7c15cef86 Base information_schema.sql_identifier domain on name, not varchar. > > > > I added a table to the regression tests so it's processed by pg_upgrade tests, > > run like: > > > > | time make -C src/bin/pg_upgrade check oldsrc=`pwd`/11 oldbindir=`pwd`/11/tmp_install/usr/local/pgsql/bin > > Per cfbot, this avoids testing ::xml (support for which may not be enabled) > And also now tests oid types. > > I think the per-version hacks should be grouped by logical change, rather than > by version. Which I've started doing here. rebased on 6df7a9698bb036610c1e8c6d375e1be38cb26d5f -- Justin
-
Re: pg_upgrade test for binary compatibility of core data types
Peter Eisentraut <peter.eisentraut@enterprisedb.com> — 2021-01-11T14:28:08Z
On 2020-12-27 20:07, Justin Pryzby wrote: > On Wed, Dec 16, 2020 at 11:22:23AM -0600, Justin Pryzby wrote: >> On Sun, Dec 06, 2020 at 12:02:48PM -0600, Justin Pryzby wrote: >>> I meant to notice if the binary format is accidentally changed again, which was >>> what happened here: >>> 7c15cef86 Base information_schema.sql_identifier domain on name, not varchar. >>> >>> I added a table to the regression tests so it's processed by pg_upgrade tests, >>> run like: >>> >>> | time make -C src/bin/pg_upgrade check oldsrc=`pwd`/11 oldbindir=`pwd`/11/tmp_install/usr/local/pgsql/bin >> >> Per cfbot, this avoids testing ::xml (support for which may not be enabled) >> And also now tests oid types. >> >> I think the per-version hacks should be grouped by logical change, rather than >> by version. Which I've started doing here. > > rebased on 6df7a9698bb036610c1e8c6d375e1be38cb26d5f I think these patches could use some in-place documentation of what they are trying to achieve and how they do it. The required information is spread over a lengthy thread. No one wants to read that. Add commit messages to the patches.
-
Re: pg_upgrade test for binary compatibility of core data types
Bruce Momjian <bruce@momjian.us> — 2021-01-11T15:21:36Z
On Mon, Jan 11, 2021 at 03:28:08PM +0100, Peter Eisentraut wrote: > On 2020-12-27 20:07, Justin Pryzby wrote: > > rebased on 6df7a9698bb036610c1e8c6d375e1be38cb26d5f > > I think these patches could use some in-place documentation of what they are > trying to achieve and how they do it. The required information is spread > over a lengthy thread. No one wants to read that. Add commit messages to > the patches. Oh, I see that now, and agree that you need to explain each item with a comment. pg_upgrade is doing some odd things, so documenting everything it does is a big win. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
-
Re: pg_upgrade test for binary compatibility of core data types
Justin Pryzby <pryzby@telsasoft.com> — 2021-01-12T04:13:52Z
On Mon, Jan 11, 2021 at 03:28:08PM +0100, Peter Eisentraut wrote: > On 2020-12-27 20:07, Justin Pryzby wrote: > > On Wed, Dec 16, 2020 at 11:22:23AM -0600, Justin Pryzby wrote: > > > On Sun, Dec 06, 2020 at 12:02:48PM -0600, Justin Pryzby wrote: > > > > I meant to notice if the binary format is accidentally changed again, which was > > > > what happened here: > > > > 7c15cef86 Base information_schema.sql_identifier domain on name, not varchar. > > > > > > > > I added a table to the regression tests so it's processed by pg_upgrade tests, > > > > run like: > > > > > > > > | time make -C src/bin/pg_upgrade check oldsrc=`pwd`/11 oldbindir=`pwd`/11/tmp_install/usr/local/pgsql/bin > > > > > > Per cfbot, this avoids testing ::xml (support for which may not be enabled) > > > And also now tests oid types. > > > > > > I think the per-version hacks should be grouped by logical change, rather than > > > by version. Which I've started doing here. > > > > rebased on 6df7a9698bb036610c1e8c6d375e1be38cb26d5f > > I think these patches could use some in-place documentation of what they are > trying to achieve and how they do it. The required information is spread > over a lengthy thread. No one wants to read that. Add commit messages to > the patches. 0001 patch fixes pg_upgrade/test.sh, which was disfunctional. Portions of the first patch were independently handled by commits 52202bb39, fa744697c, 091866724. So this is rebased on those. I guess updating this script should be a part of a beta-checklist somewhere, since I guess nobody will want to backpatch changes for testing older releases. 0002 allows detecting the information_schema problem that was introduced at: 7c15cef86 Base information_schema.sql_identifier domain on name, not varchar. +-- Create a table with different data types, to exercise binary compatibility +-- during pg_upgrade test If binary compatibility is changed I expect this will error, crash, at least return wrong data, and thereby fail tests. -- Justin On Sun, Dec 06, 2020 at 12:02:48PM -0600, Justin Pryzby wrote: > I checked that if I cherry-pick 0002 to v11, and comment out > old_11_check_for_sql_identifier_data_type_usage(), then pg_upgrade/test.sh > detects the original problem: > pg_dump: error: Error message from server: ERROR: invalid memory alloc request size 18446744073709551613 > > I understand the buildfarm has its own cross-version-upgrade test, which I > think would catch this on its own. > > These all seem to complicate use of pg_upgrade/test.sh, so 0001 is needed to > allow testing upgrade from older releases. > > e78900afd217fa3eaa77c51e23a94c1466af421c Create by default sql/ and expected/ for output directory in pg_regress > 40b132c1afbb4b1494aa8e48cc35ec98d2b90777 In the pg_upgrade test suite, don't write to src/test/regress. > fc49e24fa69a15efacd5b8958115ed9c43c48f9a Make WAL segment size configurable at initdb time. > c37b3d08ca6873f9d4eaf24c72a90a550970cbb8 Allow group access on PGDATA > da9b580d89903fee871cf54845ffa2b26bda2e11 Refactor dir/file permissions
-
Re: pg_upgrade test for binary compatibility of core data types
Bruce Momjian <bruce@momjian.us> — 2021-01-12T17:15:59Z
On Mon, Jan 11, 2021 at 10:13:52PM -0600, Justin Pryzby wrote: > On Mon, Jan 11, 2021 at 03:28:08PM +0100, Peter Eisentraut wrote: > > I think these patches could use some in-place documentation of what they are > > trying to achieve and how they do it. The required information is spread > > over a lengthy thread. No one wants to read that. Add commit messages to > > the patches. > > 0001 patch fixes pg_upgrade/test.sh, which was disfunctional. > Portions of the first patch were independently handled by commits 52202bb39, > fa744697c, 091866724. So this is rebased on those. > I guess updating this script should be a part of a beta-checklist somewhere, > since I guess nobody will want to backpatch changes for testing older releases. Uh, what exactly is missing from the beta checklist? I read the patch and commit message but don't understand it. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
-
Re: pg_upgrade test for binary compatibility of core data types
Justin Pryzby <pryzby@telsasoft.com> — 2021-01-12T17:27:53Z
On Tue, Jan 12, 2021 at 12:15:59PM -0500, Bruce Momjian wrote: > On Mon, Jan 11, 2021 at 10:13:52PM -0600, Justin Pryzby wrote: > > On Mon, Jan 11, 2021 at 03:28:08PM +0100, Peter Eisentraut wrote: > > > I think these patches could use some in-place documentation of what they are > > > trying to achieve and how they do it. The required information is spread > > > over a lengthy thread. No one wants to read that. Add commit messages to > > > the patches. > > > > 0001 patch fixes pg_upgrade/test.sh, which was disfunctional. > > Portions of the first patch were independently handled by commits 52202bb39, > > fa744697c, 091866724. So this is rebased on those. > > I guess updating this script should be a part of a beta-checklist somewhere, > > since I guess nobody will want to backpatch changes for testing older releases. > > Uh, what exactly is missing from the beta checklist? I read the patch > and commit message but don't understand it. Did you try to use test.sh to upgrade from a prior release ? Evidently it's frequently forgotten, as evidenced by all the "deferred maintenance" I had to do to allow testing the main patch (currently 0003). See also: commit 5bab1985dfc25eecf4b098145789955c0b246160 Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Thu Jun 8 13:48:27 2017 -0400 Fix bit-rot in pg_upgrade's test.sh, and improve documentation. Doing a cross-version upgrade test with test.sh evidently hasn't been tested since circa 9.2, because the script lacked case branches for old-version servers newer than 9.1. Future-proof that a bit, and clean up breakage induced by our recent drop of V0 function call protocol (namely that oldstyle_length() isn't in the regression suite anymore). -- Justin -
Re: pg_upgrade test for binary compatibility of core data types
Bruce Momjian <bruce@momjian.us> — 2021-01-12T17:53:56Z
On Tue, Jan 12, 2021 at 11:27:53AM -0600, Justin Pryzby wrote: > On Tue, Jan 12, 2021 at 12:15:59PM -0500, Bruce Momjian wrote: > > Uh, what exactly is missing from the beta checklist? I read the patch > > and commit message but don't understand it. > > Did you try to use test.sh to upgrade from a prior release ? > > Evidently it's frequently forgotten, as evidenced by all the "deferred > maintenance" I had to do to allow testing the main patch (currently 0003). > > See also: > > commit 5bab1985dfc25eecf4b098145789955c0b246160 > Author: Tom Lane <tgl@sss.pgh.pa.us> > Date: Thu Jun 8 13:48:27 2017 -0400 > > Fix bit-rot in pg_upgrade's test.sh, and improve documentation. > > Doing a cross-version upgrade test with test.sh evidently hasn't been > tested since circa 9.2, because the script lacked case branches for > old-version servers newer than 9.1. Future-proof that a bit, and > clean up breakage induced by our recent drop of V0 function call > protocol (namely that oldstyle_length() isn't in the regression > suite anymore). Oh, that is odd. I thought that was regularly run. I have my own test infrastructure that I run for every major release so I never have run the built-in one, except for make check-world. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
-
Re: pg_upgrade test for binary compatibility of core data types
Andrew Dunstan <andrew@dunslane.net> — 2021-01-12T21:44:28Z
On 1/12/21 12:53 PM, Bruce Momjian wrote: > On Tue, Jan 12, 2021 at 11:27:53AM -0600, Justin Pryzby wrote: >> On Tue, Jan 12, 2021 at 12:15:59PM -0500, Bruce Momjian wrote: >>> Uh, what exactly is missing from the beta checklist? I read the patch >>> and commit message but don't understand it. >> Did you try to use test.sh to upgrade from a prior release ? >> >> Evidently it's frequently forgotten, as evidenced by all the "deferred >> maintenance" I had to do to allow testing the main patch (currently 0003). >> >> See also: >> >> commit 5bab1985dfc25eecf4b098145789955c0b246160 >> Author: Tom Lane <tgl@sss.pgh.pa.us> >> Date: Thu Jun 8 13:48:27 2017 -0400 >> >> Fix bit-rot in pg_upgrade's test.sh, and improve documentation. >> >> Doing a cross-version upgrade test with test.sh evidently hasn't been >> tested since circa 9.2, because the script lacked case branches for >> old-version servers newer than 9.1. Future-proof that a bit, and >> clean up breakage induced by our recent drop of V0 function call >> protocol (namely that oldstyle_length() isn't in the regression >> suite anymore). > Oh, that is odd. I thought that was regularly run. I have my own test > infrastructure that I run for every major release so I never have run > the built-in one, except for make check-world. > Cross version pg_upgrade is tested regularly in the buildfarm, but not using test.sh. Instead it uses the saved data repository from a previous run of the buildfarm client for the source branch, and tries to upgrade that to the target branch. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
-
Re: pg_upgrade test for binary compatibility of core data types
Peter Eisentraut <peter.eisentraut@enterprisedb.com> — 2021-01-15T08:00:31Z
On 2021-01-12 22:44, Andrew Dunstan wrote: > Cross version pg_upgrade is tested regularly in the buildfarm, but not > using test.sh. Instead it uses the saved data repository from a previous > run of the buildfarm client for the source branch, and tries to upgrade > that to the target branch. Does it maintain a set of fixups similar to what is in test.sh? Are those two sets the same?
-
Re: pg_upgrade test for binary compatibility of core data types
Tom Lane <tgl@sss.pgh.pa.us> — 2021-03-06T20:01:43Z
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: > On 2021-01-12 22:44, Andrew Dunstan wrote: >> Cross version pg_upgrade is tested regularly in the buildfarm, but not >> using test.sh. Instead it uses the saved data repository from a previous >> run of the buildfarm client for the source branch, and tries to upgrade >> that to the target branch. > Does it maintain a set of fixups similar to what is in test.sh? Are > those two sets the same? Responding to Peter: the first answer is yes, the second is I didn't check, but certainly Justin's patch makes them closer. I spent some time poking through this set of patches. I agree that there's problem(s) here that we need to solve, but it feels like this isn't a great way to solve them. What I see in the patchset is: v4-0001 mostly teaches test.sh about specific changes that have to be made to historic versions of the regression database to allow them to be reloaded into current servers. As already discussed, this is really duplicative of knowledge that's been embedded into the buildfarm client over time. It'd be better if we could refactor that so that the buildfarm shares a common database of these actions with test.sh. And said database ought to be in our git tree, so committers could fix problems without having to get Andrew involved every time. I think this could be represented as a psql script, at least in versions that have psql \if (but that came in in v10, so maybe we're there already). (Taking a step back, maybe the regression database isn't an ideal testbed for this in the first place. But it does have the advantage of not being a narrow-minded test that is going to miss things we haven't explicitly thought of.) v4-0002 is a bunch of random changes that mostly seem to revert hacky adjustments previously made to improve test coverage. I don't really agree with any of these, nor see why they're necessary. If they are necessary then we need to restore the coverage somewhere else. Admittedly, the previous changes were a bit hacky, but deleting them (without even bothering to adjust the relevant comments) isn't the answer. v4-0003 is really the heart of the matter: it adds a table with some previously-not-covered datatypes plus a query that purports to make sure that we are covering all types of interest. But I'm not sure I believe that query. It's got hard-wired assumptions about which typtype values need to be covered. Why is it okay to exclude range and multirange? Are we sure that all composites are okay to exclude? Likewise, the restriction to pg_catalog and information_schema schemas seems likely to bite us someday. There are some very random exclusions based on name patterns, which seem unsafe (let's list the specific type OIDs), and again the nearby comments don't match the code. But the biggest issue is that this can only cover core datatypes, not any contrib stuff. I don't know what we could do about contrib types. Maybe we should figure that covering core types is already a step forward, and be happy with getting that done. regards, tom lane
-
Re: pg_upgrade test for binary compatibility of core data types
Justin Pryzby <pryzby@telsasoft.com> — 2021-04-30T18:33:48Z
On Sat, Mar 06, 2021 at 03:01:43PM -0500, Tom Lane wrote: > Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: > > On 2021-01-12 22:44, Andrew Dunstan wrote: > >> Cross version pg_upgrade is tested regularly in the buildfarm, but not > >> using test.sh. Instead it uses the saved data repository from a previous > >> run of the buildfarm client for the source branch, and tries to upgrade > >> that to the target branch. > > > Does it maintain a set of fixups similar to what is in test.sh? Are > > those two sets the same? > > Responding to Peter: the first answer is yes, the second is I didn't > check, but certainly Justin's patch makes them closer. Right - I had meant to send this. https://github.com/PGBuildFarm/client-code/blob/master/PGBuild/Modules/TestUpgradeXversion.pm $opsql = 'drop operator if exists public.=> (bigint, NONE)'; .. my $missing_funcs = q{drop function if exists public.boxarea(box); drop function if exists public.funny_dup17(); .. my $prstmt = join(';', 'drop operator if exists #@# (bigint,NONE)', 'drop operator if exists #%# (bigint,NONE)', 'drop operator if exists !=- (bigint,NONE)', .. $prstmt = join(';', 'drop operator @#@ (NONE, bigint)', .. 'drop aggregate if exists public.array_cat_accum(anyarray)', > I spent some time poking through this set of patches. I agree that > there's problem(s) here that we need to solve, but it feels like this > isn't a great way to solve them. What I see in the patchset is: For starters, is there a "release beta checklist" ? Testing test.sh should be on it. So should fuzz testing. > v4-0001 mostly teaches test.sh about specific changes that have to be > made to historic versions of the regression database to allow them > to be reloaded into current servers. As already discussed, this is > really duplicative of knowledge that's been embedded into the buildfarm > client over time. It'd be better if we could refactor that so that > the buildfarm shares a common database of these actions with test.sh. > And said database ought to be in our git tree, so committers could > fix problems without having to get Andrew involved every time. > I think this could be represented as a psql script, at least in > versions that have psql \if (but that came in in v10, so maybe > we're there already). I started this. I don't know if it's compatible with the buildfarm client, but I think any issues maybe can be avoided by using "IF EXISTS". > v4-0002 is a bunch of random changes that mostly seem to revert hacky > adjustments previously made to improve test coverage. I don't really > agree with any of these, nor see why they're necessary. If they > are necessary then we need to restore the coverage somewhere else. > Admittedly, the previous changes were a bit hacky, but deleting them > (without even bothering to adjust the relevant comments) isn't the > answer. It was necessary to avoid --wal-segsize and -g to allow testing upgrades from versions which don't support those options. I think test.sh should be portable back to all supported versions. When those options were added, it broke test.sh upgrading from old versions. I changed this to a shell conditional for the "new" features: | "$1" -N -A trust ${oldsrc:+--wal-segsize 1 -g} Ideally it would check the version. > v4-0003 is really the heart of the matter: it adds a table with some > previously-not-covered datatypes plus a query that purports to make sure > that we are covering all types of interest. Actually the 'manytypes' table intends to include *all* core datatypes itself, not just those that aren't included somewhere else. I think "included somewhere else" depends on the order of the regression these, and type_sanity runs early, so the table might need to include many types that are created later, to avoid "false positives" in the associated test. > But I'm not sure I believe > that query. It's got hard-wired assumptions about which typtype values > need to be covered. Why is it okay to exclude range and multirange? > Are we sure that all composites are okay to exclude? Likewise, the > restriction to pg_catalog and information_schema schemas seems likely to > bite us someday. There are some very random exclusions based on name > patterns, which seem unsafe (let's list the specific type OIDs), and > again the nearby comments don't match the code. But the biggest issue > is that this can only cover core datatypes, not any contrib stuff. I changed to use regtype/OIDs, included range/multirange and stopped including only pg_catalog/information_schema. But didn't yet handle composites. > I don't know what we could do about contrib types. Maybe we should > figure that covering core types is already a step forward, and be > happy with getting that done. Right .. this is meant to at least handle the lowest hanging fruit. -- Justin -
Re: pg_upgrade test for binary compatibility of core data types
Jacob Champion <pchampion@vmware.com> — 2021-07-16T16:21:07Z
On Fri, 2021-04-30 at 13:33 -0500, Justin Pryzby wrote: > On Sat, Mar 06, 2021 at 03:01:43PM -0500, Tom Lane wrote: > > v4-0001 mostly teaches test.sh about specific changes that have to be > > made to historic versions of the regression database to allow them > > to be reloaded into current servers. As already discussed, this is > > really duplicative of knowledge that's been embedded into the buildfarm > > client over time. It'd be better if we could refactor that so that > > the buildfarm shares a common database of these actions with test.sh. > > And said database ought to be in our git tree, so committers could > > fix problems without having to get Andrew involved every time. > > I think this could be represented as a psql script, at least in > > versions that have psql \if (but that came in in v10, so maybe > > we're there already). > > I started this. I don't know if it's compatible with the buildfarm client, but > I think any issues maybe can be avoided by using "IF EXISTS". I'm going to try pulling this into a psql script today and see how far I get. > > But I'm not sure I believe > > that query. It's got hard-wired assumptions about which typtype values > > need to be covered. Why is it okay to exclude range and multirange? > > Are we sure that all composites are okay to exclude? Likewise, the > > restriction to pg_catalog and information_schema schemas seems likely to > > bite us someday. There are some very random exclusions based on name > > patterns, which seem unsafe (let's list the specific type OIDs), and > > again the nearby comments don't match the code. But the biggest issue > > is that this can only cover core datatypes, not any contrib stuff. > > I changed to use regtype/OIDs, included range/multirange and stopped including > only pg_catalog/information_schema. But didn't yet handle composites. Per cfbot, this test needs to be taught about the new pg_brin_bloom_summary and pg_brin_minmax_multi_summary types. --Jacob
-
Re: pg_upgrade test for binary compatibility of core data types
Jacob Champion <pchampion@vmware.com> — 2021-07-16T16:40:34Z
On Fri, 2021-07-16 at 16:21 +0000, Jacob Champion wrote: > On Fri, 2021-04-30 at 13:33 -0500, Justin Pryzby wrote: > > On Sat, Mar 06, 2021 at 03:01:43PM -0500, Tom Lane wrote: > > > v4-0001 mostly teaches test.sh about specific changes that have to be > > > made to historic versions of the regression database to allow them > > > to be reloaded into current servers. As already discussed, this is > > > really duplicative of knowledge that's been embedded into the buildfarm > > > client over time. It'd be better if we could refactor that so that > > > the buildfarm shares a common database of these actions with test.sh. > > > And said database ought to be in our git tree, so committers could > > > fix problems without having to get Andrew involved every time. > > > I think this could be represented as a psql script, at least in > > > versions that have psql \if (but that came in in v10, so maybe > > > we're there already). > > > > I started this. I don't know if it's compatible with the buildfarm client, but > > I think any issues maybe can be avoided by using "IF EXISTS". > > I'm going to try pulling this into a psql script today and see how far > I get. I completely misread this exchange -- you already did this in 0004. Sorry for the noise. --Jacob
-
Re: pg_upgrade test for binary compatibility of core data types
Jacob Champion <pchampion@vmware.com> — 2021-07-16T18:02:18Z
On Fri, 2021-04-30 at 13:33 -0500, Justin Pryzby wrote: > On Sat, Mar 06, 2021 at 03:01:43PM -0500, Tom Lane wrote: > > v4-0001 mostly teaches test.sh about specific changes that have to be > > made to historic versions of the regression database to allow them > > to be reloaded into current servers. As already discussed, this is > > really duplicative of knowledge that's been embedded into the buildfarm > > client over time. It'd be better if we could refactor that so that > > the buildfarm shares a common database of these actions with test.sh. > > And said database ought to be in our git tree, so committers could > > fix problems without having to get Andrew involved every time. > > I think this could be represented as a psql script, at least in > > versions that have psql \if (but that came in in v10, so maybe > > we're there already). > > I started this. I don't know if it's compatible with the buildfarm client, but > I think any issues maybe can be avoided by using "IF EXISTS". Here are the differences I see on a first pass (without putting too much thought into how significant the differences are). Buildfarm code I'm comparing against is at [1]. - Both versions drop @#@ and array_cat_accum, but the buildfarm additionally replaces them with a new operator and aggregate, respectively. - The buildfarm's dropping of table OIDs is probably more resilient, since it loops over pg_class looking for relhasoids. - The buildfarm handles (or drops) several contrib databases in addition to the core regression DB. - The psql script drops the first_el_agg_any aggregate and a `TRANSFORM FOR integer`; I don't see any corresponding code in the buildfarm. - Some version ranges are different between the two. For example, abstime_/reltime_/tinterval_tbl are dropped by the buildfarm if the old version is < 9.3, while the psql script drops them for old versions <= 10. - The buildfarm drops the public.=> operator for much older versions of Postgres. I assume we don't need that here. - The buildfarm adjusts pg_proc for the location of regress.so; I see there's a commented placeholder for this at the end of the psql script but it's not yet implemented. As an aside, I think the "fromv10" naming scheme for the "old version <= 10" condition is unintuitive. If the old version is e.g. 9.6, we're not upgrading "from 10". --Jacob [1] https://github.com/PGBuildFarm/client-code/blob/main/PGBuild/Modules/TestUpgradeXversion.pm
-
Re: pg_upgrade test for binary compatibility of core data types
Tom Lane <tgl@sss.pgh.pa.us> — 2021-09-11T18:19:04Z
Jacob Champion <pchampion@vmware.com> writes: > On Fri, 2021-04-30 at 13:33 -0500, Justin Pryzby wrote: >> I started this. I don't know if it's compatible with the buildfarm client, but >> I think any issues maybe can be avoided by using "IF EXISTS". > Here are the differences I see on a first pass (without putting too > much thought into how significant the differences are). Buildfarm code > I'm comparing against is at [1]. I switched the CF entry for this to "Waiting on Author". It's been failing in the cfbot for a couple of months, and Jacob's provided some review-ish comments here, so I think there's plenty of reason to deem the ball to be in Justin's court. regards, tom lane
-
Re: pg_upgrade test for binary compatibility of core data types
Justin Pryzby <pryzby@telsasoft.com> — 2021-09-12T00:51:16Z
On Fri, Jul 16, 2021 at 06:02:18PM +0000, Jacob Champion wrote: > On Fri, 2021-04-30 at 13:33 -0500, Justin Pryzby wrote: > > On Sat, Mar 06, 2021 at 03:01:43PM -0500, Tom Lane wrote: > > > v4-0001 mostly teaches test.sh about specific changes that have to be > > > made to historic versions of the regression database to allow them > > > to be reloaded into current servers. As already discussed, this is > > > really duplicative of knowledge that's been embedded into the buildfarm > > > client over time. It'd be better if we could refactor that so that > > > the buildfarm shares a common database of these actions with test.sh. > > > And said database ought to be in our git tree, so committers could > > > fix problems without having to get Andrew involved every time. > > > I think this could be represented as a psql script, at least in > > > versions that have psql \if (but that came in in v10, so maybe > > > we're there already). > > > > I started this. I don't know if it's compatible with the buildfarm client, but > > I think any issues maybe can be avoided by using "IF EXISTS". > > Here are the differences I see on a first pass (without putting too > much thought into how significant the differences are). Buildfarm code > I'm comparing against is at [1]. > > - Both versions drop @#@ and array_cat_accum, but the buildfarm > additionally replaces them with a new operator and aggregate, > respectively. > > - The buildfarm's dropping of table OIDs is probably more resilient, > since it loops over pg_class looking for relhasoids. These are all "translated" from test.sh, so follow its logic. Maybe it should be improved, but that's separate from this patch - which is already doing a few unrelated things. > - The buildfarm adjusts pg_proc for the location of regress.so; I see > there's a commented placeholder for this at the end of the psql script > but it's not yet implemented. I didn't understand why this was done here, but it turns out it has to be done *after* calling pg_dump. So it has to stay where it is. > - Some version ranges are different between the two. For example, > abstime_/reltime_/tinterval_tbl are dropped by the buildfarm if the old > version is < 9.3, while the psql script drops them for old versions <= > 10. This was an error. Thanks. > - The buildfarm drops the public.=> operator for much older versions of > Postgres. I assume we don't need that here. > As an aside, I think the "fromv10" naming scheme for the "old version > <= 10" condition is unintuitive. If the old version is e.g. 9.6, we're > not upgrading "from 10". I renamed the version vars - feel free to suggest something better. I'll solicit suggestions what else to do to progresss these. @Andrew: did you have any comment on this part ? |Subject: buildfarm xversion diff |Forking https://www.postgresql.org/message-id/20210328231433.GI15100@telsasoft.com | |I gave suggestion how to reduce the "lines of diff" metric almost to nothing, |allowing a very small "fudge factor", and which I think makes this a pretty |good metric rather than a passable one. -- Justin
-
Re: pg_upgrade test for binary compatibility of core data types
Andrew Dunstan <andrew@dunslane.net> — 2021-09-12T18:41:23Z
On 9/11/21 8:51 PM, Justin Pryzby wrote: > > @Andrew: did you have any comment on this part ? > > |Subject: buildfarm xversion diff > |Forking https://www.postgresql.org/message-id/20210328231433.GI15100@telsasoft.com > | > |I gave suggestion how to reduce the "lines of diff" metric almost to nothing, > |allowing a very small "fudge factor", and which I think makes this a pretty > |good metric rather than a passable one. > Somehow I missed that. Looks like some good suggestions. I'll experiment. (Note: we can't assume the presence of sed, especially on Windows). cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
-
Re: pg_upgrade test for binary compatibility of core data types
Andrew Dunstan <andrew@dunslane.net> — 2021-09-13T13:20:37Z
On 9/12/21 2:41 PM, Andrew Dunstan wrote: > On 9/11/21 8:51 PM, Justin Pryzby wrote: >> @Andrew: did you have any comment on this part ? >> >> |Subject: buildfarm xversion diff >> |Forking https://www.postgresql.org/message-id/20210328231433.GI15100@telsasoft.com >> | >> |I gave suggestion how to reduce the "lines of diff" metric almost to nothing, >> |allowing a very small "fudge factor", and which I think makes this a pretty >> |good metric rather than a passable one. >> > Somehow I missed that. Looks like some good suggestions. I'll > experiment. (Note: we can't assume the presence of sed, especially on > Windows). > > I tried with the attached patch on crake, which tests back as far as 9.2. Here are the diff counts from HEAD: andrew@emma:HEAD $ grep -c '^[+-]' dumpdiff-REL9_* dumpdiff-REL_1* dumpdiff-HEAD dumpdiff-REL9_2_STABLE:514 dumpdiff-REL9_3_STABLE:169 dumpdiff-REL9_4_STABLE:185 dumpdiff-REL9_5_STABLE:221 dumpdiff-REL9_6_STABLE:11 dumpdiff-REL_10_STABLE:11 dumpdiff-REL_11_STABLE:73 dumpdiff-REL_12_STABLE:73 dumpdiff-REL_13_STABLE:73 dumpdiff-REL_14_STABLE:0 dumpdiff-HEAD:0 I've also attached those non-empty dumpdiff files for information, since they are quite small. There is still work to do, but this is promising. Next step: try it on Windows. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
-
Re: pg_upgrade test for binary compatibility of core data types
Andrew Dunstan <andrew@dunslane.net> — 2021-09-15T19:28:54Z
On 9/13/21 9:20 AM, Andrew Dunstan wrote: > On 9/12/21 2:41 PM, Andrew Dunstan wrote: >> On 9/11/21 8:51 PM, Justin Pryzby wrote: >>> @Andrew: did you have any comment on this part ? >>> >>> |Subject: buildfarm xversion diff >>> |Forking https://www.postgresql.org/message-id/20210328231433.GI15100@telsasoft.com >>> | >>> |I gave suggestion how to reduce the "lines of diff" metric almost to nothing, >>> |allowing a very small "fudge factor", and which I think makes this a pretty >>> |good metric rather than a passable one. >>> >> Somehow I missed that. Looks like some good suggestions. I'll >> experiment. (Note: we can't assume the presence of sed, especially on >> Windows). >> >> > I tried with the attached patch on crake, which tests back as far as > 9.2. Here are the diff counts from HEAD: > > > andrew@emma:HEAD $ grep -c '^[+-]' dumpdiff-REL9_* dumpdiff-REL_1* > dumpdiff-HEAD > dumpdiff-REL9_2_STABLE:514 > dumpdiff-REL9_3_STABLE:169 > dumpdiff-REL9_4_STABLE:185 > dumpdiff-REL9_5_STABLE:221 > dumpdiff-REL9_6_STABLE:11 > dumpdiff-REL_10_STABLE:11 > dumpdiff-REL_11_STABLE:73 > dumpdiff-REL_12_STABLE:73 > dumpdiff-REL_13_STABLE:73 > dumpdiff-REL_14_STABLE:0 > dumpdiff-HEAD:0 > > > I've also attached those non-empty dumpdiff files for information, since > they are quite small. > > > There is still work to do, but this is promising. Next step: try it on > Windows. > > It appears to do the right thing on Windows. yay! We probably need to get smarter about the heuristics, though, e.g. by taking into account the buildfarm options and the platform. It would also help a lot if we could make vcregress.pl honor USE_MODULE_DB. That's on my TODO list, but it just got a lot higher priority. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
-
Re: pg_upgrade test for binary compatibility of core data types
Andrew Dunstan <andrew@dunslane.net> — 2021-09-24T14:58:52Z
On 9/15/21 3:28 PM, Andrew Dunstan wrote: > On 9/13/21 9:20 AM, Andrew Dunstan wrote: >> On 9/12/21 2:41 PM, Andrew Dunstan wrote: >>> On 9/11/21 8:51 PM, Justin Pryzby wrote: >>>> @Andrew: did you have any comment on this part ? >>>> >>>> |Subject: buildfarm xversion diff >>>> |Forking https://www.postgresql.org/message-id/20210328231433.GI15100@telsasoft.com >>>> | >>>> |I gave suggestion how to reduce the "lines of diff" metric almost to nothing, >>>> |allowing a very small "fudge factor", and which I think makes this a pretty >>>> |good metric rather than a passable one. >>>> >>> Somehow I missed that. Looks like some good suggestions. I'll >>> experiment. (Note: we can't assume the presence of sed, especially on >>> Windows). >>> >>> >> I tried with the attached patch on crake, which tests back as far as >> 9.2. Here are the diff counts from HEAD: >> >> >> andrew@emma:HEAD $ grep -c '^[+-]' dumpdiff-REL9_* dumpdiff-REL_1* >> dumpdiff-HEAD >> dumpdiff-REL9_2_STABLE:514 >> dumpdiff-REL9_3_STABLE:169 >> dumpdiff-REL9_4_STABLE:185 >> dumpdiff-REL9_5_STABLE:221 >> dumpdiff-REL9_6_STABLE:11 >> dumpdiff-REL_10_STABLE:11 >> dumpdiff-REL_11_STABLE:73 >> dumpdiff-REL_12_STABLE:73 >> dumpdiff-REL_13_STABLE:73 >> dumpdiff-REL_14_STABLE:0 >> dumpdiff-HEAD:0 >> >> >> I've also attached those non-empty dumpdiff files for information, since >> they are quite small. >> >> >> There is still work to do, but this is promising. Next step: try it on >> Windows. >> >> > It appears to do the right thing on Windows. yay! > > > We probably need to get smarter about the heuristics, though, e.g. by > taking into account the buildfarm options and the platform. It would > also help a lot if we could make vcregress.pl honor USE_MODULE_DB. > That's on my TODO list, but it just got a lot higher priority. > > Here's what I've committed: <https://github.com/PGBuildFarm/client-code/commit/6317d82c0e897a29dabd57ed8159d13920401f96> cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
-
Re: pg_upgrade test for binary compatibility of core data types
Michael Paquier <michael@paquier.xyz> — 2021-10-01T07:58:41Z
On Sat, Sep 11, 2021 at 07:51:16PM -0500, Justin Pryzby wrote: > These are all "translated" from test.sh, so follow its logic. > Maybe it should be improved, but that's separate from this patch - which is > already doing a few unrelated things. I was looking at this CF entry, and what you are doing in 0004 to move the tweaks from pg_upgrade's test.sh to a separate SQL script that uses psql's meta-commands like \if to check which version we are on is really interesting. The patch does not apply anymore, so this needs a rebase. The entry has been switched as waiting on author by Tom, but you did not update it after sending the new versions in [1]. I am wondering if we could have something cleaner than just a set booleans as you do here for each check, as that does not help with the readability of the tests. [1]: https://www.postgresql.org/message-id/20210912005116.GF26465@telsasoft.com -- Michael
-
Re: pg_upgrade test for binary compatibility of core data types
Michael Paquier <michael@paquier.xyz> — 2021-10-11T05:38:12Z
On Fri, Oct 01, 2021 at 04:58:41PM +0900, Michael Paquier wrote: > I was looking at this CF entry, and what you are doing in 0004 to move > the tweaks from pg_upgrade's test.sh to a separate SQL script that > uses psql's meta-commands like \if to check which version we are on is > really interesting. The patch does not apply anymore, so this needs a > rebase. The entry has been switched as waiting on author by Tom, but > you did not update it after sending the new versions in [1]. I am > wondering if we could have something cleaner than just a set booleans > as you do here for each check, as that does not help with the > readability of the tests. And so, I am back at this thread, looking at the set of patches proposed from 0001 to 0004. The patches are rather messy and mix many things and concepts, but there are basically four things that stand out: - test.sh is completely broken when using PG >= 14 as new version because of the removal of the test tablespace. Older versions of pg_regress don't support --make-tablespacedir so I am fine to stick a couple of extra mkdirs for testtablespace/, expected/ and sql/ to allow the script to work properly for major upgrades as a workaround, but only if we use an old version. We need to do something here for HEAD and REL_14_STABLE. - The script would fail when using PG <= 11 as old version because of WITH OIDS relations. We need to do something down to REL_12_STABLE. I did not like much the approach taken to stick 4 ALTER TABLE queries though (the patch was actually failing here for me), so instead I have borrowed what the buildfarm has been doing with a DO block. That works fine, and that's more portable. - Not using --extra-float-digits with PG <= 11 as older version causes a bunch of diffs in the dumps, making the whole unreadable. The patch was doing that unconditionally for *all version*, which is not good. We should only do that on the versions that need it, and we know the old version number before taking any dumps so that's easy to check. - The addition of --wal-segsize and --allow-group-access breaks the script when using PG < 10 at initdb time as these got added in 11. With 10 getting EOL'd next year and per the lack of complaints, I am not excited to do anything here and I'd rather leave this out so as we keep coverage for those options across *all* major versions upgraded from 11~. The buildfarm has tests down to 9.2, but for devs my take is that this is enough for now. This is for the basics in terms of fixing test.sh and what should be backpatched. In this aspect patches 0001 and 0002 were a bit incorrect. I am not sure that 0003 is that interesting as designed as we would miss any new core types introduced. 0004 is something I'd like to get done on HEAD to ease the move of the pg_upgrade tests to TAP, but it could be made a bit easier to read by not having all those oldpgversion_XX_YY flags grouped together for one. So I am going to rewrite portions of it once done with the above. For now, attached is a patch to address the issues with test.sh that I am planning to backpatch. This fixes the facility on HEAD, while minimizing the diffs between the dumps. We could do more, like a s/PROCEDURE/FUNCTION/ but that does not make the diffs really unreadable either. I have only tested that on HEAD as new version down to 11 as the oldest version per the business with --wal-segsize. This still needs tests with 12~ as new version though, which is boring but not complicated at all :) -- Michael
-
Re: pg_upgrade test for binary compatibility of core data types
Michael Paquier <michael@paquier.xyz> — 2021-10-13T01:36:08Z
On Mon, Oct 11, 2021 at 02:38:12PM +0900, Michael Paquier wrote: > For now, attached is a patch to address the issues with test.sh that I > am planning to backpatch. This fixes the facility on HEAD, while > minimizing the diffs between the dumps. We could do more, like a > s/PROCEDURE/FUNCTION/ but that does not make the diffs really > unreadable either. I have only tested that on HEAD as new version > down to 11 as the oldest version per the business with --wal-segsize. > This still needs tests with 12~ as new version though, which is boring > but not complicated at all :) Okay, tested and done as of fa66b6d. -- Michael
-
Re: pg_upgrade test for binary compatibility of core data types
Justin Pryzby <pryzby@telsasoft.com> — 2021-11-07T19:22:00Z
On Mon, Oct 11, 2021 at 02:38:12PM +0900, Michael Paquier wrote: > On Fri, Oct 01, 2021 at 04:58:41PM +0900, Michael Paquier wrote: > > I was looking at this CF entry, and what you are doing in 0004 to move > > the tweaks from pg_upgrade's test.sh to a separate SQL script that > > uses psql's meta-commands like \if to check which version we are on is > > really interesting. The patch does not apply anymore, so this needs a > > rebase. The entry has been switched as waiting on author by Tom, but > > you did not update it after sending the new versions in [1]. I am > > wondering if we could have something cleaner than just a set booleans > > as you do here for each check, as that does not help with the > > readability of the tests. > > And so, I am back at this thread, looking at the set of patches > proposed from 0001 to 0004. The patches are rather messy and mix many > things and concepts, but there are basically four things that stand > out: > - test.sh is completely broken when using PG >= 14 as new version > because of the removal of the test tablespace. Older versions of > pg_regress don't support --make-tablespacedir so I am fine to stick a > couple of extra mkdirs for testtablespace/, expected/ and sql/ to > allow the script to work properly for major upgrades as a workaround, > but only if we use an old version. We need to do something here for > HEAD and REL_14_STABLE. > - The script would fail when using PG <= 11 as old version because of > WITH OIDS relations. We need to do something down to REL_12_STABLE. > I did not like much the approach taken to stick 4 ALTER TABLE queries > though (the patch was actually failing here for me), so instead I have > borrowed what the buildfarm has been doing with a DO block. That > works fine, and that's more portable. > - Not using --extra-float-digits with PG <= 11 as older version causes > a bunch of diffs in the dumps, making the whole unreadable. The patch > was doing that unconditionally for *all version*, which is not good. > We should only do that on the versions that need it, and we know the > old version number before taking any dumps so that's easy to check. > - The addition of --wal-segsize and --allow-group-access breaks the > script when using PG < 10 at initdb time as these got added in 11. > With 10 getting EOL'd next year and per the lack of complaints, I am > not excited to do anything here and I'd rather leave this out so as we > keep coverage for those options across *all* major versions upgraded > from 11~. The buildfarm has tests down to 9.2, but for devs my take > is that this is enough for now. Michael handled those in fa66b6d. Note that the patch assumes that the "old version" being pg_upgraded has commit 97f73a978: "Work around cross-version-upgrade issues created by commit 9e38c2bb5." That may be good enough for test.sh, but if the kludges were moved to a .sql script which was also run by the buildfarm (in stead of its hardcoded kludges), then it might be necessary to handle the additional stuff my patch did, like: + DROP TRANSFORM FOR integer LANGUAGE sql CASCADE;" + DROP FUNCTION boxarea(box);" + DROP FUNCTION funny_dup17();" + DROP TABLE abstime_tbl;" + DROP TABLE reltime_tbl;" + DROP TABLE tinterval_tbl;" + DROP AGGREGATE first_el_agg_any(anyelement);" + DROP AGGREGATE array_cat_accum(anyarray);" + DROP OPERATOR @#@(NONE,bigint);" Or, maybe it's guaranteed that the animals all run latest version of old branches, in which case I think some of the BF's existing logic could be dropped, which would help to reconcile these two scripts: my $missing_funcs = q{drop function if exists public.boxarea(box); drop function if exists public.funny_dup17(); .. $prstmt = join(';', 'drop operator @#@ (NONE, bigint)', .. 'drop aggregate if exists public.array_cat_accum(anyarray)', > This is for the basics in terms of fixing test.sh and what should be > backpatched. In this aspect patches 0001 and 0002 were a bit > incorrect. I am not sure that 0003 is that interesting as designed as > we would miss any new core types introduced. We wouldn't miss new core types, because of the 2nd part of type_sanity which tests that each core type was included in the "manytypes" table. +-- And now a test on the previous test, checking that all core types are +-- included in this table +-- XXX or some other non-catalog table processed by pg_upgrade +SELECT oid, typname, typtype, typelem, typarray, typarray FROM pg_type t +WHERE typtype NOT IN ('p', 'c') +-- reg* which cannot be pg_upgraded +AND oid != ALL(ARRAY['regproc', 'regprocedure', 'regoper', 'regoperator', 'regconfig', 'regdictionary', 'regnamespace', 'regcollation']::regtype[]) +-- XML might be disabled at compile-time +AND oid != ALL(ARRAY['xml', 'gtsvector', 'pg_node_tree', 'pg_ndistinct', 'pg_dependencies', 'pg_mcv_list', 'pg_brin_bloom_summary', 'pg_brin_minmax_multi_summary']::regtype[]) +AND NOT EXISTS (SELECT 1 FROM pg_type u WHERE u.typarray=t.oid) -- exclude arrays +AND NOT EXISTS (SELECT 1 FROM pg_attribute a WHERE a.atttypid=t.oid AND a.attnum>0 AND a.attrelid='manytypes'::regclass); > 0004 is something I'd like to get done on HEAD to ease the move of the > pg_upgrade tests to TAP, but it could be made a bit easier to read by > not having all those oldpgversion_XX_YY flags grouped together for > one. So I am going to rewrite portions of it once done with the > above. -- Justin -
Re: pg_upgrade test for binary compatibility of core data types
Michael Paquier <michael@paquier.xyz> — 2021-11-08T03:53:58Z
On Sun, Nov 07, 2021 at 01:22:00PM -0600, Justin Pryzby wrote: > That may be good enough for test.sh, but if the kludges were moved to a .sql > script which was also run by the buildfarm (in stead of its hardcoded kludges), then > it might be necessary to handle the additional stuff my patch did, like: > [...] > > Or, maybe it's guaranteed that the animals all run latest version of old > branches, in which case I think some of the BF's existing logic could be > dropped, which would help to reconcile these two scripts: I am pretty sure that it is safe to assume that all buildfarm animals run the top of the stable branch they are testing, at least on the community side. An advantage of moving all those SQLs to a script that can be process with psql thanks to the \if metacommands you have added is that buildfarm clients are not required to immediately update their code to work properly. Considering as well that we should minimize the amount of duplication between all those things, I'd like to think that we'd better apply 0002 and consider a backpatch to allow the buildfarm to catch up on it. It should at least allow to remove a good chunk of the object cleanup done directly by the buildfarm. >> This is for the basics in terms of fixing test.sh and what should be >> backpatched. In this aspect patches 0001 and 0002 were a bit >> incorrect. I am not sure that 0003 is that interesting as designed as >> we would miss any new core types introduced. > > We wouldn't miss new core types, because of the 2nd part of type_sanity which > tests that each core type was included in the "manytypes" table. +-- XML might be disabled at compile-time +AND oid != ALL(ARRAY['xml', 'gtsvector', 'pg_node_tree', 'pg_ndistinct', 'pg_dependencies', 'pg_mcv_list', 'pg_brin_bloom_summary', 'pg_brin_minmax_multi_summary']::regtype[]) I believe that this comment is incomplete, applying only to the first element listed in this array. I guess that this had better document why those catalogs are part of the list? Good to see that adding a reg* in core would immediately be noticed though, as far as I understand this SQL. -- Michael
-
Re: pg_upgrade test for binary compatibility of core data types
Michael Paquier <michael@paquier.xyz> — 2021-11-17T07:01:19Z
On Sun, Nov 07, 2021 at 01:22:00PM -0600, Justin Pryzby wrote: > That may be good enough for test.sh, but if the kludges were moved to a .sql > script which was also run by the buildfarm (in stead of its hardcoded kludges), then > it might be necessary to handle the additional stuff my patch did, like: > > + DROP TRANSFORM FOR integer LANGUAGE sql CASCADE;" > + DROP FUNCTION boxarea(box);" > + DROP FUNCTION funny_dup17();" These apply for an old version <= v10. > + DROP TABLE abstime_tbl;" > + DROP TABLE reltime_tbl;" > + DROP TABLE tinterval_tbl;" old version <= 9.3. > + DROP AGGREGATE first_el_agg_any(anyelement);" Not sure about this one. > + DROP AGGREGATE array_cat_accum(anyarray);" > + DROP OPERATOR @#@(NONE,bigint);" These are on 9.4. It is worth noting that TestUpgradeXversion.pm recreates those objects. I'd agree to close the gap completely rather than just moving what test.sh does to wipe out a maximum client code for the buildfarm. > Or, maybe it's guaranteed that the animals all run latest version of old > branches, in which case I think some of the BF's existing logic could be > dropped, which would help to reconcile these two scripts: That seems like a worthy goal to reduce the amount of duplication with the buildfarm code, while allowing tests from upgrades with older versions (the WAL segment size and group permission issue in test.sh had better be addressed in a better way, perhaps once the pg_upgrade tests are moved to TAP). There are also things specific to contrib/ modules with older versions, but that may be too specific for this exercise. +\if :oldpgversion_le84 +DROP FUNCTION public.myfunc(integer); +\endif The oldest version tested by the buildfarm is 9.2, so we could ignore this part I guess? Andrew, what do you think about this part? Based on my read of this thread, there is an agreement that this approach makes the buildfarm code more manageable so as committers would not need to patch the buildfarm code if their test fail. I agree with this conclusion, but I wanted to double-check with you first. This would need a backpatch down to 10 so as we could clean up a maximum of code in TestUpgradeXversion.pm without waiting for an extra 5 years. Please note that I am fine to send a patch for the buildfarm client. > We wouldn't miss new core types, because of the 2nd part of type_sanity which > tests that each core type was included in the "manytypes" table. Thanks, I see your point now after a closer read. There is still a pending question for contrib modules, but I think that we need to think larger here with a better integration of contrib/ modules in the upgrade testing process. Making that cheap would require running the set of regression tests on the instance to-be-upgraded first. I think that one step in this direction would be to have unique databases for each contrib/ modules, so as there is no overlap with objects dropped? Having some checks with code types looks fine as a first step, so let's do that. I have reviewed 0001, rewrote a couple of comments. All the comments from upthread seem to be covered with that. So I'd like to get that applied on HEAD. We could as well be less conservative and backpatch that down to 12 to follow on 7c15cef so we would be more careful with 15~ already (a backpatch down to 14 would be enough for this purpose, actually thanks to the 14->15 upgrade path). -- Michael
-
Re: pg_upgrade test for binary compatibility of core data types
Andrew Dunstan <andrew@dunslane.net> — 2021-11-17T15:07:17Z
On 11/17/21 02:01, Michael Paquier wrote: > > The oldest version tested by the buildfarm is 9.2, so we could ignore > this part I guess? > > Andrew, what do you think about this part? Based on my read of this > thread, there is an agreement that this approach makes the buildfarm > code more manageable so as committers would not need to patch the > buildfarm code if their test fail. I agree with this conclusion, but > I wanted to double-check with you first. This would need a backpatch > down to 10 so as we could clean up a maximum of code in > TestUpgradeXversion.pm without waiting for an extra 5 years. Please > note that I am fine to send a patch for the buildfarm client. > > In general I'm in agreement with the direction here. If we can have a script that applies to back branches to make them suitable for upgrade testing instead of embedding this in the buildfarm client, so much the better. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
-
Re: pg_upgrade test for binary compatibility of core data types
Michael Paquier <michael@paquier.xyz> — 2021-11-18T04:36:50Z
On Wed, Nov 17, 2021 at 10:07:17AM -0500, Andrew Dunstan wrote: > In general I'm in agreement with the direction here. If we can have a > script that applies to back branches to make them suitable for upgrade > testing instead of embedding this in the buildfarm client, so much the > better. Okay. I have worked on 0001 to add the table to check after the binary compatibilities and applied it. What remains on this thread is 0002 to move all the SQL queries into a psql-able file with the set of \if clauses to control which query is run depending on the backend version. Justin, could you send a rebased version of that with all the changes from the buildfarm client included? -- Michael
-
Re: pg_upgrade test for binary compatibility of core data types
Justin Pryzby <pryzby@telsasoft.com> — 2021-11-18T04:47:28Z
On Wed, Nov 17, 2021 at 04:01:19PM +0900, Michael Paquier wrote: > On Sun, Nov 07, 2021 at 01:22:00PM -0600, Justin Pryzby wrote: > > That may be good enough for test.sh, but if the kludges were moved to a .sql > > script which was also run by the buildfarm (in stead of its hardcoded kludges), then > > it might be necessary to handle the additional stuff my patch did, like: > > > > + DROP TRANSFORM FOR integer LANGUAGE sql CASCADE;" > > + DROP FUNCTION boxarea(box);" > > + DROP FUNCTION funny_dup17();" > > These apply for an old version <= v10. > > > + DROP TABLE abstime_tbl;" > > + DROP TABLE reltime_tbl;" > > + DROP TABLE tinterval_tbl;" > > old version <= 9.3. > > > + DROP AGGREGATE first_el_agg_any(anyelement);" > > Not sure about this one. See 97f73a978fc1aca59c6ad765548ce0096d95a923 > These are on 9.4. It is worth noting that TestUpgradeXversion.pm > recreates those objects. I'd agree to close the gap completely rather > than just moving what test.sh does to wipe out a maximum client code > for the buildfarm. >>Or, maybe it's guaranteed that the animals all run latest version of old >>branches, in which case I think some of the BF's existing logic could be >>dropped, which would help to reconcile these two scripts: >> >> my $missing_funcs = q{drop function if exists public.boxarea(box); >> drop function if exists public.funny_dup17(); >>.. >> $prstmt = join(';', >> 'drop operator @#@ (NONE, bigint)', >>.. >> 'drop aggregate if exists public.array_cat_accum(anyarray)', >> I'm not sure if everything the buildfarm does is needed anymore, or if any of it could be removed now, rather than being implemented in test.sh. boxarea, funny_dup - see also db3af9feb19f39827e916145f88fa5eca3130cb2 https://github.com/PGBuildFarm/client-code/commit/9ca42ac1783a8cf99c73b4f7c52bd05a6024669d array_larger_accum/array_cat_accum - see also 97f73a978fc1aca59c6ad765548ce0096d95a923 https://github.com/PGBuildFarm/client-code/commit/a55c89869f30db894ab823df472e739cee2e8c91 @#@ 76f412ab310554acb970a0b73c8d1f37f35548c6 ?? https://github.com/PGBuildFarm/client-code/commit/b3fdb743d89dc91fcea47bd9651776c503f774ff https://github.com/PGBuildFarm/client-code/commit/b44e9390e2d8d904ff8cabd906a2d4b5c8bd300a https://github.com/PGBuildFarm/client-code/commit/3844503c8fde134f7cc29b3fb147d590b6d2fcc1 abstime: https://github.com/PGBuildFarm/client-code/commit/f027d991d197036028ffa9070f4c9193076ed5ed putenv https://github.com/PGBuildFarm/client-code/commit/fa86d0b1bc7a8d7b9f15b1da8b8e43f4d3a08e2b -
Re: pg_upgrade test for binary compatibility of core data types
Tom Lane <tgl@sss.pgh.pa.us> — 2021-11-18T04:57:51Z
Michael Paquier <michael@paquier.xyz> writes: > Okay. I have worked on 0001 to add the table to check after the > binary compatibilities and applied it. Something funny about that on prion: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2021-11-18%2001%3A55%3A38 @@ -747,6 +747,8 @@ '{(2020-01-02 03:04:05, 2021-02-03 06:07:08)}'::tstzmultirange, arrayrange(ARRAY[1,2], ARRAY[2,1]), arraymultirange(arrayrange(ARRAY[1,2], ARRAY[2,1])); +ERROR: unrecognized key word: "ec2" +HINT: ACL key word must be "group" or "user". -- Sanity check on the previous table, checking that all core types are -- included in this table. SELECT oid, typname, typtype, typelem, typarray, typarray Not sure what's going on there. regards, tom lane -
Re: pg_upgrade test for binary compatibility of core data types
Michael Paquier <michael@paquier.xyz> — 2021-11-18T05:49:35Z
On Wed, Nov 17, 2021 at 11:57:51PM -0500, Tom Lane wrote: > Something funny about that on prion: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2021-11-18%2001%3A55%3A38 > Not sure what's going on there. Yes, that was just some missing quoting in the aclitem of this new table. prion uses a specific user name, "ec2-user", that caused the failure. -- Michael
-
Re: pg_upgrade test for binary compatibility of core data types
Michael Paquier <michael@paquier.xyz> — 2021-11-18T06:58:18Z
On Wed, Nov 17, 2021 at 10:47:28PM -0600, Justin Pryzby wrote: > I'm not sure if everything the buildfarm does is needed anymore, or if any of > it could be removed now, rather than being implemented in test.sh. +-- This file has a bunch of kludges needed for testing upgrades across major versions +-- It supports testing the most recent version of an old release (not any arbitrary minor version). This could be better-worded. Here is an idea: -- -- SQL queries for major upgrade tests -- -- This file includes a set of SQL queries to make a cluster to-be-upgraded -- compatible with the version this file is on. This requires psql, -- as per-version queries are controlled with a set of \if clauses. +\if :oldpgversion_le84 +DROP FUNCTION public.myfunc(integer); +\endif We could retire this part for <= 8.4. The oldest version tested by the buildfarm is 9.2. + psql -X -d regression -f "test-upgrade.sql" || psql_fix_sql_status=$? Shouldn't we use an absolute path here? I was testing a VPATH build and that was not working properly. +-- commit 9e38c2bb5 and 97f73a978 +-- DROP AGGREGATE array_larger_accum(anyarray); +DROP AGGREGATE array_cat_accum(anyarray); + +-- commit 76f412ab3 +-- DROP OPERATOR @#@(bigint,NONE); +DROP OPERATOR @#@(NONE,bigint); +\endif The buildfarm does "CREATE OPERATOR @#@" and "CREATE AGGREGATE array_larger_accum" when dealing with an old version between 9.5 and 13. Shouldn't we do the same and create those objects rather than a plain DROP? What you are doing is not wrong, and it should allow upgrades to work, but that's a bit inconsistent with the buildfarm in terms of coverage. + ver >= 905 AND ver <= 1300 AS oldpgversion_95_13, + ver >= 906 AND ver <= 1300 AS oldpgversion_96_13, + ver >= 906 AND ver <= 1000 AS oldpgversion_96_10, So here, we have the choice between conditions that play with version ranges or we could make those checks simpler but compensate with a set of IF EXISTS queries. I think that your choice is right. The buildfarm mixes both styles to compensate with the cases where the objects are created after a drop. The list of objects and the version ranges look correct to me. -- Michael
-
Re: pg_upgrade test for binary compatibility of core data types
Michael Paquier <michael@paquier.xyz> — 2021-12-01T07:19:44Z
On Thu, Nov 18, 2021 at 03:58:18PM +0900, Michael Paquier wrote: > + ver >= 905 AND ver <= 1300 AS oldpgversion_95_13, > + ver >= 906 AND ver <= 1300 AS oldpgversion_96_13, > + ver >= 906 AND ver <= 1000 AS oldpgversion_96_10, > So here, we have the choice between conditions that play with version > ranges or we could make those checks simpler but compensate with a set > of IF EXISTS queries. I think that your choice is right. The > buildfarm mixes both styles to compensate with the cases where the > objects are created after a drop. So, I have come back to this part of the patch set, that moves the SQL queries doing the pre-upgrade cleanups in the old version we upgrade from, and decided to go with what looks like the simplest approach, relying on some IFEs depending on the object types if they don't exist for some cases. While checking the whole thing, I have noticed that some of the operations were not really necessary. The result is rather clean now, with a linear organization of the version logic, so as it is a no-brainer to get that done in back-branches per the backward-compatibility argument. I'll get that done down to 10 to maximize its influence, then I'll move on with the buildfarm code and send a patch to plug this and reduce the dependencies between core and the buildfarm code. -- Michael
-
Re: pg_upgrade test for binary compatibility of core data types
Michael Paquier <michael@paquier.xyz> — 2021-12-02T01:49:22Z
On Wed, Dec 01, 2021 at 04:19:44PM +0900, Michael Paquier wrote: > I'll get that done down to 10 to maximize its influence, then I'll > move on with the buildfarm code and send a patch to plug this and > reduce the dependencies between core and the buildfarm code. Okay, I have checked this one this morning, and applied the split down to 10, so as we have a way to fix objects from the main regression test suite. The buildfarm client gets a bit cleaned up after that (I have a patch for that, but I am not 100% sure that it is right). Still, the global picture is larger than that because there is still nothing done for contrib/ modules included in cross-version checks of pg_upgrade by the buildfarm. The core code tests don't do this much, but if we were to do the same things as the buildfarm, then we would need to run installcheck-world (roughly) on a deployed instance, then pg_upgrade it. That's not going to be cheap, for sure. One thing that we could do is to use unique names for the databases of the contrib/ modules when running an installcheck, so as these are preserved for upgrades (the buildfarm client does that). This has as effect to increase the number of databases for an instance installcheck'ed, so this had better be optional, at least. -- Michael