AW: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

Hans Buschmann <buschmann@nidsa.net>

From: Hans Buschmann <buschmann@nidsa.net>
To: Tomas Vondra <tomas.vondra@2ndquadrant.com>, Tom Lane <tgl@sss.pgh.pa.us>
Cc: "pgsql-bugs@lists.postgresql.org" <pgsql-bugs@lists.postgresql.org>
Date: 2019-10-09T14:50:52Z
Lists: pgsql-bugs, pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Move into separate file all the SQL queries used in pg_upgrade tests

  2. Add table to regression tests for binary-compatibility checks in pg_upgrade

  3. Fix tests of pg_upgrade across different major versions

  4. Multirange datatypes

  5. Work around cross-version-upgrade issues created by commit 9e38c2bb5.

  6. Declare assorted array functions using anycompatible not anyelement.

  7. Remove factorial operators, leaving only the factorial() function.

  8. Create by default sql/ and expected/ for output directory in pg_regress

  9. Add missing include to pg_upgrade/version.c

  10. Improve the check for pg_catalog.line data type in pg_upgrade

  11. Improve the check for pg_catalog.unknown data type in pg_upgrade

  12. Check for tables with sql_identifier during pg_upgrade

  13. pg_upgrade: clarify the database names in error files

  14. In the pg_upgrade test suite, don't write to src/test/regress.

  15. Allow group access on PGDATA

  16. Refactor dir/file permissions

  17. Remove unused functions in regress.c.

  18. Make WAL segment size configurable at initdb time.

  19. Fix bit-rot in pg_upgrade's test.sh, and improve documentation.

Hi Tomas,

Nice that you could reproduce it.
This was just the way I followed.

For your Info, here are my no-standard config params:

                name                |         current_setting         
------------------------------------+---------------------------------
 application_name                   | psql                            
 auto_explain.log_analyze           | on                              
 auto_explain.log_min_duration      | 0                               
 auto_explain.log_nested_statements | on                              
 client_encoding                    | WIN1252                         
 cluster_name                       | HB_DEV                          
 data_checksums                     | on                              
 DateStyle                          | ISO, DMY                        
 default_text_search_config         | pg_catalog.german               
 dynamic_shared_memory_type         | windows                         
 effective_cache_size               | 8GB                             
 lc_collate                         | C                               
 lc_ctype                           | German_Germany.1252             
 lc_messages                        | C                               
 lc_monetary                        | German_Germany.1252             
 lc_numeric                         | German_Germany.1252             
 lc_time                            | German_Germany.1252             
 log_destination                    | stderr                          
 log_directory                      | N:/ZZ_log/pg_log_hbdev          
 log_error_verbosity                | verbose                         
 log_file_mode                      | 0640                            
 log_line_prefix                    | WHB %a %t %i %e %2l:>           
 log_statement                      | mod                             
 log_temp_files                     | 0                               
 log_timezone                       | CET                             
 logging_collector                  | on                              
 maintenance_work_mem               | 128MB                           
 max_connections                    | 100                             
 max_stack_depth                    | 2MB                             
 max_wal_size                       | 1GB                             
 min_wal_size                       | 80MB                            
 pg_stat_statements.max             | 5000                            
 pg_stat_statements.track           | all                             
 random_page_cost                   | 1                               
 search_path                        | public, archiv, ablage, admin   
 server_encoding                    | UTF8                            
 server_version                     | 12.0                            
 shared_buffers                     | 1GB                             
 shared_preload_libraries           | auto_explain,pg_stat_statements 
 temp_buffers                       | 32MB                            
 TimeZone                           | CET                             
 transaction_deferrable             | off                             
 transaction_isolation              | read committed                  
 transaction_read_only              | off                             
 update_process_title               | off                             
 wal_buffers                        | 16MB                            
 wal_segment_size                   | 16MB                            
 work_mem                           | 32MB                            
(48 rows)

Indeed, the database has UTF8 Encoding.

The Extended error-log (i have set auto_explain):



WHB psql 2019-10-09 15:45:03 CEST  XX000  7:> ERROR:  XX000: invalid memory alloc request size 18446744073709551613
WHB psql 2019-10-09 15:45:03 CEST  XX000  8:> LOCATION:  palloc, d:\pginstaller_12.auto\postgres.windows-x64\src\backend\utils\mmgr\mcxt.c:934
WHB psql 2019-10-09 15:45:03 CEST  XX000  9:> STATEMENT:  select * from q_tbl_archiv;
WHB vacuumdb 2019-10-09 15:46:42 CEST  00000  1:> LOG:  00000: duration: 0.022 ms  plan:
	Query Text: SELECT pg_catalog.set_config('search_path', '', false);
	Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.014..0.015 rows=1 loops=1)
WHB vacuumdb 2019-10-09 15:46:42 CEST  00000  2:> LOCATION:  explain_ExecutorEnd, d:\pginstaller_12.auto\postgres.windows-x64\contrib\auto_explain\auto_explain.c:415
WHB vacuumdb 2019-10-09 15:46:42 CEST  00000  3:> LOG:  00000: duration: 0.072 ms  plan:
	Query Text: SELECT datname FROM pg_database WHERE datallowconn ORDER BY 1;
	Sort  (cost=1.16..1.16 rows=1 width=64) (actual time=0.063..0.064 rows=14 loops=1)
	  Sort Key: datname
	  Sort Method: quicksort  Memory: 26kB
	  ->  Seq Scan on pg_database  (cost=0.00..1.15 rows=1 width=64) (actual time=0.018..0.022 rows=14 loops=1)
	        Filter: datallowconn
	        Rows Removed by Filter: 1
WHB vacuumdb 2019-10-09 15:46:42 CEST  00000  4:> LOCATION:  explain_ExecutorEnd, d:\pginstaller_12.auto\postgres.windows-x64\contrib\auto_explain\auto_explain.c:415
WHB vacuumdb 2019-10-09 15:46:43 CEST  00000  1:> LOG:  00000: duration: 0.027 ms  plan:
	Query Text: SELECT pg_catalog.set_config('search_path', '', false);
	Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.012..0.013 rows=1 loops=1)
WHB vacuumdb 2019-10-09 15:46:43 CEST  00000  2:> LOCATION:  explain_ExecutorEnd, d:\pginstaller_12.auto\postgres.windows-x64\contrib\auto_explain\auto_explain.c:415
WHB vacuumdb 2019-10-09 15:46:43 CEST  00000  3:> LOG:  00000: duration: 1.036 ms  plan:
	Query Text: SELECT c.relname, ns.nspname FROM pg_catalog.pg_class c
	 JOIN pg_catalog.pg_namespace ns ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid
	 LEFT JOIN pg_catalog.pg_class t ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid
	 WHERE c.relkind OPERATOR(pg_catalog.=) ANY (array['r', 'm'])
	 ORDER BY c.relpages DESC;
	Sort  (cost=56.56..56.59 rows=13 width=132) (actual time=0.843..0.854 rows=320 loops=1)
	  Sort Key: c.relpages DESC
	  Sort Method: quicksort  Memory: 110kB
	  ->  Hash Join  (cost=1.23..56.32 rows=13 width=132) (actual time=0.082..0.649 rows=320 loops=1)
	        Hash Cond: (c.relnamespace = ns.oid)
	        ->  Seq Scan on pg_class c  (cost=0.00..55.05 rows=13 width=76) (actual time=0.034..0.545 rows=320 loops=1)
	              Filter: ((relkind)::text = ANY ('{r,m}'::text[]))
	              Rows Removed by Filter: 950
	        ->  Hash  (cost=1.10..1.10 rows=10 width=68) (actual time=0.022..0.022 rows=10 loops=1)
	              Buckets: 1024  Batches: 1  Memory Usage: 9kB
	              ->  Seq Scan on pg_namespace ns  (cost=0.00..1.10 rows=10 width=68) (actual time=0.010..0.011 rows=10 loops=1)
WHB vacuumdb 2019-10-09 15:46:43 CEST  00000  4:> LOCATION:  explain_ExecutorEnd, d:\pginstaller_12.auto\postgres.windows-x64\contrib\auto_explain\auto_explain.c:415
WHB vacuumdb 2019-10-09 15:46:43 CEST  00000  5:> LOG:  00000: duration: 0.011 ms  plan:
	Query Text: SELECT pg_catalog.set_config('search_path', '', false);
	Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.008..0.008 rows=1 loops=1)
WHB vacuumdb 2019-10-09 15:46:43 CEST  00000  6:> LOCATION:  explain_ExecutorEnd, d:\pginstaller_12.auto\postgres.windows-x64\contrib\auto_explain\auto_explain.c:415
WHB  2019-10-09 15:47:01 CEST  00000 22:> LOG:  00000: server process (PID 4708) was terminated by exception 0xC0000005
WHB  2019-10-09 15:47:01 CEST  00000 23:> DETAIL:  Failed process was running: ANALYZE admin.q_tbl_archiv;
WHB  2019-10-09 15:47:01 CEST  00000 24:> HINT:  See C include file "ntstatus.h" for a description of the hexadecimal value.
WHB  2019-10-09 15:47:01 CEST  00000 25:> LOCATION:  LogChildExit, d:\pginstaller_12.auto\postgres.windows-x64\src\backend\postmaster\postmaster.c:3670
WHB  2019-10-09 15:47:01 CEST  00000 26:> LOG:  00000: terminating any other active server processes
WHB  2019-10-09 15:47:01 CEST  00000 27:> LOCATION:  HandleChildCrash, d:\pginstaller_12.auto\postgres.windows-x64\src\backend\postmaster\postmaster.c:3400
WHB psql 2019-10-09 15:47:01 CEST  57P02 10:> WARNING:  57P02: terminating connection because of crash of another server process
WHB psql 2019-10-09 15:47:01 CEST  57P02 11:> DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
WHB psql 2019-10-09 15:47:01 CEST  57P02 12:> HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WHB psql 2019-10-09 15:47:01 CEST  57P02 13:> LOCATION:  quickdie, d:\pginstaller_12.auto\postgres.windows-x64\src\backend\tcop\postgres.c:2717
WHB  2019-10-09 15:47:02 CEST  57P02  3:> WARNING:  57P02: terminating connection because of crash of another server process
WHB  2019-10-09 15:47:02 CEST  57P02  4:> DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
WHB  2019-10-09 15:47:02 CEST  57P02  5:> HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WHB  2019-10-09 15:47:02 CEST  57P02  6:> LOCATION:  quickdie, d:\pginstaller_12.auto\postgres.windows-x64\src\backend\tcop\postgres.c:2717
WHB  2019-10-09 15:47:02 CEST  00000 28:> LOG:  00000: all server processes terminated; reinitializing
WHB  2019-10-09 15:47:02 CEST  00000 29:> LOCATION:  PostmasterStateMachine, d:\pginstaller_12.auto\postgres.windows-x64\src\backend\postmaster\postmaster.c:3912
WHB  2019-10-09 15:47:02 CEST  00000  1:> LOG:  00000: database system was interrupted; last known up at 2019-10-09 15:46:03 CEST
WHB  2019-10-09 15:47:02 CEST  00000  2:> LOCATION:  StartupXLOG, d:\pginstaller_12.auto\postgres.windows-x64\src\backend\access\transam\xlog.c:6277


The table was imported successively by pg_dump/pg_restore from the previous versions into pg11.

This was the same what I did on the other machine (pg 11.5). On this test machine I could successfully Export the table with pg_dump -t.

On the erroneous PG12 Cluster I succeeded to recreate a similar  table with the original create table Statements: no Errors.

Under PG12 upgraded, I tried to select only the first column (select table_name from q_tbl_archiv) and got erroneaus results (shown first 2 entries):

cpsdb=# select table_name from q_tbl_archiv;
                 table_name
---------------------------------------------
 \x11chemmat\x17chm_season
 !collectionsheet\x15cs_season

It seems that the length Bytes are present in the Output.

Hope this Information helps.

Hans Buschmann