Re: BUG #16045: vacuum_db crash and illegal memory alloc after pg_upgrade from PG11 to PG12

Tomas Vondra <tomas.vondra@2ndquadrant.com>

From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
To: buschmann@nidsa.net, pgsql-bugs@lists.postgresql.org
Date: 2019-10-09T13:59:07Z
Lists: pgsql-bugs, pgsql-hackers

Commits

  1. Move into separate file all the SQL queries used in pg_upgrade tests

  2. Add table to regression tests for binary-compatibility checks in pg_upgrade

  3. Fix tests of pg_upgrade across different major versions

  4. Multirange datatypes

  5. Work around cross-version-upgrade issues created by commit 9e38c2bb5.

  6. Declare assorted array functions using anycompatible not anyelement.

  7. Remove factorial operators, leaving only the factorial() function.

  8. Create by default sql/ and expected/ for output directory in pg_regress

  9. Add missing include to pg_upgrade/version.c

  10. Improve the check for pg_catalog.line data type in pg_upgrade

  11. Improve the check for pg_catalog.unknown data type in pg_upgrade

  12. Check for tables with sql_identifier during pg_upgrade

  13. pg_upgrade: clarify the database names in error files

  14. In the pg_upgrade test suite, don't write to src/test/regress.

  15. Allow group access on PGDATA

  16. Refactor dir/file permissions

  17. Remove unused functions in regress.c.

  18. Make WAL segment size configurable at initdb time.

  19. Fix bit-rot in pg_upgrade's test.sh, and improve documentation.

FWIW I can reproduce this; it's enough to run this on the 11 cluster:

create table q_tbl_archiv as
with
qseason as (
select table_name,column_name, ordinal_position
,replace(column_name,'_season','') as col_qualifier
-- ,'id_'||replace(column_name,'_season','') as id_column
from information_schema.columns
order by table_name
)
select qs.*,c.column_name as id_column, c.column_default as id_default
from
        qseason qs
        left join information_schema.columns c on c.table_name=qs.table_name and
c.column_name like 'id_%';


and then, after pg_upgrade to 12,

    analyze q_tbl_archiv;

which produces a backtrace like this:

No symbol "stats" in current context.
(gdb) bt
#0  0x0000746095262951 in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
#1  0x0000000000890a8e in varstrfastcmp_locale (a1p=0x17716b4 "per_language\a", len1=<optimized out>, a2p=0x176af28 '\177' <repeats 136 times>, "\021\004", len2=-4, ssup=<optimized out>, ssup=<optimized out>) at varlena.c:2320
#2  0x0000000000890cb1 in varlenafastcmp_locale (x=24581808, y=24555300, ssup=0x7ffc649463f0) at varlena.c:2219
#3  0x00000000005b73b4 in ApplySortComparator (ssup=0x7ffc649463f0, isNull2=false, datum2=<optimized out>, isNull1=false, datum1=<optimized out>) at ../../../src/include/utils/sortsupport.h:224
#4  compare_scalars (a=<optimized out>, b=<optimized out>, arg=0x7ffc649463e0) at analyze.c:2700
#5  0x00000000008f9953 in qsort_arg (a=a@entry=0x178fdc0, n=<optimized out>, n@entry=2158, es=es@entry=16, cmp=cmp@entry=0x5b7390 <compare_scalars>, arg=arg@entry=0x7ffc649463e0) at qsort_arg.c:140
#6  0x00000000005b86a6 in compute_scalar_stats (stats=0x176a208, fetchfunc=<optimized out>, samplerows=<optimized out>, totalrows=2158) at analyze.c:2273
#7  0x00000000005b9d95 in do_analyze_rel (onerel=onerel@entry=0x74608c00d3e8, params=params@entry=0x7ffc64946970, va_cols=va_cols@entry=0x0, acquirefunc=<optimized out>, relpages=22, inh=inh@entry=false, in_outer_xact=false, elevel=13)
    at analyze.c:529
#8  0x00000000005bb2c9 in analyze_rel (relid=<optimized out>, relation=<optimized out>, params=params@entry=0x7ffc64946970, va_cols=0x0, in_outer_xact=<optimized out>, bstrategy=<optimized out>) at analyze.c:260
#9  0x000000000062c7b0 in vacuum (relations=0x1727120, params=params@entry=0x7ffc64946970, bstrategy=<optimized out>, bstrategy@entry=0x0, isTopLevel=isTopLevel@entry=true) at vacuum.c:413
#10 0x000000000062cd49 in ExecVacuum (pstate=pstate@entry=0x16c9518, vacstmt=vacstmt@entry=0x16a82b8, isTopLevel=isTopLevel@entry=true) at vacuum.c:199
#11 0x00000000007a6d64 in standard_ProcessUtility (pstmt=0x16a8618, queryString=0x16a77a8 "", context=<optimized out>, params=0x0, queryEnv=0x0, dest=0x16a8710, completionTag=0x7ffc64946cb0 "") at utility.c:670
#12 0x00000000007a4006 in PortalRunUtility (portal=0x170f368, pstmt=0x16a8618, isTopLevel=<optimized out>, setHoldSnapshot=<optimized out>, dest=0x16a8710, completionTag=0x7ffc64946cb0 "") at pquery.c:1175
#13 0x00000000007a4b61 in PortalRunMulti (portal=portal@entry=0x170f368, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x16a8710, altdest=altdest@entry=0x16a8710,
    completionTag=completionTag@entry=0x7ffc64946cb0 "") at pquery.c:1321
#14 0x00000000007a5864 in PortalRun (portal=portal@entry=0x170f368, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x16a8710, altdest=altdest@entry=0x16a8710,
    completionTag=0x7ffc64946cb0 "") at pquery.c:796
#15 0x00000000007a174e in exec_simple_query (query_string=0x16a77a8 "") at postgres.c:1215

Looking at compute_scalar_stats, the "stats" parameter does not seem
particularly healthy:

(gdb) p *stats
$3 = {attr = 0x10, attrtypid = 12, attrtypmod = 0, attrtype = 0x1762e00, attrcollid = 356, anl_context = 0x7f7f7f7e00000000, compute_stats = 0x100, minrows = 144, extra_data = 0x1762e00, stats_valid = false, stanullfrac = 0,
  stawidth = 0, stadistinct = 0, stakind = {0, 0, 0, 0, 0}, staop = {0, 0, 0, 0, 0}, stacoll = {0, 0, 0, 0, 0}, numnumbers = {0, 0, 0, 0, 0}, stanumbers = {0x0, 0x0, 0x0, 0x0, 0x0}, numvalues = {0, 0, 0, 0, 2139062142}, stavalues = {
    0x7f7f7f7f7f7f7f7f, 0x7f7f7f7f7f7f7f7f, 0x7f7f7f7f7f7f7f7f, 0x7f7f7f7f7f7f7f7f, 0x7f7f7f7f7f7f7f7f}, statypid = {2139062143, 2139062143, 2139062143, 2139062143, 2139062143}, statyplen = {32639, 32639, 32639, 32639, 32639},
  statypbyval = {127, 127, 127, 127, 127}, statypalign = "\177\177\177\177\177", tupattnum = 2139062143, rows = 0x7f7f7f7f7f7f7f7f, tupDesc = 0x7f7f7f7f7f7f7f7f, exprvals = 0x8, exprnulls = 0x4, rowstride = 24522240}

Not sure about the root cause yet.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services