Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use

From: Mithun Cy <mithun.cy@gmail.com>
To: Hans Buschmann <buschmann@nidsa.net>
Cc: Mithun Cy <mithun.cy@enterprisedb.com>, thomas.munro@gmail.com, pgsql-bugs@lists.postgresql.org, Robert Haas <robertmhaas@gmail.com>
Date: 2019-02-24T18:40:49Z
Lists: pgsql-bugs, pgsql-hackers

Commits

  1. Don't auto-restart per-database autoprewarm workers.

  2. Fix race in dsm_attach() when handles are reused.

Thanks, Hans, for the simple reproducible test case.

On Sun, Feb 24, 2019 at 6:54 PM Hans Buschmann <buschmann@nidsa.net> wrote:
> Here is the start of the error log:
>
> CPS PRD 2019-02-24 12:11:57 CET  00000  1:> LOG:  database system was interrupted; last known up at 2019-02-17 16:14:05 CET
> CPS PRD 2019-02-24 12:12:16 CET  00000  2:> LOG:  entering standby mode
> CPS PRD 2019-02-24 12:12:16 CET  00000  3:> LOG:  redo starts at 0/23000028
> CPS PRD 2019-02-24 12:12:16 CET  00000  4:> LOG:  consistent recovery state reached at 0/23000168
> CPS PRD 2019-02-24 12:12:16 CET  00000  5:> LOG:  invalid record length at 0/24000060: wanted 24, got 0
> CPS PRD 2019-02-24 12:12:16 CET  00000  9:> LOG:  database system is ready to accept read only connections
> CPS PRD 2019-02-24 12:12:16 CET  3D000  1:> FATAL:  database 16384 does not exist
> CPS PRD 2019-02-24 12:12:16 CET  00000 10:> LOG:  background worker "autoprewarm worker" (PID 3968) exited with exit code 1
> CPS PRD 2019-02-24 12:12:16 CET  00000  1:> LOG:  autoprewarm successfully prewarmed 0 of 12402 previously-loaded blocks
> CPS PRD 2019-02-24 12:12:17 CET  XX000  1:> FATAL:  could not connect to the primary server: FATAL:  no pg_hba.conf entry for replication connection from host "192.168.27.155", user "replicator", SSL off
> CPS PRD 2019-02-24 12:12:17 CET  55000  1:> ERROR:  could not map dynamic shared memory segment

As per the log, the autoprewarm master exited first ("autoprewarm successfully
prewarmed 0 of 12402 previously-loaded blocks"), and only after that did we
start getting "could not map dynamic shared memory segment".
That is, the master had already done dsm_detach, and the workers started
throwing the error after that.

> This seems easy to reproduce:
>
> - Install/create a database with autoprewarm on and pg_prewarm loaded.
> - Fill the autoprewarm cache with some data
> - pg_dump the database
> - drop the database
> - create the database and pg_restore it from the dump
> - start the instance and logs are flooded
>
> I have taken no further investigation in the sourcecode due to limited skills so far...

I was able to reproduce the same issue.

The "worker.bgw_restart_time" field is never set for autoprewarm workers, so
on error a worker gets restarted after some period of time (the default
behavior). Since the database itself was dropped, our attempt to connect to
it failed and the worker exited. But the postmaster restarted it again, and
then we started seeing the DSM segment error above.

I think every autoprewarm worker should be registered with
"worker.bgw_restart_time = BGW_NEVER_RESTART;" so that there are no repeated
attempts to prewarm a dropped database. I will think about this further and
submit a patch.

-- 
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com